-
This was one of two approaches that came to my mind as I was thinking about this problem. The other approach is called the typestate pattern. (This video is well worth watching when you have the time.) I think the enum variant + wrapped specialized struct will be a significant improvement over the current approach. The typestate pattern is a little advanced, and I worry that if we let the code get too far ahead of people's learning, it may make the project really hard for people to work on. Eventually we might get there. Anyway, I think your proposal is taking things in the right direction.
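For readers who have not met the typestate pattern, here is a minimal sketch of the general idea. It is not taken from the project: `Element`, `Open`, `Closed`, and all the methods are hypothetical names chosen for illustration. The state of a value is encoded in its type, so calling a method in the wrong state becomes a compile error instead of a runtime check.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

// Hypothetical state markers: zero-sized types that only exist at compile time.
struct Open;
struct Closed;

// Hypothetical element whose lifecycle state is part of its type.
struct Element<State> {
    name: String,
    attributes: HashMap<String, String>,
    _state: PhantomData<State>,
}

impl Element<Open> {
    fn new(name: &str) -> Self {
        Element {
            name: name.to_owned(),
            attributes: HashMap::new(),
            _state: PhantomData,
        }
    }

    // Attributes can only be added while the element is still open.
    fn insert_attribute(mut self, name: &str, value: &str) -> Self {
        self.attributes.insert(name.to_owned(), value.to_owned());
        self
    }

    // Consuming `self` makes the old, open value unusable afterwards.
    fn close(self) -> Element<Closed> {
        Element {
            name: self.name,
            attributes: self.attributes,
            _state: PhantomData,
        }
    }
}

impl Element<Closed> {
    fn name(&self) -> &str {
        &self.name
    }

    fn get_attribute(&self, name: &str) -> Option<&String> {
        self.attributes.get(name)
    }
}

fn main() {
    let p = Element::<Open>::new("p")
        .insert_attribute("id", "myid")
        .close();
    assert_eq!(p.name(), "p");
    assert_eq!(p.get_attribute("id").map(String::as_str), Some("myid"));
    // p.insert_attribute("class", "x"); // would not compile: no such method on Element<Closed>
}
```

The trade-off mentioned above shows up here: the compiler enforces more, but the generic parameter and `PhantomData` add concepts that newer contributors have to learn first.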
-
I also wonder whether there's a way to avoid exposing the full attribute API on the enum variant, which requires an internal Callers might have to do |
-
I'm fine with making the node data a bit easier to handle. However, keep in mind that we still don't know whether this node structure will be the actual output of the parser. While a document is being parsed, script functions can interrupt the parser and mutate the DOM tree as it exists at that point, before the parser continues with the rest of the document. This means we need to build a DOM tree DURING parsing, rather than converting a node tree to a DOM tree after parsing has completed. It might be worthwhile to see whether we can convert the current node structure into a DOM system, and maybe even whether we can incorporate something like SpiderMonkey or V8 (I'd prefer SpiderMonkey) to deal with script tags. (I'm currently not worried about parallelism, as JavaScript blocks the parser anyway, but we have to think about this sooner rather than later as well.)
-
Yeah, it could be considered an IR at the moment. It could be abstracted later to make it easier on the UA to render (if necessary), but right now my thought was to make the current
<html>
<head></head>
<body>
<p id="myid">Some text</p>
<script>
document.getElementById("myid").style.background = 'red';
</script>
<p id="otherid">More text</p>
<script>
document.getElementById("myid").style.background = 'blue';
const newP = document.createElement("p");
const newPText = document.createTextNode("even more text");
newP.appendChild(newPText);
const otherId = document.getElementById("otherid");
const parentNode = otherId.parentNode;
parentNode.insertBefore(newP, otherId);
</script>
<p>Final text</p>
</body>
</html>
Then basically the steps are:
Obviously we're a long way from JavaScript, but I at least wanted to start building the foundation for a mutable tree that the parser can work on, so that it accurately represents the current source (essentially what you stated).
-
After the closing tag of a script, the parser is interrupted by the JavaScript code. At that point, JavaScript has access to the DOM as it is currently (partially) generated. Once the JavaScript has finished, it may or may not have modified the DOM, and the parser continues with the next token in the HTML stream. Note that the HTML stream never changes, so it can safely continue.
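A toy sketch of that control flow, under heavy assumptions: `Token`, `Dom`, `parse`, and the `run_script` callback are hypothetical stand-ins, not the actual parser types, and the "DOM" is just a flat list. The point is only that the script callback runs at the `</script>` token, sees the partially built tree, and that parsing then resumes with the next token.

```rust
// Hypothetical token type: the real tokenizer emits far richer tokens.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

// Hypothetical, flattened stand-in for the partially built DOM.
#[derive(Debug, Default)]
struct Dom {
    nodes: Vec<String>,
}

// Feeds tokens one by one; after each `</script>` the partial DOM is
// handed to `run_script`, which may or may not mutate it.
fn parse(tokens: &[Token], mut run_script: impl FnMut(&mut Dom)) -> Dom {
    let mut dom = Dom::default();
    for token in tokens {
        match token {
            Token::StartTag(name) => dom.nodes.push(format!("<{}>", name)),
            Token::Text(text) => dom.nodes.push(text.clone()),
            Token::EndTag(name) => {
                if name == "script" {
                    // The parser pauses here: the script only sees what
                    // has been built so far.
                    run_script(&mut dom);
                }
            }
        }
    }
    dom
}

fn main() {
    let tokens = vec![
        Token::StartTag("p".into()),
        Token::Text("Some text".into()),
        Token::EndTag("p".into()),
        Token::StartTag("script".into()),
        Token::EndTag("script".into()),
        Token::StartTag("p".into()),
        Token::Text("Final text".into()),
        Token::EndTag("p".into()),
    ];

    let mut seen_by_script = 0;
    let dom = parse(&tokens, |dom| {
        seen_by_script = dom.nodes.len();
        dom.nodes.insert(0, "<!-- inserted by script -->".to_owned());
    });

    // The script ran before the final paragraph was parsed.
    assert_eq!(seen_by_script, 3);
    assert_eq!(dom.nodes.len(), 6);
}
```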
-
https://github.com/gosub-browser/gosub-engine/blob/main/src/html5_parser/parser.rs#L698
-
Yeah, I still need to do some reading on JavaScript engines, but my initial simple-minded thought is that on hitting the end script tag, the script is essentially converted into instructions that the parser can use to interact with the DOM tree.
-
After some discussion with @emwalker, we briefly talked about the idea of redesigning the current structure of Node.data.
Currently we are storing data directly in the NodeData enum like so:
This gets a little messy when we start adding methods that only apply to a particular type of node. For example, when I introduced all the attribute methods, we ended up with these nasty checks in every method:
This will only get worse as we add more methods specific to different node types (text nodes, element, any others in the future.)
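To make the problem concrete, here is a rough sketch of what storing data directly in the enum tends to look like. `Node`, `NodeData`, and `insert_attribute` are names from this post, but the variant fields, the `String` error type, and `get_attribute` are assumptions for illustration and not the actual gosub code.

```rust
use std::collections::HashMap;

// Assumed shape: the data lives directly inside the enum variants.
enum NodeData {
    Document,
    Text { value: String },
    Element { name: String, attributes: HashMap<String, String> },
}

struct Node {
    data: NodeData,
}

impl Node {
    // Every attribute method has to re-check the node type first.
    fn insert_attribute(&mut self, name: &str, value: &str) -> Result<(), String> {
        match &mut self.data {
            NodeData::Element { attributes, .. } => {
                attributes.insert(name.to_owned(), value.to_owned());
                Ok(())
            }
            _ => Err("node is not an element".to_owned()),
        }
    }

    // Same check again, repeated for each element-only method.
    fn get_attribute(&self, name: &str) -> Option<&String> {
        match &self.data {
            NodeData::Element { attributes, .. } => attributes.get(name),
            _ => None,
        }
    }
}

fn main() {
    let mut element = Node {
        data: NodeData::Element { name: "p".to_owned(), attributes: HashMap::new() },
    };
    assert!(element.insert_attribute("id", "myid").is_ok());
    assert_eq!(element.get_attribute("id").map(String::as_str), Some("myid"));

    // Calling an element-only method on other node types only fails at runtime.
    let mut text = Node { data: NodeData::Text { value: "Some text".to_owned() } };
    assert!(text.insert_attribute("id", "myid").is_err());
    let mut document = Node { data: NodeData::Document };
    assert!(document.insert_attribute("id", "myid").is_err());
}
```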
I did some brainstorming tonight and have a proposal:
We create different structs for each node type and wrap that in the enum:
and the construction will be changed to (for example on the Element type):
The dedicated structs will have their specific methods (this has the advantage of not polluting the Node struct as well):
Then when it comes to actual usage (for example, fetching a node and adding an attribute; side note, I'm writing this by hand and not actually compiling, so there are likely errors in the syntax below):
In the current state, it looks more like the following:
The current implementation of insert_attribute is:
But with the proposed approach, it could be simplified to:
I think this would help remove bloat in the Node struct both now and in the future, as well as significantly simplify the methods by removing the boilerplate type checks.
This would require a bit of rework, so I wanted to have an open discussion before I started any serious work on it. If we are good with this idea, I will open an issue based off this discussion and assign it to myself.
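As a reference point for the discussion, the proposal as a whole might look roughly like the sketch below. Only `Node`, `NodeData`, and `insert_attribute` come from this post. `ElementData`, `TextData`, their fields, and the other methods are hypothetical names, and the real structs would carry more than this.

```rust
use std::collections::HashMap;

// Hypothetical dedicated struct for text nodes.
struct TextData {
    value: String,
}

// Hypothetical dedicated struct for element nodes.
struct ElementData {
    name: String,
    attributes: HashMap<String, String>,
}

impl ElementData {
    fn new(name: &str) -> Self {
        ElementData { name: name.to_owned(), attributes: HashMap::new() }
    }

    // No type check needed: this method only exists on elements.
    fn insert_attribute(&mut self, name: &str, value: &str) {
        self.attributes.insert(name.to_owned(), value.to_owned());
    }

    fn get_attribute(&self, name: &str) -> Option<&String> {
        self.attributes.get(name)
    }
}

// The enum only wraps the specialized structs.
enum NodeData {
    Document,
    Text(TextData),
    Element(ElementData),
}

struct Node {
    data: NodeData,
}

fn main() {
    let mut node = Node { data: NodeData::Element(ElementData::new("p")) };

    // Callers match once, then work with the specialized struct directly.
    if let NodeData::Element(element) = &mut node.data {
        element.insert_attribute("id", "myid");
        assert_eq!(element.name, "p");
        assert_eq!(element.get_attribute("id").map(String::as_str), Some("myid"));
    }

    // Other node kinds simply do not offer attribute methods.
    let text = Node { data: NodeData::Text(TextData { value: "Some text".to_owned() }) };
    if let NodeData::Text(t) = &text.data {
        assert_eq!(t.value, "Some text");
    }
    let _document = Node { data: NodeData::Document };
}
```

The type check moves to the single `if let` (or `match`) at the call site, which is the cost raised elsewhere in this thread: callers have to unwrap the variant before they can reach the attribute API.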