[CLOSED] Redesigning `Document` to be used as the primary DOM tree #63

Kiyoshika · 2023-09-30T20:00:58Z

Kiyoshika
Sep 30, 2023
Collaborator

I first started experimenting with reworking Node to be more in line with the DOM spec (4.5) (not a complete 1:1 implementation of the spec):

/// Node that resembles a DOM node
pub struct Node {
    /// ID of the node, 0 is always the root / document node
    pub id: usize,
    /// parent of the node, if any
    pub parent: Option<Rc<Node>>,
    /// children of the node 
    pub children: Vec<Rc<Node>>,
    /// first child node, if any
    pub first_child: Option<Rc<Node>>,
    /// last child node, if any
    pub last_child: Option<Rc<Node>>,
    /// next sibling node (next node with the same non-null parent)
    pub next_sibling: Option<Rc<Node>>,
    /// previous sibling node (previous node with the same non-null parent)
    pub previous_sibling: Option<Rc<Node>>,
    /// name of the node, or empty when it's not a tag
    pub name: String,
    /// namespace of the node
    pub namespace: Option<String>,
    /// type of the node (equivalent to type_of trait implemented)
    pub node_type: NodeType,
    /// actual data of the node
    pub data: NodeData,
    /// value of the node (this would probably be a light wrapper around self.data)
    pub value: String,
    /// text content of the node
    pub text_content: String
}

I've replaced the original IDs with reference-counted pointers (Rc) to remove the dependency on NodeArena, so that we can directly access children/siblings from a given Node. I used reference counting since we could have multiple things pointing to a node (siblings, children, parents, etc.). Keep in mind that I'm extremely new to Rust, so if there's a better alternative to this, we can replace it.

Then I restructured Document to be the following:

pub struct Document {
    pub root_node: Option<Rc<Node>>,
    pub current_node: Option<Rc<Node>>,
    pub doctype: DocumentType,   // Document type
    pub quirks_mode: QuirksMode, // Quirks mode
}

When a Document is first created, a Node::new_document() node is created and assigned as the root and current_node points to it.

In theory, as we are parsing, we would have access to document.current_node and can call document.current_node.append_child whenever we are creating new nodes (or look at siblings, children, etc. as we are parsing). The Node itself will have utility methods to check for children, check if the node allows certain children/text nodes, etc.

Of course, we'll have to update the current_node as we are parsing when we append children. I did something similar in one of my projects: whenever we start a new <tag>, we update the current_node to that tag, and when the tag closes with </tag>, we update the current_node to point back to its parent.

// current node: document
<html> // current node: html
  <div> // current node: div
    <p> // current node: p
      hello // current node: p
    </p> // current node: div (p's parent)
  </div> // current node: html (div's parent)
</html>
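
The bookkeeping in the diagram above can be sketched in a few lines. This is only an illustration, not the parser's actual code: `Token`, `SketchNode`, `TreeBuilder`, and `trace` are hypothetical names, and nodes are kept in a plain `Vec` with parent indices purely to keep the example short.

```rust
/// Hypothetical token type, just enough to drive the example.
enum Token<'a> {
    Open(&'a str),
    Text(&'a str),
    Close,
}

/// Hypothetical node: a name plus the index of its parent (None for the document).
struct SketchNode {
    name: String,
    parent: Option<usize>,
}

struct TreeBuilder {
    nodes: Vec<SketchNode>,
    /// Index of the current node; 0 is the document node.
    current: usize,
}

impl TreeBuilder {
    fn new() -> Self {
        let document = SketchNode { name: "document".to_string(), parent: None };
        TreeBuilder { nodes: vec![document], current: 0 }
    }

    /// Processes one token and returns the name of the current node afterwards.
    fn process(&mut self, token: Token<'_>) -> &str {
        match token {
            // Opening tag: append a child and make it the current node.
            Token::Open(name) => {
                let id = self.nodes.len();
                self.nodes.push(SketchNode {
                    name: name.to_string(),
                    parent: Some(self.current),
                });
                self.current = id;
            }
            // Text does not change the current node.
            Token::Text(_) => {}
            // Closing tag: move back to the parent (the document has none).
            Token::Close => {
                if let Some(parent) = self.nodes[self.current].parent {
                    self.current = parent;
                }
            }
        }
        &self.nodes[self.current].name
    }
}

/// Feeds all tokens through a fresh builder and records the current node after each.
fn trace(tokens: Vec<Token<'_>>) -> Vec<String> {
    let mut builder = TreeBuilder::new();
    tokens
        .into_iter()
        .map(|token| builder.process(token).to_string())
        .collect()
}
```

Calling `trace` with the token sequence for the markup above (open html, open div, open p, text, then three closes) yields the same progression of current nodes as the comments in the diagram, ending back at the document node.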

After parsing is finished, the same document is returned, initially pointing to the root document node. It can then be traversed with something like the following (note: I'm writing this directly in GitHub, so take it as a sketch):

// maybe this is something a user agent / renderer would do

let document: Document = parser.parse();

// take the node by reference so the tree is not moved or copied
fn traverse(current_node: &Node) {

    // do something with current_node (get text content, read attributes, whatever)

    for child_node in current_node.children.iter() {
        // &Rc<Node> derefs to &Node
        traverse(child_node);
    }
}

// root_node is an Option, so only traverse when it is set
if let Some(root_node) = &document.root_node {
    traverse(root_node);
}

Obviously, implementing all these changes would require a significant rework of the parser, which I took a look at, but I'm too new to Rust to understand the macros currently used.

We can discuss this more if we all think this type of structure is worth it in the long run, and if so, we can create an experimental branch and people can help me rework the parser to accommodate the new node/document structure.

jaytaph · 2023-10-01T08:42:56Z

jaytaph
Oct 1, 2023
Maintainer

I started out using Rc<Node>, but eventually moved to an arena-like system with node IDs. The reason was mostly that the borrow checker was a huge pain to work with. The advantage of using node IDs instead of nodes is that it makes it easier to deal with (large) trees, as we just have to copy usizes instead of complete nodes when we sometimes need clones of things. Also, when we actually need node data, we can simply do a self.document.get_node_by_id(node_id). I found this system much easier to work with, but this may be because of my lack of Rust knowledge.
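
For readers unfamiliar with the pattern, here is a minimal sketch of an arena with node IDs. It assumes a simplified layout: `ArenaNode`, `ArenaDocument`, `add_node`, and `names_depth_first` are hypothetical names, and the `get_node_by_id` signature shown here is a guess rather than the project's actual one.

```rust
/// Hypothetical, trimmed-down node: links are plain indices, not pointers.
#[derive(Debug)]
struct ArenaNode {
    id: usize,
    parent: Option<usize>,
    children: Vec<usize>,
    name: String,
}

/// Hypothetical document that owns every node in one Vec (the "arena").
struct ArenaDocument {
    nodes: Vec<ArenaNode>,
}

impl ArenaDocument {
    /// Creates a document whose node 0 is the root / document node.
    fn new() -> Self {
        let root = ArenaNode {
            id: 0,
            parent: None,
            children: Vec::new(),
            name: "document".to_string(),
        };
        ArenaDocument { nodes: vec![root] }
    }

    /// Adds a node under `parent_id` and returns the new node's ID.
    /// Panics if `parent_id` is not a valid ID (kept simple for the sketch).
    fn add_node(&mut self, name: &str, parent_id: usize) -> usize {
        let id = self.nodes.len();
        self.nodes.push(ArenaNode {
            id,
            parent: Some(parent_id),
            children: Vec::new(),
            name: name.to_string(),
        });
        self.nodes[parent_id].children.push(id);
        id
    }

    /// Looks a node up by ID; returns None if the ID is unknown.
    fn get_node_by_id(&self, node_id: usize) -> Option<&ArenaNode> {
        self.nodes.get(node_id)
    }

    /// Collects node names in depth-first order, starting at `node_id`.
    fn names_depth_first(&self, node_id: usize, out: &mut Vec<String>) {
        if let Some(node) = self.get_node_by_id(node_id) {
            out.push(node.name.clone());
            for &child_id in &node.children {
                self.names_depth_first(child_id, out);
            }
        }
    }
}
```

Because the links are plain `usize` values, they are `Copy`: code can hold on to an ID freely and only borrow the document at the moment it needs the node's data.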

1 reply

Kiyoshika Oct 1, 2023
Collaborator Author

Fair enough. Given how new I am to the language, I haven't fully experienced the borrow checker pain yet. If it's much easier to work with, then I'm fine with sticking to the ID system.

emwalker · 2023-10-01T15:55:47Z

emwalker
Oct 1, 2023
Collaborator

Also new to Rust. I've seen two patterns mentioned for Rust when you have a large graph with shared references to nodes. Rc is one, and the use of IDs and lookups is another. Rc is generally used to get around ownership issues with the borrow checker by letting a node have several owners, although at the risk of a circular reference (for example, a parent and child pointing at each other) that defeats reference counting.
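
A common way to avoid the cycle is to make the child-to-parent link a `Weak` reference, so that only parents own their children. The following is a minimal sketch of that idea, not code from this project: `RcNode`, `new_node`, and `append_child` are hypothetical names, and `RefCell` is added because an `Rc` on its own only gives shared, read-only access.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node: children are owned (Rc), the parent is only observed (Weak).
struct RcNode {
    name: String,
    parent: RefCell<Weak<RcNode>>,
    children: RefCell<Vec<Rc<RcNode>>>,
}

/// Creates a detached node with no parent and no children.
fn new_node(name: &str) -> Rc<RcNode> {
    Rc::new(RcNode {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

/// Links `child` under `parent` without creating a strong reference cycle.
fn append_child(parent: &Rc<RcNode>, child: Rc<RcNode>) {
    // The back-reference is weak, so it does not keep the parent alive.
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    // The forward reference is strong: the parent owns the child.
    parent.children.borrow_mut().push(child);
}
```

With this layout, dropping the last strong reference to the root frees the tree, and a child that outlives its parent simply sees `upgrade()` return `None`. If both directions were `Rc`, the parent and child would keep each other alive and neither would ever be freed.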

Using ids to look up nodes in a map- or array-like structure is the other pattern. People seem to recommend both patterns.

Looking at Node, I think if we stick with IDs, we'll want to use a newtype wrapper around the primitive (usize) in this case.

struct NodeId(usize);
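
As a rough sketch of what the newtype buys us (the derives, the `ROOT` constant, and the helper functions below are illustrative assumptions, not existing project code):

```rust
/// Newtype around usize so node IDs cannot be mixed up with other integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct NodeId(usize);

impl NodeId {
    /// Assumes ID 0 is the root / document node, as in the Node doc comment above.
    const ROOT: NodeId = NodeId(0);

    /// True when this ID refers to the root / document node.
    fn is_root(self) -> bool {
        self == Self::ROOT
    }

    /// Exposes the raw index for use with Vec-like storage.
    fn index(self) -> usize {
        self.0
    }
}

/// Hypothetical lookup that only accepts a NodeId, never a bare usize.
fn node_name<'a>(names: &[&'a str], id: NodeId) -> Option<&'a str> {
    names.get(id.index()).copied()
}
```

With this in place, `node_name(&names, NodeId(1))` compiles, while `node_name(&names, 1)` is rejected by the type checker. The wrapper is still `Copy`, so it stays as cheap to pass around as the raw `usize`.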

0 replies

jaytaph · 2023-10-01T18:55:20Z

jaytaph
Oct 1, 2023
Maintainer

Let's stick with node IDs for now then. A DOM or node tree should always carry an arena with it, so we can always fetch the actual node data when needed. If we happen to need some cloning, at least we clone large trees of node IDs instead of complete nodes with all their info.

0 replies

Kiyoshika · 2023-10-01T19:38:34Z

Kiyoshika
Oct 1, 2023
Collaborator Author

Closing the discussion as we're settling on the ID approach for now.

0 replies
