Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Edge::label from optional to required #1205

Merged
merged 28 commits into from
Feb 4, 2025

Conversation

mjoerussell
Copy link
Contributor

@mjoerussell mjoerussell commented Dec 30, 2024

This involves a couple of changes outside of just changing the field type:

  1. CursorIterator must be able to produce a label for every node now. This wasn't possible before because the root nodes of a cursor had no parent, and so could not derive a label. I've changed Cursor in the following ways to fix this:
    • Cursor::parent is a new type, Parent<T>, instead of Option<Rc<PathAncestor<T>>>. Parent<T> is an enum with three variants - None, Open(Rc<PathAncestor<T>>), and Closed(Rc<PathAncestor<T>>). A Closed parent is used when a cursor is spawned from an arbitrary non-root node, and means that the parent can be read but not traversed to. None is only used for the true root node.
  2. KindTypes::EdgeLabel is given a Default implementation, which for now just returns the first PredefinedLabel variant. This is needed in order to be able to set a "concrete" EdgeLabel value from the metaslang_cst crate.
  3. Added a Root variant to PredefinedLabel. This will be used when a node is the true root node. For now this will be the default value for KindTypes::EdgeLabel.

Closes #815

This involves a couple of changes outside of just changing the field type:
1. `CursorIterator` must be able to produce a label for every node now. This wasn't possible before because the root nodes of a cursor had no parent, and so could not derive a label. I've changed `Cursor` in the following ways to fix this:
    * `Cursor::parent` is a new type, `Parent<T>`, instead of `Option<Rc<PathAncestor<T>>>`. `Parent<T>` is an enum with three variants - `None`, `Open(Rc<PathAncestor<T>>)`, and `Closed(Rc<PathAncestor<T>>)`. A `Closed` parent is used when a cursor is spawned from an arbitrary non-root node, and means that the parent can be read but not traversed to. `None` is only used for the true root node.
2. `KindTypes::EdgeLabel` is given a `Default` implementation, which for now just returns the first `PredefinedLabel` variant. This is needed in order to be able to set a "concrete" `EdgeLabel` value from the `metaslang_cst` crate.
3. Added a `Root` variant to `PredefinedLabel`. This will be used when a node is the true root node. For now this will be the default value for `KindTypes::EdgeLabel`.
@mjoerussell mjoerussell self-assigned this Dec 30, 2024
@mjoerussell mjoerussell requested review from a team as code owners December 30, 2024 21:26
Copy link

changeset-bot bot commented Dec 30, 2024

🦋 Changeset detected

Latest commit: 07368d2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@nomicfoundation/slang Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are a few root labels in sub-trees still. For example, root꞉ Comma in
crates/solidity/testing/snapshots/cst_output/ContractMembers/separated_recovery/generated/0.6.0-failure.yml.

Maybe to catch such cases, we can add a check in crates/solidity/outputs/cargo/tests/src/cst_output/runner.rs to make sure we never put Label::default() in sub-trees. Thoughts?

let cursor = output.create_tree_cursor();
while cursor.go_to_next() {
    assert!(!cursor.label.is_default());
}

…the current node is the root and if it has the default label

* Use these utility functions in `cst_output/runner.rs` to assert that child nodes never have the default label (`root`).
@mjoerussell
Copy link
Contributor Author

I added the assertion but haven't fixed the issue, so some tests should be failing right now.

I've found some places where the Edge::root constructor is used, but I'm not sure how to determine what labels should be used instead. All of these calls to Edge::root occur in the context of building or reading a ParseResult, for example this code in lexer/mod.rs.

Using this I narrowed down the failures to three test cases:

contract_members::separated_recovery
expression::incomplete_operand
source_unit::pratt_precedence_recovery

These all occur in the context of a parsing failure, and the node given the root label is the node joining a successful match and a failure. My suspicion is that somewhere in error recovery we're creating a new subtree, and the root of that subtree is being given the root edge label. I just haven't been able to find exactly where that's happening yet.

@OmarTawfik do you have any thoughts on these?

@OmarTawfik
Copy link
Contributor

@mjoerussell Thanks for the analysis! I think there are two separate issues here:

For contract_members::separated_recovery, I wonder if it is a case of NOT applying separator_label in SeparatedHelper to the separator token when we recover?

For expression::incomplete_operand and source_unit::pratt_precedence_recovery, looks like you uncovered a bug in error recovery in the PRATT parser. We are not creating the right structure for the specific Expression variant.

For example, in incomplete_operand, we should be (idially) creating the parent Expression, with a single child MultiplicativeExpression (the specific expression variant), which would have three children:

  1. left_operand: Expression with a single DecimalNumberExpression child
  2. operand: Asterisk
  3. right_operand: Expression with a single NewExpression child

But instead, because of the current implementation, the three children are promoted to the top Expression, and the child MultiplicativeExpression only has the operator node:

  - (root꞉ Expression): # "2 * new\n" (0..8)
      - (variant꞉ DecimalNumberExpression) ► (literal꞉ DecimalLiteral): "2" # (0..1)
      - (root꞉ MultiplicativeExpression): # " *" (1..3)
          - (leading_trivia꞉ Whitespace): " " # (1..2)
          - (operator꞉ Asterisk): "*" # (2..3)
      - (variant꞉ NewExpression): # " new\n" (3..8)
          - (leading_trivia꞉ Whitespace): " " # (3..4)
          - (new_keyword꞉ NewKeyword): "new" # (4..7)
          - (trailing_trivia꞉ EndOfLine): "\n" # (7..8)
      - (missing꞉ MISSING): "" # (8..8)

I'm not sure what is the exact fix for PrecedenceHelper here, but happy to brainstorm/pair on this if you would like.

@mjoerussell
Copy link
Contributor Author

For contract_members::separated_recovery, I wonder if it is a case of NOT applying separator_label in SeparatedHelper to the separator token when we recover?

This was exactly the issue, and it was a pretty simple fix to get this case working again.

I'm not sure what is the exact fix for PrecedenceHelper here, but happy to brainstorm/pair on this if you would like.

I don't think that there's an issue with PrecedenceHelper, I think that the issue is actually in SequenceHelper. In SequenceHelper PrattOperatorMatch's get flattened when they're succeeded by a bad match. PrattElement::into_nodes creates an Edge::root when extracting the inner nodes of the element. This adds the root label that we're seeing, and it also prevents PrecedenceHelper from correctly reducing the expression.

@mjoerussell
Copy link
Contributor Author

@OmarTawfik Diving into SequenceHelper and PrattElement-related things is becoming a bigger rabbit hole than I'd expected. I'm not sure if fixing this bug is in the scope of this PR.

If it's OK with you, I can fix the current errors by forcing the flattened pratt expressions to be given a different label, perhaps Unrecognized, and we can return to the issue at a later date. Since this situation only occurs in the context of error recovery, I think that it should not impact parsing in general.

…root node of the flattened expression the label `Unrecognized` instead of `Root`.
@OmarTawfik
Copy link
Contributor

Since the current labels we produce are already inaccurate, so changing it to Unrecognized seems like an improvement, as long as this only affects invalid trees, and not correct parses.

Copy link
Contributor

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Thanks for all the hard work on this. This is a significant improvement to DX.

* Remove Cursor method `is_true_root` - we don't want this exposed, but also with the cleanups above it's no longer used
@mjoerussell mjoerussell added this pull request to the merge queue Feb 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 4, 2025
@mjoerussell mjoerussell added this pull request to the merge queue Feb 4, 2025
Merged via the queue into NomicFoundation:main with commit 103b331 Feb 4, 2025
1 check passed
@mjoerussell mjoerussell deleted the feature/required-label branch February 4, 2025 22:26
@github-actions github-actions bot mentioned this pull request Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make EdgeLabel required in the CST
2 participants