Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for raw identifiers (plus some stuff) #88

Closed
wants to merge 9 commits into from

Conversation

ccasin
Copy link

@ccasin ccasin commented Oct 31, 2024

The goal here was to add support for raw identifiers (let \#let = 42). Along the way, it was necessary to update the parser to reflect the current head of flambda-backend (this is needed because we need the 5.2 parser change for raw identifiers, and required some small script changes because of pivot-root).

This is best reviewed commit by commit:

  • The first commit updates the import script for pivot-root, fixes a bug where it would delete a dune file, and updates it to also import parser_types.{ml,mli}.
  • The second commit runs the script. Don't review this.
  • The third commit gets the existing tests passing by fixing issues with the automated merge and updating the styling logic for parsetree changes (e.g., the fact that signatures are no longer just a list). Note that we've pulled in some PRs that move things (particularly layouts/jkinds) from using jane-syntax to being proper members of the parsetree. This necessitates changes to the parts of Normalize_std_ast.ml that deal with jane syntax and the mode where we erase it - some parts dealing with attributes get simpler, but also many new cases are needed for the new syntax nodes.
  • The fourth commit actually adds support for raw identifiers, porting over the lexer changes to parser-extended and updating the styling. More on that below.
  • The fifth commit is some small documentation updates.
  • edit: And the sixth commit updates the CI to use ocaml 5.2, which is required after this PR.

How to print raw identifiers

The parsetree does not record whether a raw identifier was used. We could change that, but for now I adopted the same approach as the compiler's pprintast.ml, which is to check before printing an identifier whether it is a keyword and escape it if so.

This is complicated by the fact that there isn't one centralized place where the formatter prints identifiers. What I did was to look at every place it prints a string using str and figure out if it's a string that is an identifier which can legally be a raw identifier. I added new functions for printing these identifiers - thin wrappers around the existing functions that add the escape sequence if the identifier is a keyword. In one case (fmt_longident) it seemed easier to just parameterize the function by whether the callsite is printing a variant constructor or not, as I believe those are the only things it is called on that can't be escaped raw identifiers. I think it is probably not worth your time to review each use of str in detail, but up to you.

I added a simple test, which is just the compiler's test for raw identifiers extended with some bits of jane street syntax that can now use them.

Review: @dvulakh probably

@ccasin ccasin requested a review from dvulakh October 31, 2024 21:02
Copy link

@dvulakh dvulakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normalization logic looks good to me (and I'll test on the tree before doing a second review pass).

I've noted some bugs in the treatment of escaped identifiers. I'll do a more thorough review pass once those are fixed and the ~constructor argument to the identifier formatting functions is deleted (which should make the diff easier to read).

vendor/parser-extended/parse.ml Outdated Show resolved Hide resolved
vendor/parser-standard/parser.mly Outdated Show resolved Hide resolved
lib/Normalize_std_ast.ml Show resolved Hide resolved
lib/Fmt_ast.ml Show resolved Hide resolved
lib/Fmt_ast.ml Show resolved Hide resolved
lib/Fmt_ast.ml Outdated Show resolved Hide resolved
@ccasin
Copy link
Author

ccasin commented Nov 5, 2024

@dvulakh this is ready for another look. I've also rebased over the merged PR we discussed.

Copy link

@dvulakh dvulakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are two more fun programs:

module type \#sig = sig end
module M = struct let \#mod = 1 end

let _ = M.\#mod

@ccasin
Copy link
Author

ccasin commented Nov 8, 2024

Here are two more fun programs:

module type \#sig = sig end
module M = struct let \#mod = 1 end

let _ = M.\#mod

Nice catches. I've fixed these now, and testing on the feature you prepared suggests this is the last of it (that we can easily catch now).

Note that I've arranged it so your examples using infix symbols get printed with parens rather than a raw ident escape. So, for example,

let _ = M.\#mod

becomes

let _ = M.( mod )

And this required a small change in Fmt_ast.fmt_longident

@ccasin
Copy link
Author

ccasin commented Nov 20, 2024

Closing this. It did result in an upstream compiler bug report and two PRs with raw identifier support in upstream ocamlformat (here's the one that looks like it will be merged). But as it looks like upstream ocamlformat is going with a different (probably better) approach, we should just plan to backport that.

I'll try to pull out the bits of this that are just updating this repo for the latest parsetree and pivot-root in a separate PR (though it may also make sense to redo some of that work now that jane syntax is fully gone)

@ccasin ccasin closed this Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants