Add support for raw identifiers (plus some stuff) #88

ccasin · 2024-10-31T21:02:38Z

The goal here was to add support for raw identifiers (let \#let = 42). Along the way, it was necessary to update the parser to reflect the current head of flambda-backend (this is needed because we need the 5.2 parser change for raw identifiers, and required some small script changes because of pivot-root).

This is best reviewed commit by commit:

The first commit updates the import script for pivot-root, fixes a bug where it would delete a dune file, and updates it to also import parser_types.{ml,mli}.
The second commit runs the script. Don't review this.
The third commit gets the existing tests passing by fixing issues with the automated merge and updating the styling logic for parsetree changes (e.g., the fact that signatures are no longer just a list). Note that we've pulled in some PRs that move things (particularly layouts/jkinds) from using jane-syntax to being proper members of the parsetree. This necessitates changes to the parts of Normalize_std_ast.ml that deal with jane syntax and the mode where we erase it - some parts dealing with attributes get simpler, but also many new cases are needed for the new syntax nodes.
The fourth commit actually adds support for raw identifiers, porting over the lexer changes to parser-extended and updating the styling. More on that below.
The fifth commit is some small documentation updates.
edit: And the sixth commit updates the CI to use ocaml 5.2, which is required after this PR.

How to print raw identifiers

The parsetree does not record whether a raw identifier was used. We could change that, but for now I adopted the same approach as the compiler's pprintast.ml, which is to check before printing an identifier whether it is a keyword and escape it if so.

This is complicated by the fact that there isn't one centralized place where the formatter prints identifiers. What I did was to look at every place it prints a string using str and figure out if it's a string that is an identifier which can legally be a raw identifier. I added new functions for printing these identifiers - thin wrappers around the existing functions that add the escape sequence if the identifier is a keyword. In one case (fmt_longident) it seemed easier to just parameterize the function by whether the callsite is printing a variant constructor or not, as I believe those are the only things it is called on that can't be escaped raw identifiers. I think it is probably not worth your time to review each use of str in detail, but up to you.

I added a simple test, which is just the compiler's test for raw identifiers extended with some bits of jane street syntax that can now use them.

Review: @dvulakh probably

dvulakh

The normalization logic looks good to me (and I'll test on the tree before doing a second review pass).

I've noted some bugs in the treatment of escaped identifiers. I'll do a more thorough review pass once those are fixed and the ~constructor argument to the identifier formatting functions is deleted (which should make the diff easier to read).

vendor/parser-extended/parse.ml

vendor/parser-standard/parser.mly

lib/Normalize_std_ast.ml

lib/Fmt_ast.ml

ccasin · 2024-11-05T19:23:20Z

@dvulakh this is ready for another look. I've also rebased over the merged PR we discussed.

dvulakh

Here are two more fun programs:

module type \#sig = sig end

module M = struct let \#mod = 1 end

let _ = M.\#mod

ccasin · 2024-11-08T15:20:11Z

Here are two more fun programs:

module type \#sig = sig end

module M = struct let \#mod = 1 end

let _ = M.\#mod

Nice catches. I've fixed these now, and testing on the feature you prepared suggests this is the last of it (that we can easily catch now).

Note that I've arranged it so your examples using infix symbols get printed with parens rather than a raw ident escape. So, for example,

let _ = M.\#mod

becomes

let _ = M.( mod )

And this required a small change in Fmt_ast.fmt_longident

ccasin · 2024-11-20T18:25:20Z

Closing this. It did result in an upstream compiler bug report and two PRs with raw identifier support in upstream ocamlformat (here's the one that looks like it will be merged). But as it looks like upstream ocamlformat is going with a different (probably better) approach, we should just plan to backport that.

I'll try to pull out the bits of this that are just updating this repo for the latest parsetree and pivot-root in a separate PR (though it may also make sense to redo some of that work now that jane syntax is fully gone)

ccasin requested a review from dvulakh October 31, 2024 21:02

dvulakh requested changes Nov 4, 2024

View reviewed changes

ccasin added 8 commits November 5, 2024 14:21

Update scripts for pivot-root (plus fix dune file and parser_types)

bfcea7f

Run scripts (doesn't build)

21bf2bd

Get things working after merge from flambda-backend

cc4d3f8

support for raw identifiers

9a0ca15

Doc updates

5a99cb4

Change workflow files to use ocaml 5.2

be5c8a6

Review feedback: docs and nits

cd92f15

Integrate review: bugs

d91bfc4

ccasin force-pushed the raw-identifiers branch from 001918d to d91bfc4 Compare November 5, 2024 19:22

dvulakh requested changes Nov 6, 2024

View reviewed changes

More bugs

6372559

tdelvecchio-jsc mentioned this pull request Nov 12, 2024

Fix attributes being dropped from packed module expressions and patterns #89

Merged

Julow mentioned this pull request Nov 14, 2024

5.2 support: Raw identifiers ocaml-ppx/ocamlformat#2619

Closed

ccasin closed this Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for raw identifiers (plus some stuff) #88

Add support for raw identifiers (plus some stuff) #88

ccasin commented Oct 31, 2024 •

edited

Loading

dvulakh left a comment

ccasin commented Nov 5, 2024

dvulakh left a comment

ccasin commented Nov 8, 2024

ccasin commented Nov 20, 2024

Add support for raw identifiers (plus some stuff) #88

Add support for raw identifiers (plus some stuff) #88

Conversation

ccasin commented Oct 31, 2024 • edited Loading

How to print raw identifiers

dvulakh left a comment

Choose a reason for hiding this comment

ccasin commented Nov 5, 2024

dvulakh left a comment

Choose a reason for hiding this comment

ccasin commented Nov 8, 2024

ccasin commented Nov 20, 2024

ccasin commented Oct 31, 2024 •

edited

Loading