-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow whitespace to be a _terminator where possible? #24
Comments
There is some precedence for this in the "offlcial grammars" though: tree-sitter/tree-sitter-rust#70
|
@the-mikedavis thanks a lot for investigating this! Keeping newlines for the injection conceptually sounds like a good idea, but we should focus on the most problematic cases first. I'm gonna list a few to make sure I got this right. Case 1. <%= foo %><%= bar %> the Elixir code that we parse is: foo bar Case 2. <%=format(x)%>: <%=format(y)%> becomes format(x)format(y) Case 3. And similarly <%=100%><%=variable%> becomes 100variable Realistically speaking case 1. is the most concerning, because it's very likely to appear. And by gluing the expressions together, we would always highlight one as function and the other as the argument, which is inconsistent. It's also possible that someone doesn't use whitespace inside EEx interpolation (cases 2. and 3.), so then it doesn't matter how strict/loose we are about whitespace/terminators, we're gonna get invalid syntax. The idea of parsing injections in groups sounds like a good direction (assuming it would be possible upstream), however then there is a question if we can group the expressions in a robust way. I'm not sure if we can cover everything with simple checks like This seems to be a very general topic with respect to injections, so cc-ing @maxbrunsfeld for any high-level thoughts and to validate if grouping injections seem viable to implement in tree-sitter. |
HEEx is a templating engine on top of Elixir's EEx templating language specific to HTML that is included in Phoenix.LiveView (though I think the plan is to eventually include it in base Phoenix). It's a superset of EEx with some additional features like components and slots. The injections don't work perfectly because the Elixir grammar is newline sensitive (the _terminator rule). See elixir-lang/tree-sitter-elixir#24 for more information.
HEEx is a templating engine on top of Elixir's EEx templating language specific to HTML that is included in Phoenix.LiveView (though I think the plan is to eventually include it in base Phoenix). It's a superset of EEx with some additional features like components and slots. The injections don't work perfectly because the Elixir grammar is newline sensitive (the _terminator rule). See elixir-lang/tree-sitter-elixir#24 for more information.
HEEx is a templating engine on top of Elixir's EEx templating language specific to HTML that is included in Phoenix.LiveView (though I think the plan is to eventually include it in base Phoenix). It's a superset of EEx with some additional features like components and slots. The injections don't work perfectly because the Elixir grammar is newline sensitive (the _terminator rule). See elixir-lang/tree-sitter-elixir#24 for more information.
I've been fiddling with the (H)EEx queries lately trying to find something that works really well. There are some tough cases that I think might be solveable with changes to this grammar.
The main issue is combined injections. Consider some EEx like so:
Here lines 1 and 3 need
injection.combined
(like with tree-sitter-iex) so that theend
is parsed as part of the same document as thefn .. ->
. Line 2 can be parsed regularly (no combined injections). For the combined parts, this looks like so to tree-sitter-elixir:That case works pretty well with the current grammar and queries in tree-sitter-eex, but
injection.combined
introduces some problems. Consider this EEx:With the current grammar and queries, this produces an
(ERROR)
on the secondEnum
. This happens because the EEx grammar is consuming the newline. To tree-sitter-elixir, this example looks like:How does tree-sitter-embedded-template fix this?
In this example of an ejs template from the expressjs repo,
All of the inner JavaScript contents are
injection.combined
, so this comes out like so for tree-sitter-javascriptWhich tree-sitter-javascript parses as if it were valid JavaScript (which, to my knowledge, it is not). This happens in tree-sitter-ruby as well with erb templates: the grammars are intentionally more permissive around newlines and whitespace than the language actually allows.
Possible solutions
I see a few ways around this to make the EEx injections work as well as ejs/erb.
The first is to try to allow spaces to be
$._terminator
s whenever it doesn't introduce abiguity. I tried some minor edits to the grammar and it seems like this approach is easier said than done because of things like function calls without parens. This would also sacrifice the current parity between this parser and the parser in the Elixir compiler, which seems a bit distasteful to me.The second would be to propose a change to injections upstream in tree-sitter to try to introduce a way to limit the scope of a combined injection. Currently all injections which are marked as combined globally in a document are parsed together in one "combined" sub-document. If the EEx grammar were rewritten as in this branch to group EEx tags based on start/middle-expressions and end-expressions, and then if we could combine only the tags within each group, then we wouldn't need to care about
$._terminator
s at all. In the example above, tags one and two would be grouped and three and four grouped, but only tags one and two woud be combined with one another, and tags three and four with one another. This is nice because it mimics the EEx.Tokenizer, but it's unclear if a change like this would be accepted upstream in tree-sitter since no other language has needed anything like it.Thirdly there's a way to fix this particular case by parsing newlines in the eex grammar and handing them over to the elixir grammar when injecting (see here) but it's not general enough to handle some odd cases, like if the example above had all four tags on the same line.
The last resort as I see it would be to rewrite the EEx grammar to depend on the Elixir grammar, so it can have fine-grained control over how expressions within EEx tags are terminated. This seems potentially brittle and certainly more of a maintenance burden, though, so I think this should be avoided if possible.
What do you think @jonatanklosko?
see also connorlay/tree-sitter-eex#1
connects #2
The text was updated successfully, but these errors were encountered: