-
Notifications
You must be signed in to change notification settings - Fork 13
Normalize line endings in multiline strings #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The LSP normalises line endings here: air/crates/lsp/src/documents.rs Line 55 in 003c31b
|
I find this kind of interesting because: For LSP workflows:
For cli workflows:
So in cli workflows you can end up with what we want, which is There's a decent chance this is why biome retains the original line endings in the LSP side of things |
I've been ruminating on this for a few days and researching some things. Here's the guiding principle of what i think we should do:
We should apply this holistically throughout air. Practically, the "earliest possible input boundary" is at file read time (or file modification time in the LSP). We could do this in the lexer/parser, but I think it is going to be important for us to have a "correct" standalone The LSP effectively does this already with
Something I did not like in the example of mine above is that the previous "ideal" scenario put you in a state where you had Instead I think the ideal scenario is that air ensures that you only have exactly one type of line endings in your file. I brought up earlier that its not great that in this case "foo
bar" There is a line ending captured by R's parser, and it could be Simply set You can also combine that with a This also doesn't seem to be something that R users have struggled with in the past. Pros
Cons
Interesting historical note. Our |
I was thinking that what we want is replacing line endings everywhere, including in multiline strings which are not special in that regard, and then restoring them back on the way out? Like the LSP does.
hmm I'm not sure about this. The parser can be used in other contexts than just the cli formatter and the LSP and it seems a bit restrictive to force callers to normalize. The platform sensitive line endings can be a footgun when trivia token matter but for many potential usages the trivia token are not depended upon. I don't have a strong opinion though, I could go with the strict approach.
Is this a biome setting? Can we add |
@lionel- and I learned that, in practice, a lot of my fears around R parser behavior are unfounded Even on Windows, it looks like this: "foo
bar" does end up evaluating to
The key point is that the implementer of the It is possible for
We are not worried about this case. In practice the worst thing that would probably happen in this nightmare scenario is...exactly what would happen today. We'd normalize So in air that means that it should be totally fine for us to standardize to unix line endings on the way in, and convert to a single type of line endings on the way out. We don't have to be strict in the parser, I was just hoping that turning this into an error there would keep us from accidentally forgetting to normalize and not realizing it until a Windows user reports an issue. |
See also #69 (comment)
We have this CI failures on Windows https://github.com/posit-dev/air/actions/runs/12042018483/job/33574939950?pr=69
With this test
Which formats a file containing this string
Note that this is a multiline string. The raw string there isn't critical to the reprex, it's just what we happened to have there.
This fails because:
\r\n
inside the string contents.located_token_text()
for a string because of this no-\r assertionBoth biome and ruff use this assertion, so I imagine it is for a good reason.
The way both of them handle this is by using variants of
normalize_string()
here:https://github.com/biomejs/biome/blob/cd1c8ec4249e8df8d221393586d664537c9fddb2/crates/biome_formatter/src/token/string.rs#L48
Note that the docs say one of the things it does is replace
\r\n
with just\n
.They call this at
fmt()
time for a string node, before it actually drops through tolocated_token_text()
, so they get a chance to "fix up" the string here. We could probably create our own version of this and strip out everything except the\r\n
conversion, until we decide to do more normalization.Why do they even do this?
I imagine the idea is that typically the line ending doesn't really matter, it becomes part of the Trivia set, but is not part of an actual node. But here, the line ending gets parsed into the actual string contents. I assume they want to ensure that the actual nodes of the tree look the same across OSes regardless of line endings, and standardizing on
\n
inside multiline strings is the way to achieve that.The text was updated successfully, but these errors were encountered: