-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework incorrect indentation recovery #1273
Comments
I think tradeoff of introducing this kind of breaking change in new language versions is well worth it. |
The problem is, I've seen beginner code with literally many of these warnings, and the warning turned off. I started searching for stuff in github that turns of warning 58, here are two examples:
e.g. https://github.com/search?q=FS0058&type=code I'd expect those projects don't specify language version. An alternative might be to have a flag In short I'm happy for this warning to become an error, and the grammar improved to make use of that, subject to this kind of concern about being a breaking change. If we need to duplciate the parser to implement that, we should also ditch all the non-indentation-aware syntax rules from the grammar, and all long-deprecated features (if giving errors), which will make it much simpler. |
Slightly tweaked query that only includes fsproj, props, and targets file: https://github.com/search?q=FS0058+%28path%3A*.fsproj+OR+path%3A*.props+OR+path%3A*.targets%29&type=code There are only 20 occurrences here, and the majority of those are in two projects - Pulumi F# extensions and the CoFlows family of projects already linked. That doesn't seem like an insurmountable hurdle to overcome. |
I think we should really follow this path for this one:
There will 100% be significant enterprise code that emits and suppresses this warning. We can't have installing new versions of the F# compiler or .NET SDK just randomly start giving a hard error for suppressed instances of this warning, or indeed even for unsuppressed instances. |
If we don't enable it by default in existing projects, then it seems there would be much less point in improving it at all. From the tooling standpoint we want a better parser to have the broadest reach possible. We can, for sure, provide a way to opt-out and to revert to the current warnings, if for some reason it's not easy to correct wrong indentation when the SDK is released. |
I mean I guess an alternative would be to parse all sources twice, and see if the warning occurs anywhere. If not then go ahead with the better parser. Anyway, it's ultimately up to @vzarytovskii, @baronfel and others to decide policy on this sort of thing. I'm OK with the change but it's a breaking change for sure, and it will hit companies where the developers who wrote the code are no longer even at the company. |
Yeah, my fear is that such code can be in a bunch of closed codebases, which "just work". |
Super valid points. Templates are a great way to kickstart adoption of features like this, and we also have knobs like defaulting the new parser to off for .NET 8, and immediately in 9 preview 1 builds turning the parser on, that way we have the entire 9 preview cycle to find and react to bugs or impacts in the tech. I pinged Kathleen so we could chat about what ideas for what our overall strategy for these kinds of changes should be (which fits nicely into a "rules that folks who contribute to the .NET SDK should follow for breaking changes and diagnostics" thing I'm working on). |
@auduchinok I wonder how hard it will be to hide new recovery path under a flag? |
Should be easy, since it's going to be a language version dependent change anyway. LexFilter changes will look at the flag or language version and control this, just like existing version-dependent changes there, while the new parser rules just won't be triggered in the most (if not all) cases. The parser rules are going to be just new recovery rules, like the ones I've been adding recently. |
I should say that I'm biased, but I'll try to reason about my point of view. I understand that breaking some code in closed source codebases is not good. However, the code affected should be rare. On top of that, the fix for most cases should be trivial and will only require adding spaces in, hopefully, a few places. Alternatively, developers could set a language version in a project during the transition period. On the other hand, with this change in the language, we can significantly improve the quality of the editor features for the vast majority of F# users, starting with the new parser recovery rules and then incrementally improving features like Code Completion. My only concern is that doing this will require an update in some (possibly unmaintained) codebases, and that code will need a fix, regardless of whether that happens in F# 8 or F# 9. Postponing it to F# 9 doesn't help enterprises overcome this issue; it only postpones the inevitable. Not enabling it by default in existing projects at some point isn't the way to go, either. I want to work on this change, and if we consider the quality to be suitable by the feature freeze time, I think it could make sense to consider enabling it in .NET 8, as it seems the benefits of improving the tooling outweigh the possible downsides, which are easy enough to overcome. However, if we see that there're major issues need resolving by then, I'll be happy to proceed with it for the .NET 9 release instead. In the best case, we'll be able to ship the change this year, and there will be an opportunity to improve features further in subsequent releases of the tools, enabling us to offer better tooling support by the .NET 9 release time. On the other hand, if we postpone it to .NET 9, most users won't get these improvements for another 1.5 years, which seems like a long time for the language tooling. |
FWIW I would be in favor of:
I think we should also weigh the disruption to some against other kinds of disruptions. I have no doubt that we've done many "acceptable" things, especially in tools, that are far more disruptive than something like this. That doesn't excuse this change from needing to be done responsibly, but if the net benefit is that newcomers to the language can just type code like they expect and it usually "just works" then I think that's big also for these larger, closed-source enterprise codebases where people need to maintain code they didn't write. |
Alternatively, we could consider revising the rule that if a language version is not specified, then the latest is used. And instead, use v7 -- which would be the last version that had this policy -- and show a warning or lint that you should specify a language version. |
We want people to use latest version if its not specified, since it's usually contains a whole bunch of diagnostics, better codegen, new features. It's better to have a separate flag for this feature. |
It seems the best way to do this would be to have a flag and to enable/disable it by default based on the language version. This way people using the new SDK will get the changes automatically, while it will still be possible to disable it, or to enable it if an older language version is used explicitly in the project file for some reason. |
@dsyme @vzarytovskii If you agree with this proposal, could it get an approval, please? If there're no objections to trying to get it into .NET 8, then I'll start working on it. |
I'll mark it as approved. By approving I mean if the feature applies then this strawman (please let's capture this very precisely):
Now I think of it, duplicating the parser is actually problematic because the token definitions all get duplicated. That would require a change in FsLexYacc. Ugh. I really feel this needs it though.... @auduchinok Do you know a way that would avoid this duplication while retaining certainty that we haven't mucked with parsing of old code? |
On the parser side, this change should only require adding/changing recovery rules, like the ones I've been adding recently, not changing normal case ones. It should have no more effect on normal rules than lots of recently added recovery ones. I think splitting the parser into the old and the new ones can be good in the long term but should probably be done separately, as it's really orthogonal to the indentation recovery I'm proposing. This task may be big enough by itself even without this additional refactoring, as we'll need to make sure all the IDE features work with the new recovery. Let's maybe make it a separate change? 🙂 |
Love the idea of completing the sunsetting of non-light syntax and ML compatibility. |
But don't those rules apply unconditionally? Making it impossible for that parser to parse like it did previously? |
Yes, they should, but we have control over what's inserted by |
Among the obsolete rules that could be flushed by a parser clean slate, #806 is also a candidate. Infix operator de-indentation was considered a misfeature by everyone in the conversation and it bites people regularly. Don approved adding a warning on it; but while we're making breaking changes to the parser, we could even remove it altogether. |
I've implemented the new rules for some of the most common cases in dotnet/fsharp#15334, and it improved parsing recovery significantly, fixing many issues with highlighting/folding/lenses during typing, like reported in dotnet/fsharp#7078. With this change places where code gets deindented act as syncronization points, and it allows producing Here're some examples: let _ =
if true then
() // warning let _ =
if true then () else
() // no warning, so it's possible to write early returns without additional indentation let _ =
match () with
| _ ->
() // no warning, for a similar reason These cases deliberately allow code to not have additional indentation, and it makes the usage of the language much better in some scenarios. Fortunately, the most of such cases normally appear inside other expressions/declarations, so the outer indentation block still help us have better recovery with the new rules. Unfortunately, there're cases where the language design doesn't seem to be consistent, and some cases don't require additional indentation, despite code below often becomes impossible to be parsed (in some expected way or at all): type T =
member this.Prop = 1 // a warning is produced (an error with the new rules) type T with
member this.Prop = 1 // no warning Not having indentation here makes it impossible to properly parse things below: type T with
member this.Prop = 1
let x = 1 // error, this is considered a part of the type augmentation Subsequently, it doesn't parse things as expected while typing too, even if you're going to have the proper indentation: type T with
let x = 1 // errors here and below, until a member with proper indentation is typed above There're other similar cases: type T() =
do
() // no warning
member this.Prop = 1 // error, this is parsed as a part of the `do` expression And it breaks for unfinished code too, making the user experience much worse: type T() =
do
// errors here and below
// this is unsuccessfully parsed as a part of the `do` expression
member this.Prop = 1 With all this considered, I propose that we normalize the indentation requirements where it makes sense:
We can discuss the particular cases where it makes sense to change the behaviour, and I'll try to start normalizing the requirements in dotnet/fsharp#15102 and other PRs. |
There're also less critical inconsistent cases: let _ =
if true then
() // warning let _ =
while true do
() // no warning Expressions like |
F# has indentation-based syntax powered by offside rules. They are implemented in a component called
LexFilter
that works between lexing and parsing stages. LexFilter keeps track of the indentation blocks and inserts additionalbegin
- andend
-like tokens for the parser.To help newcomers write F# code, LexFilter has a recovery rule that is used when a token is insufficiently indented. This rule produces a warning and starts a new indentation block anyway:
While the warning is good, the insertion of the block causes many problems for the parsing and analysis when writing code. Consider a case where a user is writing a new type member:
There are subsequent definitions and expressions, and all of them are parsed and analyzed correctly.
As soon as the user types
=
in, the LexFilter begins to expect an indented member body below, doesn't find it, and applies the recovery rule:That makes all the code below parsed incorrectly as the body of the property. Very often it makes the parser fail to parse the file at all. In the example above, the lenses below the edited place are gone. They will reappear when you type an expression after
=
, making the code move up and down and blink. There's an unexpected error about attributes inside an expression. TheP
type is inferred incorrectly.The bigger a file gets, the more unexpected parser/analysis errors appear, because neither the parser nor the type checker expect arbitrary code below to be parsed as the indented block. This leads to blinking highlightings, moving code in editor (because of appearing/disappearing lenses/hints), and features like Code Completion or Parameter Info (Signature Help) become significantly less precise, which is crucial when editing code.
There're many cases where it breaks deep inside expressions, like in the binary expression here:
I propose we rework how wrong indentation is handled. It's possible to change LexFilter, so it doesn't start a new indentation block with the wrong indentation, and to introduce new parser recovery rules, so it would show errors about missing expressions/definitions and continue parsing.
Pros and Cons
The advantages of making this adjustment to F# are
Doing that will improve parser recovery, which in turn make the analysis significantly more stable and precise. It can improve the editing experience a lot, since there will be much less blinking, code jumping, and features that rely on inferred types will be much more stable.
The disadvantages of making this adjustment to F# are
This is a breaking change, and some code that can be compiled today will no longer compile. Such code produces warnings today and should be adjusted to have proper indentation. I expect such code to be extremely rare and mostly produced by generators (so hopefully only generators will need updates).
Extra information
Estimated cost (XS, S, M, L, XL, XXL): M-L
I've made a small prototype with an additional parser rule for
let
bindings, and it shows that this recovery works as expected, with an additional error produced, see this gist. The error texts can be improved separately. The main thing it shows is that the subsequent definitions are parsed correctly.Since it's a breaking change, we should only introduce it in a new language version. It's possible to make it into F# 8 now, so the tooling will be able to use the improvements in the fall, when the new language version is released. If we don't ship it in F# 8, the closest date when improvements could be used would probably be ~November 2024, i.e. when .NET 9 is released.
Affidavit (please submit!)
Please tick this by placing a cross in the box:
Please tick all that apply:
For Readers
If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.
The text was updated successfully, but these errors were encountered: