Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework incorrect indentation recovery #1273

Open
3 of 5 tasks
auduchinok opened this issue May 23, 2023 · 24 comments
Open
3 of 5 tasks

Rework incorrect indentation recovery #1273

auduchinok opened this issue May 23, 2023 · 24 comments

Comments

@auduchinok
Copy link

auduchinok commented May 23, 2023

F# has indentation-based syntax powered by offside rules. They are implemented in a component called LexFilter that works between lexing and parsing stages. LexFilter keeps track of the indentation blocks and inserts additional begin- and end-like tokens for the parser.

To help newcomers write F# code, LexFilter has a recovery rule that is used when a token is insufficiently indented. This rule produces a warning and starts a new indentation block anyway:

Screenshot 2023-05-23 at 12 42 16

While the warning is good, the insertion of the block causes many problems for the parsing and analysis when writing code. Consider a case where a user is writing a new type member:

Screenshot 2023-04-14 at 11 13 34

There are subsequent definitions and expressions, and all of them are parsed and analyzed correctly.
As soon as the user types = in, the LexFilter begins to expect an indented member body below, doesn't find it, and applies the recovery rule:

Screenshot 2023-04-14 at 11 13 39

That makes all the code below parsed incorrectly as the body of the property. Very often it makes the parser fail to parse the file at all. In the example above, the lenses below the edited place are gone. They will reappear when you type an expression after =, making the code move up and down and blink. There's an unexpected error about attributes inside an expression. The P type is inferred incorrectly.

The bigger a file gets, the more unexpected parser/analysis errors appear, because neither the parser nor the type checker expect arbitrary code below to be parsed as the indented block. This leads to blinking highlightings, moving code in editor (because of appearing/disappearing lenses/hints), and features like Code Completion or Parameter Info (Signature Help) become significantly less precise, which is crucial when editing code.

There're many cases where it breaks deep inside expressions, like in the binary expression here:

let a =
    1 +

let b = 1

I propose we rework how wrong indentation is handled. It's possible to change LexFilter, so it doesn't start a new indentation block with the wrong indentation, and to introduce new parser recovery rules, so it would show errors about missing expressions/definitions and continue parsing.

Pros and Cons

The advantages of making this adjustment to F# are

Doing that will improve parser recovery, which in turn make the analysis significantly more stable and precise. It can improve the editing experience a lot, since there will be much less blinking, code jumping, and features that rely on inferred types will be much more stable.

The disadvantages of making this adjustment to F# are

This is a breaking change, and some code that can be compiled today will no longer compile. Such code produces warnings today and should be adjusted to have proper indentation. I expect such code to be extremely rare and mostly produced by generators (so hopefully only generators will need updates).

Extra information

Estimated cost (XS, S, M, L, XL, XXL): M-L

I've made a small prototype with an additional parser rule for let bindings, and it shows that this recovery works as expected, with an additional error produced, see this gist. The error texts can be improved separately. The main thing it shows is that the subsequent definitions are parsed correctly.

Since it's a breaking change, we should only introduce it in a new language version. It's possible to make it into F# 8 now, so the tooling will be able to use the improvements in the fall, when the new language version is released. If we don't ship it in F# 8, the closest date when improvements could be used would probably be ~November 2024, i.e. when .NET 9 is released.

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.

@vzarytovskii
Copy link

I think tradeoff of introducing this kind of breaking change in new language versions is well worth it.

@dsyme
Copy link
Collaborator

dsyme commented May 23, 2023

The problem is, I've seen beginner code with literally many of these warnings, and the warning turned off.

I started searching for stuff in github that turns of warning 58, here are two examples:

e.g. https://github.com/search?q=FS0058&type=code

I'd expect those projects don't specify language version. An alternative might be to have a flag --strict:SOME-NUMBER that is turned on by default in new project templates.

In short I'm happy for this warning to become an error, and the grammar improved to make use of that, subject to this kind of concern about being a breaking change.

If we need to duplciate the parser to implement that, we should also ditch all the non-indentation-aware syntax rules from the grammar, and all long-deprecated features (if giving errors), which will make it much simpler.

@baronfel
Copy link
Contributor

Slightly tweaked query that only includes fsproj, props, and targets file: https://github.com/search?q=FS0058+%28path%3A*.fsproj+OR+path%3A*.props+OR+path%3A*.targets%29&type=code

There are only 20 occurrences here, and the majority of those are in two projects - Pulumi F# extensions and the CoFlows family of projects already linked. That doesn't seem like an insurmountable hurdle to overcome.

@dsyme
Copy link
Collaborator

dsyme commented May 24, 2023

I think we should really follow this path for this one:

....have a flag --strict:SOME-FEATURE-NAME that is turned on by default in new project templates....

There will 100% be significant enterprise code that emits and suppresses this warning. We can't have installing new versions of the F# compiler or .NET SDK just randomly start giving a hard error for suppressed instances of this warning, or indeed even for unsuppressed instances.

@auduchinok
Copy link
Author

auduchinok commented May 24, 2023

We can't have installing new versions of the F# compiler or .NET SDK just randomly start giving a hard error for suppressed instances of this warning, or indeed even for unsuppressed instances.

If we don't enable it by default in existing projects, then it seems there would be much less point in improving it at all. From the tooling standpoint we want a better parser to have the broadest reach possible. We can, for sure, provide a way to opt-out and to revert to the current warnings, if for some reason it's not easy to correct wrong indentation when the SDK is released.

@dsyme
Copy link
Collaborator

dsyme commented May 24, 2023

I mean I guess an alternative would be to parse all sources twice, and see if the warning occurs anywhere. If not then go ahead with the better parser.

Anyway, it's ultimately up to @vzarytovskii, @baronfel and others to decide policy on this sort of thing. I'm OK with the change but it's a breaking change for sure, and it will hit companies where the developers who wrote the code are no longer even at the company.

@vzarytovskii
Copy link

vzarytovskii commented May 24, 2023

I mean I guess an alternative would be to parse all sources twice, and see if the warning occurs anywhere. If not then go ahead with the better parser.

Anyway, it's ultimately up to @vzarytovskii, @baronfel and others to decide policy on this sort of thing. I'm OK with the change but it's a breaking change for sure, and it will hit companies where the developers who wrote the code are no longer even at the company.

Yeah, my fear is that such code can be in a bunch of closed codebases, which "just work".

@baronfel
Copy link
Contributor

baronfel commented May 24, 2023

Super valid points. Templates are a great way to kickstart adoption of features like this, and we also have knobs like defaulting the new parser to off for .NET 8, and immediately in 9 preview 1 builds turning the parser on, that way we have the entire 9 preview cycle to find and react to bugs or impacts in the tech. I pinged Kathleen so we could chat about what ideas for what our overall strategy for these kinds of changes should be (which fits nicely into a "rules that folks who contribute to the .NET SDK should follow for breaking changes and diagnostics" thing I'm working on).

@vzarytovskii
Copy link

@auduchinok I wonder how hard it will be to hide new recovery path under a flag?

@auduchinok
Copy link
Author

auduchinok commented May 24, 2023

@auduchinok I wonder how hard it will be to hide new recovery path under a flag?

Should be easy, since it's going to be a language version dependent change anyway. LexFilter changes will look at the flag or language version and control this, just like existing version-dependent changes there, while the new parser rules just won't be triggered in the most (if not all) cases. The parser rules are going to be just new recovery rules, like the ones I've been adding recently.

@auduchinok
Copy link
Author

auduchinok commented May 25, 2023

I'm OK with the change but it's a breaking change for sure, and it will hit companies where the developers who wrote the code are no longer even at the company.

Templates are a great way to kickstart adoption of features like this, and we also have knobs like defaulting the new parser to off for .NET 8, and immediately in 9 preview 1 builds turning the parser on, that way we have the entire 9 preview cycle to find and react to bugs or impacts in the tech.

I should say that I'm biased, but I'll try to reason about my point of view.

I understand that breaking some code in closed source codebases is not good. However, the code affected should be rare. On top of that, the fix for most cases should be trivial and will only require adding spaces in, hopefully, a few places. Alternatively, developers could set a language version in a project during the transition period.

On the other hand, with this change in the language, we can significantly improve the quality of the editor features for the vast majority of F# users, starting with the new parser recovery rules and then incrementally improving features like Code Completion.

My only concern is that doing this will require an update in some (possibly unmaintained) codebases, and that code will need a fix, regardless of whether that happens in F# 8 or F# 9. Postponing it to F# 9 doesn't help enterprises overcome this issue; it only postpones the inevitable. Not enabling it by default in existing projects at some point isn't the way to go, either.

I want to work on this change, and if we consider the quality to be suitable by the feature freeze time, I think it could make sense to consider enabling it in .NET 8, as it seems the benefits of improving the tooling outweigh the possible downsides, which are easy enough to overcome. However, if we see that there're major issues need resolving by then, I'll be happy to proceed with it for the .NET 9 release instead.

In the best case, we'll be able to ship the change this year, and there will be an opportunity to improve features further in subsequent releases of the tools, enabling us to offer better tooling support by the .NET 9 release time. On the other hand, if we postpone it to .NET 9, most users won't get these improvements for another 1.5 years, which seems like a long time for the language tooling.

@cartermp
Copy link
Member

FWIW I would be in favor of:

  • Enable a bunch of these fixes by default for everyone
  • Have a straightforward setting in project files to go back to old behavior (and VS tools checkbox in project settings for that), then document this and include some message about it in the error message

I think we should also weigh the disruption to some against other kinds of disruptions. I have no doubt that we've done many "acceptable" things, especially in tools, that are far more disruptive than something like this.

That doesn't excuse this change from needing to be done responsibly, but if the net benefit is that newcomers to the language can just type code like they expect and it usually "just works" then I think that's big also for these larger, closed-source enterprise codebases where people need to maintain code they didn't write.

@Tarmil
Copy link

Tarmil commented May 27, 2023

I think we should really follow this path for this one:

....have a flag --strict:SOME-FEATURE-NAME that is turned on by default in new project templates....

There will 100% be significant enterprise code that emits and suppresses this warning. We can't have installing new versions of the F# compiler or .NET SDK just randomly start giving a hard error for suppressed instances of this warning, or indeed even for unsuppressed instances.

Alternatively, we could consider revising the rule that if a language version is not specified, then the latest is used. And instead, use v7 -- which would be the last version that had this policy -- and show a warning or lint that you should specify a language version.

@vzarytovskii
Copy link

I think we should really follow this path for this one:

....have a flag --strict:SOME-FEATURE-NAME that is turned on by default in new project templates....

There will 100% be significant enterprise code that emits and suppresses this warning. We can't have installing new versions of the F# compiler or .NET SDK just randomly start giving a hard error for suppressed instances of this warning, or indeed even for unsuppressed instances.

Alternatively, we could consider revising the rule that if a language version is not specified, then the latest is used. And instead, use v7 -- which would be the last version that had this policy -- and show a warning or lint that you should specify a language version.

We want people to use latest version if its not specified, since it's usually contains a whole bunch of diagnostics, better codegen, new features.

It's better to have a separate flag for this feature.

@auduchinok
Copy link
Author

We want people to use latest version if its not specified, since it's usually contains a whole bunch of diagnostics, better codegen, new features.
It's better to have a separate flag for this feature.

It seems the best way to do this would be to have a flag and to enable/disable it by default based on the language version. This way people using the new SDK will get the changes automatically, while it will still be possible to disable it, or to enable it if an older language version is used explicitly in the project file for some reason.

@auduchinok
Copy link
Author

@dsyme @vzarytovskii If you agree with this proposal, could it get an approval, please?

If there're no objections to trying to get it into .NET 8, then I'll start working on it.

@dsyme
Copy link
Collaborator

dsyme commented May 30, 2023

I'll mark it as approved.

By approving I mean if the feature applies then this strawman (please let's capture this very precisely):

  • make the "offside warning" into a hard error
  • make the proposed adjustment to the grammar to improve indentation recovery
  • implement this by duplicating the entire parser
  • take advantage of this duplication to also completely flush all the non-indentation-aware rules from the copy of the parser
  • also flush any other rules that give hard errors on long deprecated features
  • put the whole thing in preview so we can perhaps pile in some other changes, since we don't get to do this often
  • if langversion < PREVIEW the old parser is used
  • I'd favour using the "immediately on-by-default-at-next-major-upgrade-of-.NET-SDK" mechanism @baronfel mentioned to get this to be the default behaviour.

Now I think of it, duplicating the parser is actually problematic because the token definitions all get duplicated. That would require a change in FsLexYacc. Ugh. I really feel this needs it though.... @auduchinok Do you know a way that would avoid this duplication while retaining certainty that we haven't mucked with parsing of old code?

@auduchinok
Copy link
Author

auduchinok commented May 30, 2023

implement this by duplicating the entire parser

Now I think of it, duplicating the parser is actually problematic because the token definitions all get duplicated. That would require a change in FsLexYacc. Ugh. I really feel this needs it though.... @auduchinok Do you know a way that would avoid this duplication while retaining certainty that we haven't mucked with parsing of old code?

On the parser side, this change should only require adding/changing recovery rules, like the ones I've been adding recently, not changing normal case ones. It should have no more effect on normal rules than lots of recently added recovery ones.

I think splitting the parser into the old and the new ones can be good in the long term but should probably be done separately, as it's really orthogonal to the indentation recovery I'm proposing. This task may be big enough by itself even without this additional refactoring, as we'll need to make sure all the IDE features work with the new recovery. Let's maybe make it a separate change? 🙂

@kerams
Copy link

kerams commented May 30, 2023

Love the idea of completing the sunsetting of non-light syntax and ML compatibility.

@dsyme
Copy link
Collaborator

dsyme commented May 30, 2023

On the parser side, this change should only require adding/changing recovery rules, like the ones I've been adding recently, not changing normal case ones. It should have no more effect on normal rules than lots of recently added recovery ones.

But don't those rules apply unconditionally? Making it impossible for that parser to parse like it did previously?

@auduchinok
Copy link
Author

But don't those rules apply unconditionally? Making it impossible for that parser to parse like it did previously?

Yes, they should, but we have control over what's inserted by LexFilter, and we can make the new recovery rules specific to the LexFilter changes where needed.

@Tarmil
Copy link

Tarmil commented May 30, 2023

Among the obsolete rules that could be flushed by a parser clean slate, #806 is also a candidate. Infix operator de-indentation was considered a misfeature by everyone in the conversation and it bites people regularly. Don approved adding a warning on it; but while we're making breaking changes to the parser, we could even remove it altogether.

@auduchinok
Copy link
Author

auduchinok commented Jun 8, 2023

I've implemented the new rules for some of the most common cases in dotnet/fsharp#15334, and it improved parsing recovery significantly, fixing many issues with highlighting/folding/lenses during typing, like reported in dotnet/fsharp#7078.

With this change places where code gets deindented act as syncronization points, and it allows producing begin/end-like tokens for the parser, instead of making it parse everything below the unfinished code incorrectly. Since F# syntax is indentation based, using indentation blocks as recovery mechanism seems to be natural and probably the best choice we have. In other words, every place in the language that required additional indentation now allows us to significantly improve things, and every place that didn't may make it more challenging (or impossible) to improve the tooling support.

Here're some examples:

let _ =
  if true then
  () // warning
let _ =
  if true then () else

  () // no warning, so it's possible to write early returns without additional indentation
let _ =
  match () with
  | _ ->

  () // no warning, for a similar reason

These cases deliberately allow code to not have additional indentation, and it makes the usage of the language much better in some scenarios. Fortunately, the most of such cases normally appear inside other expressions/declarations, so the outer indentation block still help us have better recovery with the new rules.

Unfortunately, there're cases where the language design doesn't seem to be consistent, and some cases don't require additional indentation, despite code below often becomes impossible to be parsed (in some expected way or at all):

type T =
member this.Prop = 1 // a warning is produced (an error with the new rules)
type T with
member this.Prop = 1 // no warning

Not having indentation here makes it impossible to properly parse things below:

type T with
member this.Prop = 1

let x = 1 // error, this is considered a part of the type augmentation

Subsequently, it doesn't parse things as expected while typing too, even if you're going to have the proper indentation:

type T with

let x = 1 // errors here and below, until a member with proper indentation is typed above

There're other similar cases:

type T() =
    do
    () // no warning

    member this.Prop = 1 // error, this is parsed as a part of the `do` expression

And it breaks for unfinished code too, making the user experience much worse:

type T() =
    do

    // errors here and below
    // this is unsuccessfully parsed as a part of the `do` expression
    member this.Prop = 1 

With all this considered, I propose that we normalize the indentation requirements where it makes sense:

  • for top level declarations indentation should be required, if the code below can't normally be parsed without it in the most of the cases (like in the type/with and do examples above)
  • the changed places should start producing warnings if the new indentation rules are disabled
  • deliberate undentation is going to still be allowed, because it's an important part of the language that people use
  • other cases can be discussed if found

We can discuss the particular cases where it makes sense to change the behaviour, and I'll try to start normalizing the requirements in dotnet/fsharp#15102 and other PRs.

@auduchinok
Copy link
Author

auduchinok commented Jun 8, 2023

There're also less critical inconsistent cases:

let _ =
  if true then
  () // warning
let _ =
  while true do
  () // no warning

Expressions like while are normally placed into other declarations/expressions, so there're already less cases where they break code below with the new rules enabled. An important exception is top-level expressions in modules, often used in scripts (and Program.fs in console apps). To improve things here we may want to normalize some of these cases too. It's possible and could be good to do, but makes the changes more breaking. They also don't fall into 'breaks too many things below' or 'deliberately designed and used' categories, so there should be more discussion about such cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants