Split out AST and Parser libraries from GHC #56

Open
wants to merge 49 commits into
base: main

Ericson2314
Contributor

@Ericson2314 Ericson2314 commented Aug 28, 2023

Problem

The community lacks AST and Parser libraries for Haskell that are both self-contained and up-to-date. Experience has shown that there is only one way to meet each criterion:

  • Be used by GHC, so the library cannot fall behind new language development
  • Be separate from GHC, so the library is forced to be self-contained

However, no library has so far done both, and so none has met both criteria.

Solution

The purpose of this proposal is to make that library finally exist. The Haskell Foundation will finance the completion of the existing "Trees that grow" project, decoupling GHC's AST and parser from the rest of the compiler so they can be moved to separate libraries. Those libraries will be "normal" Haskell libraries, without any weird dependencies or build process, and published on Hackage. Those libraries will be used by GHC, ensuring they are maintained.

Rendered

@silky
Contributor

silky commented Aug 30, 2023

A (failed 🥲) attempt to suggest that perhaps part of the problem of depending on ghc itself is that it's a massive library.

@tomjaguarpaw
Contributor

But it's built in, isn't it? You can't not have it on your Haskell system.

@googleson78

Is that all that's needed?

Additionally, the library must be "reinstallable" (if I'm interpreting the term correctly), i.e. it should not be tied to the ghc you have installed, much like ghc-lib-parser itself is.

With the current proposal, a GHC author would have to modify an external library in order to change the parser or parse tree. Then rebuilding GHC is likely to be considerably slower than it is today, where one can just rebuild a module or two (if datatypes don't change) and then link and then test.

I'm not too intimately familiar with Hadrian, and I'm not saying this is a trivial or easy change to it, but it should definitely be possible to express dependencies at the cross-package module level, i.e. if only Foo from package A depends on Bar from package B, and if Bar changes, you don't have to rebuild all the modules from A, and instead you only have to rebuild Foo.
I'm making the "it should definitely be possible" claim because, afair rules_haskell is already capable of doing this.
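
A minimal sketch of the difference between the two rebuild policies, in Haskell. It is a toy model, not Hadrian's or rules_haskell's actual logic: the `Module` type, the `imports` graph, and the extra modules `Baz` and `Qux` are assumptions made up for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- A module is identified by its (hypothetical) package and module name.
data Module = Module { modPackage :: String, modName :: String }
  deriving (Eq, Ord, Show)

-- Toy import graph: each module maps to the modules it imports directly.
-- Only A.Foo imports B.Bar; A.Baz and A.Qux never touch package B.
imports :: Map.Map Module [Module]
imports = Map.fromList
  [ (Module "A" "Foo", [Module "B" "Bar"])
  , (Module "A" "Baz", [])
  , (Module "A" "Qux", [Module "A" "Baz"])
  , (Module "B" "Bar", [])
  ]

-- Modules that directly import something in the given set.
directImporters :: Set.Set Module -> Set.Set Module
directImporters changed = Set.fromList
  [ m | (m, deps) <- Map.toList imports, any (`Set.member` changed) deps ]

-- Module-level policy: rebuild the changed module and, transitively,
-- every module that imports an affected module.
moduleLevelRebuild :: Module -> Set.Set Module
moduleLevelRebuild m = go (Set.singleton m)
  where
    go seen =
      let next = Set.union seen (directImporters seen)
      in if next == seen then seen else go next

-- Package-level policy: once any module of a package is affected,
-- every module of that package is rebuilt.
packageLevelRebuild :: Module -> Set.Set Module
packageLevelRebuild m = go (Set.singleton m)
  where
    go seen =
      let pkgs = Set.map modPackage (Set.union seen (directImporters seen))
          next = Set.filter ((`Set.member` pkgs) . modPackage) (Map.keysSet imports)
      in if next == seen then seen else go next
```

Loaded in GHCi, `moduleLevelRebuild (Module "B" "Bar")` contains only `B.Bar` and `A.Foo`, while `packageLevelRebuild (Module "B" "Bar")` contains all four modules of the toy graph.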

@shayne-fletcher

Is that all that's needed?

Additionally, the library must be "reinstallable" (if I'm interpreting the term correctly), i.e. it should not be tied to the ghc you have installed, much like ghc-lib-parser itself is.

this is it, the key motivation for ghc-lib-parser.

from https://github.com/digital-asset/ghc-lib:
"The GHC API allows you to use the GHC compiler as a library, so you can parse, analyze and compile Haskell code. The GHC API comes preinstalled with GHC, and is tied to that GHC version - if you are using GHC 8.6.3, you get version 8.6.3 of the API, and can't change it. The ghc-lib project solves that problem, letting you mix and match versions of the GHC compiler and GHC API."

e.g. the most recent ghc-lib-parser consists of ghc-9.6.1 source and so HLint can be built with compiler versions in the range [9.2.2, 9.6.1].

@michaelpj

Here's a comment I wrote on the GHC issue tracker that covers this a bit: https://gitlab.haskell.org/ghc/ghc/-/issues/14409#note_506489

Excerpting:

In particular, since the GHC parser is backwards compatible (mostly), it's generally possible to get away with just using the latest ghc-lib-parser, thus avoiding lots of CPP. To be clear, this means that you can build a library with GHC 9.2 and use the GHC 9.6 parser and use it to parse Haskell source of all versions up to 9.6. Which is pretty great!

Let me elaborate on that a bit. Suppose you want to use ghc to parse some Haskell source. You also want to support multiple versions of GHC. That forces you to support multiple versions of ghc, which means: lots of CPP. This is not normally a problem when depending on packages - you just say "this only works with the latest major version of foo" and you're done. But with ghc you are forced to use the old versions of ghc if you want to support older versions of GHC. So: CPP everywhere.

ghc-lib-parser (and a hypothetical reinstallable GHC parser package) take us back to the normal world. I can just say that my package depends on ghc-parser >= 9.6 and not have to worry about what version of GHC my user is using.
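
As a rough illustration of the "CPP everywhere" pattern, here is a minimal sketch. It assumes a tool that depends on the compiler-bundled `ghc` package, so the API version always equals the compiler version and can be keyed on `__GLASGOW_HASKELL__`. The two module names are the real pre-9.0 and post-9.0 names of the GHC lexer module; `lexerModule` is a hypothetical stand-in for the guarded imports such a tool would carry.

```haskell
{-# LANGUAGE CPP #-}

-- Hypothetical stand-in for CPP-guarded imports: a real tool would put
-- import declarations inside each branch instead of a string.
lexerModule :: String
#if __GLASGOW_HASKELL__ >= 900
-- GHC 9.0 moved the compiler modules into the GHC.* hierarchy.
lexerModule = "GHC.Parser.Lexer"
#else
lexerModule = "Lexer"
#endif
```

With a reinstallable parser package the branch disappears: the package version is chosen by a single dependency bound, independent of which GHC compiles the tool.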

Of course, this is only true because the GHC parser tends to be backwards-compatible, so you can get away with using the latest version. If you want to rename, typecheck, etc. a program, then you really do want to mimic the GHC that the user has so you really do want an exactly matching ghc. This is the situation that HLS is in. So this work won't directly benefit HLS (except insofar as it makes it less painful to write tools that HLS might then use).

However, I will say that I don't know of any other tool apart from HLS that relies on anything past parsing. I think there's a simple reason for that: parsing can be done (mostly) standalone, whereas to go much further you actually need a proper GHC session with a package database and so on. Getting that all set up is pretty non-trivial and is the whole raison d'etre of hie-bios. So I presume that everyone else just takes a look and doesn't even try.


I think it's worth emphasizing that basically all the "big" tools for working with Haskell source are now using ghc-lib-parser, often migrating to it from ghc. I think that's a pretty clear signal of desirability that:

  • People are willing to maintain the (I say this with great love) awful, fragile hack that is ghc-lib-parser
  • People are keen to migrate to ghc-lib-parser

@michaelpj

Another thing: it seems to me that the current situation with ghc-lib-parser is somewhat dangerous. There are two problems:

  1. It requires stitching together various bits of GHC sources. This is fragile and there are 0 guarantees from GHC that it won't break. I have seen several tickets on the GHC issue tracker about changes that broke ghc-lib-parser, which the GHC devs are understandably oblivious towards.
  2. It's reliant on Shayne doing the (I imagine) fairly unpleasant work of maintaining it. (sorry Shayne I'm putting words in your mouth here, please do correct me).

And since lots of tools now rely on ghc-lib-parser, if it stops being updated for some reason... there just won't be any Haskell tools for new GHC versions for a while.

In contrast, a GHC parser package maintained by the GHC team is going to be robustly tested, maintained by the right people, and probably just much less work overall.

@Ericson2314
Contributor Author

I was assuming that the new packages would be part of the GHC repo, and would take no longer to build than today.

This is the intent of the proposal. There should be no major downsides; no new submodules to bump, and still loading everything in a single ghci. I've added some text to elaborate on this and other things that must not happen.

Eliminating all the dependencies of Language.Haskell.Syntax has proved troublesome. FastString is a major example; I'm not sure if there are others. (E.g. there are some NoGhcTc calls in the allegedly-ghc-independent tree.) Working out what to do about these is a blank cheque; I do not know how long it would take. It's not just a routine matter.

NoGhcTc is indeed something that shouldn't be here, but not something we need to fix, because it doesn't impose imports on the rest of GHC. If we narrowly focus on what is causing violating imports, and not what ought to go where in an ideal world, I think this becomes much more tractable.

I indeed don't want the HF to write a blank check.

  • Does the parser package you envisage include all source-location info? Haddock? It has to, if GHC is to use it. But that means populating those extension fields in a parser-specific way. Maybe that's ok, but it'd be good to lay it out.

I think the answer is "yes" --- we definitely want a parser we can reuse. That said, I do not know the Parser parts as well as I know the AST parts.


Specifically, what is problematic about depending on ghc as a whole in order to get the parser?

I don't think I am the right person to answer this; I have some strong biases towards modularity, but this gets me and the GHC devs into "we agree to disagree" territory. I hope everyone else in this thread will be able to find a resolution here without me.

I will say this: there are a number of build system and other "dev ops" issues with ghc today. The "reinstallable GHC" project is supposed to solve them all, eventually. The ghc-lib library also solved them right away in an (everyone agrees) less than ideal manner.

The ghc-lib developers could have stopped there, but then they also went and made ghc-lib-parser. And I think ghc-lib-parser is much more widely used than ghc-lib. I think understanding why ghc-lib-parser was made in addition, and why people prefer to use it, will be a key part of asking these "for whom" and "why" questions.

@goldfirere
Contributor

Thanks for the responses above. I now better understand why we want this: we want a Haskell program built with (say) GHC 9.2 to be able to parse Haskell files that use syntax from (say) GHC 9.6. Good. I'm on board with that bit of motivation, otherwise known as making the library reinstallable.

I also understand that one problem with the existing ghc-lib-parser is that there's no guarantee it won't break. What about a proposal that addresses this, by (say) adding some CI to GHC that tries to build ghc-lib-parser. Would that be sufficient? I don't know whether the GHC devs would like that, but would that satisfy the goals of this proposal? I'm just trying to understand the minimal action we could take to satisfy the need here.

Regarding downsides: I remain unconvinced about build times. My read of this proposal is that the new parser would live in the same repo as GHC (not in a submodule) but would be its own Haskell package, with its own cabal file. My best understanding of our build tools suggests that changing a file in one cabalized package requires a rebuild of the package, which would then require rebuilds of other packages that depend on the first one. Maybe these dependencies are at the module level, not the package level? But I'd need someone to assert that pretty clearly. (Note: this is not at all about repo organization or anything involving git.)

@Ericson2314
Contributor Author

Ericson2314 commented Aug 31, 2023

Thanks for the responses above. I now better understand why we want this

Yay!

What about a proposal that addresses this, by (say) adding some CI to GHC that tries to build ghc-lib-parser. Would that be sufficient? I don't know whether the GHC devs would like that, but would that satisfy the goals of this proposal? I'm just trying to understand the minimal action we could take to satisfy the need here.

I am not sure that is actually less work. But happy to discuss the next bit first before returning to this.

(IMO adding new CI steps (like e.g. the linting ones ;)) often annoys developers when there is this extra "gotcha" step they didn't notice during local development. But just...refactoring the code so the extraction step is superfluous will get us to a new local maximum cleanly with no extra gotcha steps.)

Regarding downsides: I remain unconvinced about build times. My read of this proposal is that the new parser would live in the same repo as GHC (not in a submodule) but would be its own Haskell package, with its own cabal file. My best understanding of our build tools suggests that changing a file in one cabalized package requires a rebuild of the package, which would then require rebuilds of other packages that depend on the first one. Maybe these dependencies are at the module level, not the package level? But I'd need someone to assert that pretty clearly. (Note: this is not at all about repo organization or anything involving git.)

With "vanilla" Cabal and cabal-install, what you say is true. These things invoke ghc --make to build the entire library in one go, and when an upstream library is changed, in general the downstream library needs to be rebuilt entirely.

Hadrian (and GHC's old make build system) are different, however: they invoke GHC in one-shot mode, so there is one process per Haskell module. They do not force all downstream packages to be rebuilt, at least last I heard.

You can try this out by modifying a file of ghc, rebuilding with --freeze1, and then seeing that not all ghc-bin modules need to be rebuilt. We'd want the same for the new libraries: just as ghc-bin is not fully rebuilt when ghc changes, ghc (and ghc-bin) should not be fully rebuilt when the new parser and AST libraries change.

The ghc vs ghc-bin situation hopefully means we are not breaking any new ground here. :)

@michaelpj

I'm on board with that bit of motivation, otherwise known as making the library reinstallable.

Another point here - we've talked a bit about reinstallable ghc. That's definitely also nice to have, but it's worth noting that it's much more complicated, because we want it to be a functional ghc, and that can be tricky. On the other hand, a hypothetical ghc-parser package would likely be a very boring standard Haskell package and not lead to any of the above head-scratching.

Regarding downsides: I remain unconvinced about build times.

If we keep pushing on the modularity we can get to a better place with plain cabal build times also. Conceptually, most of GHC shouldn't depend on the parser, only the driver that puts it all together. So if you have the driver as a separate component that depends on the parser, typechecker etc, then changing the parser should mean rebuilding only the driver component.
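
A small sketch of that argument, assuming a package-level rebuild policy like plain cabal's. The component names and the `splitLayout` graph are hypothetical and are not a proposed layout for GHC.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Package = String

-- Hypothetical component layout: each package maps to its direct dependencies.
splitLayout :: Map.Map Package [Package]
splitLayout = Map.fromList
  [ ("ast", [])
  , ("parser", ["ast"])
  , ("typechecker", ["ast"])
  , ("driver", ["parser", "typechecker"])
  ]

-- Packages rebuilt when `changed` changes: the changed package plus
-- everything that depends on it, directly or transitively.
rebuilt :: Map.Map Package [Package] -> Package -> Set.Set Package
rebuilt graph changed = go (Set.singleton changed)
  where
    go seen =
      let dependents = Set.fromList
            [ p | (p, deps) <- Map.toList graph, any (`Set.member` seen) deps ]
          next = Set.union seen dependents
      in if next == seen then seen else go next
```

In this toy graph, `rebuilt splitLayout "parser"` yields only `parser` and `driver`; the typechecker is untouched because it does not depend on the parser. A change to `ast` still rebuilds everything.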

What about a proposal that addresses this, by (say) adding some CI to GHC that tries to build ghc-lib-parser. Would that be sufficient?

I think it would help, but I do think a significant part of the problem is that it's not being maintained by the GHC devs. If the GHC devs want to adopt and maintain ghc-lib-parser... I guess that would be okay, but it really feels like taking the painful route.

@goldfirere
Contributor

Thanks again. I'm becoming more convinced. For me the story runs like this:

  • There is strong community demand for ghc-lib-parser.
  • ghc-lib-parser is painful to maintain and fragile.
  • This proposal describes how we can get to an alternative parser library that is maintained for free (as always, "for free" means "paid for by someone else", in this case the GHC team, in the course of their normal work)
  • The proposal also contains a roadmap for getting there (which I have not thought deeply about)

Is that an accurate telling of the story? This story need not be what inspires you (for any definition of "you"), but if it's an accurate story, then it might be what inspires me. (Specifically, some people have a great desire for more modularity. That motivation speaks to them. But, on its own, it does not speak to me, other than as an aesthetic "nice to have".)

@shayne-fletcher

quick comment on ghc-lib vs ghc-lib-parser. ghc-lib re-exports ghc-lib-parser and adds in the rest of the GHC API such that you can produce GHC core from Haskell source. ghc-lib-parser (~400 modules) was split from ghc-lib (~400 modules) in an effort to make install times for HLint users as reasonable as they could be.

@Ericson2314
Contributor Author

@goldfirere That's exactly right!

For me personally modularity is very exciting, but indeed there is not yet enough evidence that other people care enough for that to motivate an HF proposal on its own --- I knew I couldn't just write a "GHC modularity in its entirety" HF proposal and expect it to go through.

I turned to ghc-lib-parser because there was clear community interest in that library. Perhaps people prefer ghc-lib-parser to ghc-lib because it is more modular, perhaps people prefer it simply because fewer modules => faster builds. I don't know the answer, but per your story that answer doesn't really matter. It's

  1. A popular library
  2. Used for tasks the community also cares about (more tools!)
  3. Currently very tedious for @shayne-fletcher and others to maintain

And the official goal of the proposal is (3), while hopefully also making the library better than it was before, helping out (2). Any modularity benefits are just icing on the cake, which we need not worry about if folks don't really care for icing :).

@Ericson2314
Contributor Author

Ericson2314 commented Sep 1, 2023

(One could state that I am cynically laundering modularity work under something more popular. I prefer to state that I am trying to demonstrate modularity being useful to help solve existing problems, problems that (at least as I understand it) can't very well be solved any other way. Modularity is part of the means but not the end of the proposal.)

(Edit: the rendered link was wrong, so changes since the initial version did not appear. It is fixed now.)

@tomjaguarpaw
Contributor

One could state that I am cynically laundering modularity work under something more popular

I think you're scoping out work that has benefits across a number of different axes!

Ericson2314 and others added 2 commits September 2, 2023 14:10
Thanks!

Co-authored-by: David Thrane Christiansen

@Ericson2314
Contributor Author

Ericson2314 commented Sep 2, 2023

I certainly hope so! But some of the more abstract ones are hard to argue in an ironclad objective way. I'm fine with leaving them out of it and hoping the main, more concrete ones are enough to carry the day.

If this goes as well as I hope, it will provide a useful shared experience to ground discussion of those other things, and of other modularity projects that relate to them. But until then those conversations are rather unmoored and I'm OK waiting.

@mrkkrp

mrkkrp commented Sep 19, 2023

I was asked to comment here as a maintainer of Ormolu, a tool that depends on ghc-lib-parser. We do not experience many problems with ghc-lib-parser build times ourselves as developers since once there is a new version it gets cached in our CI. However, there have been some complaints about long build times of ghc-lib-parser from our users and some attempts to avoid building it if possible:

Which is telling.

I'm not very familiar with the way GHC is developed, so I won't express any strong opinions. On the whole, the idea of having the parser as a "normal" Haskell library which nevertheless would continue to live in the GHC source tree seems to be a good one in the sense that it would reduce build times for end users without making GHC development harder.

@brandonchinn178

As a maintainer of Fourmolu, I don't have much to contribute beyond what @mrkkrp said. As Fourmolu is a fork of Ormolu, we mostly inherit ghc-lib-parser from Ormolu and don't interact with that dependency a ton.

But as a maintainer of tasty-autocollect, I can't use ghc-lib-parser because tasty-autocollect is a compiler plugin, so I have to use ghc directly. So I do use CPP to provide a common interface that's implemented for each GHC version. Even if ghc-lib/ghc-lib-parser could be reused, I'd probably avoid it anyway because 1) long build time and 2) my use case is small enough that manually writing shims isn't too much work. But if there were a library broken out that can be used for plugins + provides a compatible interface for multiple versions of GHC + is prebuilt/doesn't have a huge build time, it would be useful for tasty-autocollect.

@silky
Contributor

silky commented Oct 11, 2023

I happened to glance at shakespeare and noted this comment:

Note there is no dependency on haskell-src-extras. Instead Shakespeare believes logic should stay out of templates
and has its own minimal Haskell parser.

Maybe @snoyberg you would like to comment or more likely point us to someone who is a potential maintainer of shakespeare for their opinion on if this work would be of value?

@mihaimaruseac

As a maintainer of hlint, I also think having the parser as a separate library is worthwhile. Note though that hlint migrated from haskell-src-exts recently because that library was lagging behind GHC development, so I wouldn't want the same to happen with a separated ghc-lib-parser (though I'm optimistic that the fact that the library would still be part of the GHC tree means it won't lag behind).
