Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression parser #449

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

christiankjaer
Copy link

@christiankjaer christiankjaer commented Sep 25, 2022

Essentially a port of https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/src/Control.Monad.Combinators.Expr.html

  • Added a small microbenchmark
  • Added property based test with arithmetic expressions and friends

@johnynek
Copy link
Collaborator

Thanks for the PR. I think we could add something like this.

I haven't had a chance to review in detail, but I had a couple of comments:

  1. I'd like to see scalacheck test coverage where we generate a random expression, serialize it to string with random whitespace, and then verify that it parses exactly the same.
  2. I'd like some benchmarks to make sure the performance is close to a hand written parser.
  3. I noticed there are a few uses of flatMap. Very few parsers really require that and it definitely hurts performance. Could we try to avoid using flatMap and just use product (or ~, *>, <*) instead?
  4. It seems this is baking in the idea that whitespace is never significant. I wonder if we can make a whitespace Parser0[Unit] as an argument which could potentially be Parser.unit for a case where you don't want to allow any extra whitespace.

@christiankjaer
Copy link
Author

Great. I just published the draft here to check if something like this was wanted. I don't know how many use cats parse for expression parsers.

I will give each of the points a go when I have time. I am not sure about (4), but I will have a think about it.

@christiankjaer christiankjaer marked this pull request as ready for review September 27, 2022 18:53
@christiankjaer
Copy link
Author

christiankjaer commented Sep 27, 2022

I have not addressed (4). I think that route is still possible with this approach. You just don't munch whitespace in the base parser but in the operators instead, or some combination that suits the use case.

Handling of whitespace is not done at all in the expression parser.

@christiankjaer christiankjaer changed the title Ported expression parser Expression parser Sep 27, 2022
@j-mie6
Copy link

j-mie6 commented Mar 6, 2023

Feel like it might be worth highlighting a similar but alternative approach: https://www.cs.tufts.edu/comp/150FP/archive/jamie-willis/parsing-patterns.pdf with a Scala version available in https://dl.acm.org/doi/10.1145/3550198.3550427 (I can get a pdf on request). It is slightly different to the parser-combinators version in that it allows for heterogeneity in the levels of the precedence table. The same mechanism is used by https://github.com/j-mie6/parsley within the parsley.expr package. As it happens, the chain combinators themselves are useful in a library even if precedence is available: I'm not sure if cats-parse already uses them or not.

The precedence scheme encoded by that paper does not have the ternary operators, but does avoid avoid flatMap at all costs: providing efficient implementations of them.

I concur that whitespace parsing is not something that should be handled by the precedence machinery: this is explicitly a convention that should be kept in the "lexer" (see section 3 of those papers). Anecdotally, we use these design patterns at Imperial, for our compilers project and it works wonders compared to when we didn't have them!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants