Is it possible to do bi-directional parsing with Lark? #1396

thomasahle · 2024-03-22T17:40:39Z

I was reading about bi-directional parsing, where you define your parser/grammar and pretty-printer/templating-language at the same time.

The basic idea is expressed by this example:

Is it possible to do something like that with Lark?

MegaIng · 2024-03-22T19:56:23Z

Collaborator

Lark has a reconstructor, but you still have to build a pretty printer on top of it yourself (inserting the correct ignored tokens, for example to match indentation between lines in a C-like language).

At least based on my reading of the abstract and a look at the examples, I don't think the first linked paper actually does what you want. I am not actually sure what they mean by "bidirectional".

If you want to contribute facilities to do something like this, it might be possible to add them to Lark. However, note that there is a pretty fundamental difference in syntactic form between a templating engine and a BNF-based parser. This might make more sense as a separate library that uses Lark in the background.

0 replies

erezsh · 2024-03-22T20:35:54Z

Maintainer

Lark's reconstructor pretty much does what was described in that Reddit post. Because the Lark grammar supports a few simple rules for turning CSTs directly into ASTs (e.g. dropping literals, inlining rules), we can also do the reverse operation. There is some ambiguity involved, but it can be solved relatively efficiently with our Earley parser. However, we don't yet have a good formulation for which derivations to prioritize, so sometimes the resulting code can have some unnatural artifacts. It works and is even used by other packages, but mostly for simple languages like JSON.

(As an aside: it would be interesting to see if LLMs can smooth those artifacts out.)

P.S. thanks for making it a discussion and not an issue ;)

2 replies

nchammas Mar 29, 2024

Happy to start a new discussion if appropriate, but is this problem here similar to that of making an auto-formatter in Lark? i.e. I've created my DSL in Lark, and I also want to create a tool that will automatically format code in that DSL.

I see two examples in the documentation that use the reconstructor, one for JSON and one for Python.

Am I looking at the right place for guidance on how to make an auto-formatter in Lark?

erezsh Mar 29, 2024
Maintainer

The Lark grammar doesn't contain the necessary information for an auto-formatter. There is no way to specify whitespace or newlines.

In theory, it might be possible to write an auto-formatter using the reconstructor, if you add the formatting instructions yourself. But I don't know if it's ever been done, and I don't have a ready solution for how to do it.

So you can try, but it's probably easier to just write a formatter manually, if it's only for one language.
