parser: Complete rewrite to LALR #202

Merged
merged 31 commits into from
Dec 27, 2024

@elliotchance elliotchance commented Dec 27, 2024

This replaces the existing Earley parser (which is O(n^3) in the worst case) with an LALR parser generated by Yacc, which parses in O(n). Even with very short SQL statements the existing parser was really slow, so I had to build a query cache as a band-aid. That cache has now been removed as well.

This refactoring was made possible by adapting yacc from a Go implementation here: https://github.com/elliotchance/vyac. However, in keeping with the promise of this repo being completely written in V, the source has been copied to this repo.

Other notable and breaking changes:

  1. Not sure how this worked before, but a query can no longer specify a catalog in an identifier chain (for example, catalog.schema.table). The catalog must be set using SET CATALOG instead.
  2. Syntax error messages will be slightly different, but should be a little more helpful.
  3. There are some ambiguities in the SQL grammar caused by lookahead limitations, such as deciding what x IS NOT TRUE means or telling COUNT(expr) apart from COUNT(*). Special tokens for some combinations of operators and keywords have been added for the known edge cases, but many conflicts remain. I suspect the remaining conflicts don't matter, since the ambiguous paths should still yield valid results, so the conflict warnings have to be ignored for now.
  4. Fixes a very minor bug where string literals in VALUES might be treated as VARCHAR instead of CHARACTER in some cases.
  5. Renamed the "std_" files to include their position number in the standard. This groups similar sections together and makes lookups easier.
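
To illustrate the "special tokens" idea from item 3: one common way to work around single-token lookahead is to collapse certain token sequences into one combined token before the parser sees them. The sketch below is in Python rather than V and is not taken from this repo. The token names (`IS_NOT_TRUE`, `IS_TRUE`, `ASTERISK_ARGUMENT`), the keyword set and the toy tokenizer are all hypothetical, and it does not handle string literals or quoted identifiers.

```python
import re

# Hypothetical combined tokens. These names are illustrative only and are
# not the tokens actually added in this pull request.
COMBINED_TOKENS = {
    ("IS", "NOT", "TRUE"): "IS_NOT_TRUE",
    ("IS", "TRUE"): "IS_TRUE",
    ("(", "*", ")"): "ASTERISK_ARGUMENT",
}

# A tiny keyword set, just enough for the examples below (an assumption).
KEYWORDS = {"SELECT", "FROM", "WHERE", "IS", "NOT", "TRUE", "COUNT"}


def tokenize(sql):
    """Split SQL into words, integers and single symbols; upper-case keywords."""
    raw = re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S", sql)
    return [t.upper() if t.upper() in KEYWORDS else t for t in raw]


def combine(tokens):
    """Replace known token sequences with one combined token, longest first."""
    longest = max(len(seq) for seq in COMBINED_TOKENS)
    out = []
    i = 0
    while i < len(tokens):
        for size in range(longest, 1, -1):
            if i + size > len(tokens):
                continue  # not enough tokens left for a sequence this long
            seq = tuple(tokens[i:i + size])
            if seq in COMBINED_TOKENS:
                out.append(COMBINED_TOKENS[seq])
                i += size
                break
        else:
            # No sequence starts here, so pass the token through unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(combine(tokenize("SELECT COUNT(*) FROM t WHERE x IS NOT TRUE")))
```

With a pass like this in front of the parser, the grammar can match a single token where it would otherwise need to look several tokens ahead. `COUNT(x)` passes through untouched, so the two forms of `COUNT` reach the parser as different token streams.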

@elliotchance elliotchance changed the title parser: Complete rewrite of parser parser: Complete rewrite to LALR Dec 27, 2024
@elliotchance elliotchance merged commit c1def77 into main Dec 27, 2024
8 checks passed
@elliotchance elliotchance deleted the yacc branch December 27, 2024 17:37