Update readme.
kaby76 committed Dec 4, 2024
1 parent 50d967d commit 2f05925
Showing 2 changed files with 217 additions and 37 deletions.
54 changes: 17 additions & 37 deletions sql/mysql/Oracle/README.md
@@ -2,49 +2,29 @@

## General

For more than a decade, the MySQL GUI development tools team at Oracle has provided the open source
[MySQL Workbench](https://github.com/mysql/mysql-workbench), which uses ANTLR4 for all MySQL code parsing tasks. This requires translating all changes from the
[MySQL server grammar](https://github.com/mysql/mysql-server/blob/8.0/sql/sql_yacc.yy) to ANTLR4, which is an ongoing effort to stay up to date with the latest server features (like the Multi Language Extension (MLE) for stored routines). The current grammar supports all MySQL versions starting with 8.0.
This parser grammar is derived from the official Oracle grammar posted here in original/,
which is derived from sources in the MySQL Shell for VS Code extension.
https://github.com/mysql/mysql-shell-plugins/tree/8928ada7d9e37a4075291880738983752b315fee/gui/frontend/src/parsing/mysql

Meanwhile, development focus has been shifted to the [MySQL Shell for VS Code extension](https://marketplace.visualstudio.com/items?itemName=Oracle.mysql-shell-for-vs-code), which is the original source of the grammar you can find here.
This grammar is set to recognize version "8.0.200".

Parsers generated from this grammar are very fast (given the high ambiguity of the MySQL language). For details see the [ANTLR4 runtime benchmarks](https://github.com/mike-lischke/antlr4-runtime-benchmarks/tree/main/src/mysql) repository.
## License

Like all of Oracle's open source, this grammar is released under the GPLv2.

## Correct and Flexible Parsing
## Target Agnostic

In order for applications like MySQL Shell for VS Code to parse MySQL code correctly, some conditions must be considered, namely the currently used MySQL **server version** and the active **SQL modes** (e.g. to distinguish between identifiers and double-quoted strings, depending on the ANSI mode setting). There are also some peculiarities to consider:
This grammar is "target agnostic." Unaltered, the .g4 files will not work for
Antlr4ng, Cpp, Go, and Python3. You will need to first run `python transformGrammar.py`
provided in the target-specific directory. The script modifies the .g4 files
for the port.

- [String literal concatenation and Character set introducers (aka underscore char sets or string repertoires)](https://dev.mysql.com/doc/refman/8.0/en/string-literals.html).
- [Keyword after dot](https://dev.mysql.com/doc/refman/8.0/en/keywords.html)
- [Built-in function name parsing](https://dev.mysql.com/doc/refman/8.0/en/function-resolution.html). The IGNORE_SPACE SQL mode is properly handled.
- [Version Comments](https://dev.mysql.com/doc/refman/8.0/en/comments.html), like `CREATE TABLE t1(a INT, KEY (a)) /*!50110 KEY_BLOCK_SIZE=1024 */;`
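
The version comment rule from the list above can be illustrated outside the grammar. The following is only a minimal sketch, not how the lexer implements it: the function name `apply_version_comments` is hypothetical, it assumes the five-digit version form shown in the example (`50110` for 5.1.10), and it does not protect comment-like text inside string literals.

```python
import re

# Matches /*! ... */ with an optional five-digit version prefix such as 50110.
# Assumption of this sketch: only the five-digit form is handled.
_VERSION_COMMENT = re.compile(r"/\*!(\d{5})?(.*?)\*/", re.DOTALL)


def apply_version_comments(sql: str, server_version: int) -> str:
    """Keep the body of a version comment only if the server is new enough.

    Hypothetical helper for illustration; the real grammar makes this
    decision inside the lexer.
    """

    def replace(match: "re.Match[str]") -> str:
        required = match.group(1)
        body = match.group(2)
        # No version number means the body always applies on MySQL.
        if required is None or server_version >= int(required):
            return body
        return ""

    return _VERSION_COMMENT.sub(replace, sql)


statement = "CREATE TABLE t1(a INT, KEY (a)) /*!50110 KEY_BLOCK_SIZE=1024 */;"
newer = apply_version_comments(statement, 80034)  # body is kept
older = apply_version_comments(statement, 50100)  # body is dropped
```

With a server version at or above 5.1.10 the `KEY_BLOCK_SIZE` option becomes part of the statement; below that it is treated as an ordinary comment.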
## Modifying this grammar
This grammar is currently hand-written. The plan is to generate the ports directly
from the sources at https://github.com/mysql/mysql-shell-plugins.

The server version and SQL mode can be toggled at runtime, allowing the use of a single parser with different version/mode settings and providing better error messages (like for [a feature that is only valid for a specific version](https://github.com/mysql/mysql-shell-plugins/blob/master/gui/frontend/src/parsing/mysql/MySQLErrorListener.ts#L109)).
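
To make the runtime toggle concrete, here is a small sketch of settings that can be changed between parses. The names `ParserSettings` and `classify_double_quoted` are hypothetical and are not part of the grammar's support code; the sketch only assumes that the `ANSI_QUOTES` SQL mode turns double-quoted text into an identifier.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ParserSettings:
    """Hypothetical container for the two values the text says can be toggled."""

    server_version: int = 80034
    sql_modes: Set[str] = field(default_factory=set)


def classify_double_quoted(settings: ParserSettings) -> str:
    """Decide how a double-quoted token should be read under the active modes."""
    # With ANSI_QUOTES active, "name" is an identifier; otherwise a string.
    return "identifier" if "ANSI_QUOTES" in settings.sql_modes else "string"


settings = ParserSettings()
before = classify_double_quoted(settings)
settings.sql_modes.add("ANSI_QUOTES")  # toggle at runtime, same settings object
after = classify_double_quoted(settings)
```

The same settings object is reused across both calls, which mirrors the idea of one parser instance serving different version and mode combinations.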
## Issues
* The grammar is ambiguous, but generally performs well, except for bitrix_queries_cut.sql, which contains ~3000 ambiguities.

String repertoires require a list of character set identifiers, which must be provided by your implementation. You can get a list of available character sets by running `show character set`.
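
As a rough sketch of what "provided by your implementation" can mean, the check below decides whether an underscore-prefixed token is a character set introducer by looking it up in a caller-supplied set. The function name and the sample set are hypothetical, and matching names case-insensitively is an assumption of this sketch; in practice the set would be filled from the output of `show character set`.

```python
from typing import Optional, Set


def charset_of_introducer(token: str, known_charsets: Set[str]) -> Optional[str]:
    """Return the character set name if token is an introducer, else None.

    Hypothetical helper: a token such as _utf8mb4 counts as an introducer
    only when the name after the underscore is a known character set.
    """
    if not token.startswith("_"):
        return None
    name = token[1:].lower()
    return name if name in known_charsets else None


# Sample data for illustration only; a real list comes from the server.
KNOWN_CHARSETS = {"utf8mb4", "latin1", "binary"}

introducer = charset_of_introducer("_utf8mb4", KNOWN_CHARSETS)
plain_identifier = charset_of_introducer("_my_column", KNOWN_CHARSETS)
```

A token that starts with an underscore but does not name a known character set is left to be treated as an ordinary identifier.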

## Using the Grammar

To provide the full feature set, the MySQL grammar needs some support code, which is implemented in base classes for both the MySQL parser (named `MySQLBaseRecognizer`) and the MySQL lexer (named `MySQLBaseLexer`). You can find a TypeScript implementation of both classes in the TypeScript/ folder, which should be easy to port to other runtime languages.

This folder also contains a demo script that shows how to set up the MySQL lexer and parser and parse some input. It needs the TS runtime antlr4ng (and some additional modules to allow running the demo). To install them, run the Node module installation in the TypeScript/ folder:

```bash
npm i
```

After that you can generate the (TypeScript) parser and lexer files by running:

```bash
npm run generate
```

A new folder named `generated` is created, which contains the new files. Now the demo is ready for execution:

```bash
npm run demo
```

It runs a simple MySQL query and prints its parse tree.
## Performance
<img src="./times.svg">