Run regexes for TM grammars in native JS for perf #237537

Open
slevithan opened this issue Jan 9, 2025 · 0 comments

slevithan commented Jan 9, 2025

A couple of years ago in #165506, @fabiospampinato raised the idea of running TextMate grammars in JS using an Oniguruma-to-JS regex transpiler, both for performance and potentially to remove the large Oniguruma dependency. CC @alexdima. At the time the benefit was hypothetical, since no regex transpiler written in JS could actually do this. A Ruby library was used as a proof point, but it wouldn't have worked: it's written in Ruby, it transpiles Onigmo rather than Oniguruma, it wasn't designed to support the way regexes are used in TextMate grammars, and it wasn't robust enough to cover the long tail of grammars, which often include complex regexes that rely on Oniguruma edge cases.

A library now exists (Oniguruma-To-ES) that solves these problems. It's lightweight and has been used for a while by the Shiki library, with support for the vast majority of TM grammars. The issues with the handful of remaining unsupported grammars are well understood, and are mostly the result of bugs in the grammars, bugs in Oniguruma, or use of a few extremely rare features that can be supported in future versions or worked around.
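
To illustrate why a transpiler is needed at all (rather than passing grammar patterns straight to `RegExp`), here is a minimal sketch of two well-known syntax differences. In Oniguruma, `\h` matches a hexadecimal digit and `\A` anchors at the start of the string; in a JS regex without the `u`/`v` flag, both are just escapes for the literal letters. The "translated" patterns are hand-written equivalents chosen for illustration, not output captured from Oniguruma-To-ES.

```typescript
// Oniguruma patterns passed to JS unchanged (no `u`/`v` flag).
// JS reads \h and \A as the literal letters "h" and "A".
const naiveHex = new RegExp("\\h");
const naiveStart = new RegExp("\\Afoo");

// Hand-written JS equivalents of the Oniguruma meaning (illustrative only).
const translatedHex = new RegExp("[0-9A-Fa-f]");
// Without the `m` flag, ^ matches only at the start of the string.
const translatedStart = new RegExp("^foo");

// Collect the results so the difference is easy to inspect.
const results = {
  naiveHexMatchesDigit: naiveHex.test("7"),
  naiveHexMatchesLetterH: naiveHex.test("h"),
  translatedHexMatchesDigit: translatedHex.test("7"),
  naiveStartMatchesFoo: naiveStart.test("foo"),
  translatedStartMatchesFoo: translatedStart.test("foo"),
};

console.log(results);
```

The unmodified patterns compile without error but silently match the wrong thing, which is the kind of mismatch a transpiler has to handle across every grammar.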

Of course, VS Code wants to be a good OSS citizen and not break any existing grammars. So perhaps certain grammars that offer better performance and are known to offer perfect transpilation could be marked to use JavaScript rather than Oniguruma.
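
As a rough sketch of what such opt-in marking could look like: the snippet below picks a regex engine per grammar from an allowlist of scope names. Everything here is hypothetical (the `RegexEngine` type, the `jsVerifiedScopes` set, the `pickEngine` function, and the choice of entries); none of it is an existing VS Code or vscode-textmate API.

```typescript
// Hypothetical sketch: these names are assumptions, not VS Code APIs.
type RegexEngine = "js" | "oniguruma";

// Assumed allowlist of grammar scope names that have been verified to
// transpile perfectly and to run faster in JS. Entries are illustrative.
const jsVerifiedScopes: ReadonlySet<string> = new Set([
  "source.python",
  "text.html.markdown",
]);

// Use the JS engine only for verified grammars; everything else keeps
// running on Oniguruma, so existing grammars are never broken.
function pickEngine(
  scopeName: string,
  verified: ReadonlySet<string> = jsVerifiedScopes,
): RegexEngine {
  return verified.has(scopeName) ? "js" : "oniguruma";
}

console.log(pickEngine("source.python"));
console.log(pickEngine("source.some-unverified-language"));
```

Defaulting to Oniguruma for anything not on the list keeps the change conservative: a grammar only moves to the JS engine after it has been checked.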

In a basic benchmark of Shiki's JS vs WASM engine (using precompiled versions of the grammars that had been pre-run through Oniguruma-To-ES), the JS engine performed faster in many cases including the following examples (all with identical highlighting results compared to the WASM engine):

  • Python: 8.6x faster.
  • Markdown: 3.5x faster.
  • CSS: 2x faster.
  • SCSS: 3.4x faster.
  • Bash: 2.6x faster.
  • Kotlin: 1.2x faster.
  • Perl: 1.4x faster.
  • PHP: 1.4x faster.
  • Go: 1.3x faster.
  • Objective-C: 1.3x faster.

These times are based on processing the language samples that Shiki provides; e.g. here's the Markdown sample.
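
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how an "N x faster" figure can be computed from repeated timings. It is not Shiki's benchmark code; `median`, `timeRuns`, and `speedup` are hypothetical helpers, and using the median over several runs is an assumption made here to reduce noise.

```typescript
// Hypothetical helpers; not taken from Shiki's benchmark.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run a task several times and record each duration in milliseconds.
function timeRuns(task: () => void, runs: number): number[] {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    task();
    samples.push(Date.now() - start);
  }
  return samples;
}

// Speedup of the candidate over the baseline: a value of 2 means the
// candidate took half the (median) time of the baseline.
function speedup(baselineMs: number[], candidateMs: number[]): number {
  return median(baselineMs) / median(candidateMs);
}
```

In use, `baselineMs` would come from `timeRuns` around the WASM engine highlighting a sample, and `candidateMs` from the same sample on the JS engine.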

The JS engine with precompiled regexes is not faster than Oniguruma (via WASM) with all grammars, but such cases might be reduced in the future since there are optimization opportunities (this issue includes an example) that aren't yet implemented.

@slevithan slevithan changed the title Run regexes for TM grammars in native JS using oniguruma-to-es Run regexes for TM grammars in native JS for perf Jan 9, 2025
@sandy081 sandy081 assigned alexdima and hediet and unassigned sandy081 Jan 9, 2025