A couple of years ago in #165506, @fabiospampinato raised the idea of running TextMate grammars in JS using an Oniguruma-to-JS regex transpiler, both for performance and to potentially remove the large Oniguruma dependency. CC @alexdima. At the time the benefit was hypothetical, since no regex transpiler written in JS could actually do this. A Ruby library was cited as a proof point, but it wouldn't have worked in practice: it's written in Ruby, it transpiles Onigmo rather than Oniguruma, it wasn't designed for the way regexes are used in TextMate grammars, and it wasn't robust enough to cover the long tail of grammars that often include complex regexes relying on Oniguruma edge cases.
A library now exists (Oniguruma-To-ES) that solves these problems. It's lightweight and has been used for a while by the Shiki library, with support for the vast majority of TM grammars. The issues with the handful of remaining unsupported grammars are well understood, and are mostly the result of bugs in the grammars, bugs in Oniguruma, or use of a few extremely rare features that can be supported in future versions or worked around.
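To make the idea concrete, here's a minimal sketch of transpiling a single Oniguruma pattern to a native JS RegExp with Oniguruma-To-ES. `toRegExp` is the library's documented entry point; the specific pattern and expected output are illustrative only.

```ts
import {toRegExp} from 'oniguruma-to-es';

// An Oniguruma pattern of the kind found in TM grammars: `\h` (hex digit)
// and the possessive `++` quantifier aren't native JS regex syntax.
const onigPattern = String.raw`\b0[xX]\h++\b`;

// Produces a native RegExp that emulates Oniguruma's behavior.
const jsRegex = toRegExp(onigPattern);

console.log(jsRegex.test('0xFF')); // expected: true
```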
Of course, VS Code wants to be a good OSS citizen and not break any existing grammars. So perhaps grammars that are known to transpile perfectly and to run faster in JS could be marked to use the JavaScript engine rather than Oniguruma, while everything else keeps the current behavior.
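A rough, purely hypothetical sketch of what such an opt-in could look like (this is not vscode-textmate's actual API; the `regexEngine` field and helper below are invented for illustration):

```ts
// Hypothetical per-grammar engine selection. Grammars known to transpile
// perfectly (and run faster) opt in to JS; all others default to Oniguruma.
type RegexEngine = 'oniguruma' | 'javascript';

interface GrammarRegistration {
  scopeName: string;
  path: string;
  regexEngine?: RegexEngine; // absent => 'oniguruma' (current behavior)
}

const registrations: GrammarRegistration[] = [
  {scopeName: 'source.python', path: './python.tmLanguage.json', regexEngine: 'javascript'},
  {scopeName: 'source.foo', path: './foo.tmLanguage.json'}, // stays on Oniguruma
];

function engineFor(reg: GrammarRegistration): RegexEngine {
  return reg.regexEngine ?? 'oniguruma';
}
```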
In a basic benchmark of Shiki's JS vs WASM engine (using precompiled versions of the grammars that had been pre-run through Oniguruma-To-ES), the JS engine performed faster in many cases including the following examples (all with identical highlighting results compared to the WASM engine):
Python: 8.6x faster.
Markdown: 3.5x faster.
CSS: 2x faster.
SCSS: 3.4x faster.
Bash: 2.6x faster.
Kotlin: 1.2x faster.
Perl: 1.4x faster.
PHP: 1.4x faster.
Go: 1.3x faster.
Objective-C: 1.3x faster.
These times are based on processing the language samples that Shiki provides; e.g. here's the Markdown sample.
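For anyone who wants to reproduce this kind of comparison, here's a rough sketch using Shiki's public engine entry points (check Shiki's docs for the exact import paths in the version you use). Note that this sketch transpiles grammars at load time rather than using the precompiled grammars the benchmark above relied on, and the sample text and timing loop are illustrative.

```ts
import {createHighlighter} from 'shiki';
import {createJavaScriptRegexEngine} from 'shiki/engine/javascript';
import {createOnigurumaEngine} from 'shiki/engine/oniguruma';

const sample = '# Heading\n\nSome *Markdown* with `code`.\n';

async function time(engineName: string, engine: unknown) {
  const highlighter = await createHighlighter({
    themes: ['nord'],
    langs: ['markdown'],
    engine: engine as any,
  });
  const start = performance.now();
  for (let i = 0; i < 100; i++) {
    highlighter.codeToHtml(sample, {lang: 'markdown', theme: 'nord'});
  }
  console.log(`${engineName}: ${(performance.now() - start).toFixed(1)}ms`);
  highlighter.dispose();
}

await time('js', createJavaScriptRegexEngine());
await time('wasm', await createOnigurumaEngine(import('shiki/wasm')));
```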
The JS engine with precompiled regexes is not faster than Oniguruma (via WASM) with all grammars, but such cases might be reduced in the future since there are optimization opportunities (this issue includes an example) that aren't yet implemented.
slevithan changed the title from "Run regexes for TM grammars in native JS using oniguruma-to-es" to "Run regexes for TM grammars in native JS for perf" on Jan 9, 2025.