Syntax highlighter #2464
-
Yes, the current syntax highlighting engine has a multitude of problems. I think the solution is to switch to a system that uses PEGs (parsing expression grammars) rather than regular expressions, with incremental PEG parsing for fast rehighlighting. PEGs are roughly as powerful as context-free grammars, so it will be possible (and easier) to describe complex grammars without hacks like regions and lookahead/lookbehind. I have a library that uses this approach at https://github.com/zyedidia/flare, but I still have to integrate it with micro.
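To illustrate why a PEG can describe things a plain regular expression cannot, here is a minimal hand-written sketch of a recursive PEG rule for nested Lisp-style lists. It is a hypothetical illustration only: the function names are made up, it is not flare's API, and it does not show the incremental part.

```python
# Hypothetical illustration: not flare's API and not micro's highlighter.
# Grammar in PEG notation:
#   List  <- '(' (List / Atom / Space)* ')'
#   Atom  <- [^()\s]+
#   Space <- \s+

def parse_atom(text, pos):
    """Match Atom at pos; return the end position or None."""
    end = pos
    while end < len(text) and text[end] not in "()" and not text[end].isspace():
        end += 1
    return end if end > pos else None

def parse_space(text, pos):
    """Match Space at pos; return the end position or None."""
    end = pos
    while end < len(text) and text[end].isspace():
        end += 1
    return end if end > pos else None

def parse_list(text, pos=0):
    """Match List at pos; return the end position or None."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    while pos < len(text) and text[pos] != ")":
        # Ordered choice: try List first, then Atom, then Space.
        nxt = parse_list(text, pos)
        if nxt is None:
            nxt = parse_atom(text, pos)
        if nxt is None:
            nxt = parse_space(text, pos)
        if nxt is None:
            return None
        pos = nxt
    if pos < len(text) and text[pos] == ")":
        return pos + 1
    return None
```

The rule refers to itself, so arbitrarily deep nesting is handled without regions: `parse_list("(a (b c) d)")` consumes the whole string, while the unbalanced `"(a (b c)"` is rejected.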
-
I have a question: how can I change the detected "syntax file" used for a buffer? I have a file open and I want to change its "file type".
-
I want to start by saying that I love how simple it is to create new syntaxes for micro. The simplicity has enabled me to create a handful of different syntaxes that have been very useful to me, all with just some regular expressions. However, as I have worked more with the syntaxes their limitations have become apparent:
Problem 1: Buggy ^ and $
Handling of `^` and `$` (start of the line and end of the line) inside regular expressions is buggy in nearly all cases (for example #2458, #2463). Using them in the middle of a regular expression or inside a character group is the cause of various issues.
Problem 2: Lack of look-behind and look-ahead
The lack of support for look-behind or look-ahead makes some syntaxes very hard to define. In languages that allow a wide variety of characters in identifiers (like most Lisp dialects do), I haven't found a good way to define tokens so they don't get highlighted if they're inside an identifier. For example, I might want to highlight `print` when it is on its own but not in `my-print-func`. Using word boundaries (`\\b`) doesn't work because `-` is not a word character.
Problem 3: No way to switch between syntaxes within a file
There is no good way to switch from one syntax to another within a single file. For example, in Python it would be nice to have syntax highlighting inside f-strings. Using the `include` feature only gets you so far, because you can't include `python` syntax from the `python` syntax file itself. I suspect the problem here is that the highlighter tries to generate the entire (infinitely recursive) syntax in advance rather than as required.
Problem 4: Syntaxes need to be defined specifically for micro
The way micro defines syntaxes is not compatible with the more popular editors, so syntaxes often need to be manually translated. I am aware that the situation with language syntax formats is currently a bit of a mess (there's TextMate, VSCode, vim, emacs, monaco-editor... all with their own formats), but it would be nice to share a format with one of the more popular editors. In my opinion YAML is not the best file format for syntax definitions (or for anything else, to be honest).
Problem 5: No way to dynamically create tokens
To implement syntax highlighting for something like HEREDOCs you would need to be able to recognize a token (like "<<-EOF") and create a new token ("EOF") based on it. Dynamic tokens would also make it possible, for example, to highlight variable and function names only if they have been defined, which might be useful in some cases.
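A minimal sketch of what "creating a token from another token" could look like, assuming a simple line-by-line scanner. The names `HEREDOC_OPEN` and `classify_lines` are hypothetical, this is not how micro's highlighter works, and the terminator handling is simplified (leading whitespace is tolerated for both the `<<` and `<<-` forms).

```python
import re

# Hypothetical opener pattern: <<EOF or <<-EOF, with an identifier as terminator.
HEREDOC_OPEN = re.compile(r"<<-?([A-Za-z_][A-Za-z0-9_]*)")

def classify_lines(lines):
    """Label each line 'code' or 'heredoc' using a terminator learned at runtime."""
    labels = []
    terminator = None  # the dynamically created token, e.g. "EOF"
    for line in lines:
        if terminator is not None:
            labels.append("heredoc")
            # The body ends at a line that holds only the terminator.
            if line.strip() == terminator:
                terminator = None
            continue
        labels.append("code")
        m = HEREDOC_OPEN.search(line)
        if m:
            # Remember the captured word; it becomes the closing token.
            terminator = m.group(1)
    return labels
```

The point is the extra piece of state (`terminator`) that is filled in from the input itself, which a fixed list of regular expressions cannot express.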
Solutions?
The issues with `^` and `$` could be fixed without changing how the highlighting system works. Look-behinds and look-aheads could also be implemented in the current system, but they introduce a lot of complexity and cause performance issues when used poorly.
For problem 3, it is likely possible to work around all the cases that realistically matter using `include` and a few separate syntax files, but I think a more robust implementation would be beneficial.
The 4th problem is not a big issue, it just takes some extra effort from the community to keep all the syntaxes up to date and working properly.
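To make the look-behind/look-ahead case from problem 2 concrete, here is a minimal sketch using Python's `re` module, chosen only because it supports look-around; it is not micro's regex engine. The pattern names and the `spans` helper are hypothetical.

```python
import re

# Word boundaries treat '-' as a separator, so 'print' inside
# 'my-print-func' still matches.
WORD_BOUNDARY = re.compile(r"\bprint\b")

# Look-around version: 'print' must not be preceded or followed by
# a word character or a hyphen.
LISP_BOUNDARY = re.compile(r"(?<![\w-])print(?![\w-])")

def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]

text = "(print (my-print-func x))"
print(spans(WORD_BOUNDARY, text))  # also matches inside my-print-func
print(spans(LISP_BOUNDARY, text))  # only the standalone print
```

Here both assertions are a single fixed-width character, which keeps the cost low; the performance concerns arise mainly with longer or nested look-around expressions.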
The last problem is trickier to solve without making big changes to how the highlighter works and how syntaxes are defined. For a simple case such as HEREDOCs you could abuse backreferences and put everything inside a single token. I don't think backreferences exist in the current highlighter implementation (correct me if I'm wrong). For more complicated cases a completely different approach would be necessary.
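As a sketch of the backreference idea, assuming an engine that supports named backreferences (Python's `re` is used here purely for illustration; it is not micro's engine, and the `HEREDOC` pattern is hypothetical and simplified so that the opener must end its line):

```python
import re

HEREDOC = re.compile(
    r"<<-?(?P<tag>[A-Za-z_][A-Za-z0-9_]*)\n"  # opener; captures the terminator
    r"(?:.*\n)*?"                             # body lines, as few as possible
    r"[ \t]*(?P=tag)[ \t]*$",                 # a line holding only the terminator
    re.MULTILINE,
)

text = "cat <<-EOF\nhello\nEOF_not\n  EOF\necho done\n"
m = HEREDOC.search(text)
print(m.group("tag"))
print(repr(m.group(0)))
```

The whole heredoc, from opener to terminator, ends up as one match, which is why this amounts to putting everything inside a single token. A line such as `EOF_not` does not end the match because the terminator has to fill the line.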
Summary / Questions
The syntax highlighter has some issues, and I would like to help make it better. What are the future plans for syntax highlighting? Is the current implementation here to stay, or is it likely to be replaced in the foreseeable future? Would there be any interest in a coordinated effort to improve the situation? If so, do you think that effort would be better spent working on the current highlighter and syntaxes, or on making something from scratch?