Syntax highlighter #2464

Andriamanitra · 2022-06-18T05:11:05Z

I want to start by saying that I love how simple it is to create new syntaxes for micro. That simplicity has let me create a handful of different syntaxes that have been very useful to me, all with just some regular expressions. However, the more I have worked with the syntaxes, the more apparent their limitations have become:

Problem 1: Buggy ^ and $

Handling of ^ and $ (start of the line and end of the line) inside regular expressions is buggy in nearly all cases (for example #2458, #2463). Using them in the middle of a regular expression or inside a character group is the cause of various issues.

Problem 2: Lack of look-behind and look-ahead

The lack of support for look-behind or look-ahead makes some syntaxes very hard to define. In languages that allow a wide variety of characters in identifiers (like most Lisp dialects do), I haven't found a good way to define tokens so they don't get highlighted when they are part of a longer identifier. For example, I might want to highlight print when it is on its own but not in my-print-func. Using word boundaries (\\b) doesn't work because - is not a word character.
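To make the print / my-print-func case concrete, here is a minimal sketch in Python. Python's re module is used only because it supports look-behind and look-ahead; this is not micro's highlighter, and the pattern names and the find_spans helper are made up for illustration. It assumes identifier characters are word characters plus "-".

```python
import re

# Hypothetical rule: match "print" only when it is not glued to other
# identifier characters (assumed here to be word characters plus "-").
STANDALONE_PRINT = re.compile(r"(?<![\w-])print(?![\w-])")

# Plain word boundaries treat "-" as a separator, so this also matches
# the "print" inside "my-print-func".
WORD_BOUNDARY_PRINT = re.compile(r"\bprint\b")


def find_spans(pattern, text):
    """Return the (start, end) offsets of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]


code = "(print (my-print-func x))"
boundary_spans = find_spans(WORD_BOUNDARY_PRINT, code)
lookaround_spans = find_spans(STANDALONE_PRINT, code)

print(boundary_spans)    # two matches: the call and the middle of my-print-func
print(lookaround_spans)  # one match: only the standalone print
```

The look-around version rejects the second occurrence because the characters on either side of it are "-", which the negative assertions rule out.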

Problem 3: No way to switch between syntaxes within a file

There is no good way to switch from one syntax to another within a single file. For example, in Python it would be nice to have syntax highlighting inside f-strings. Using the include feature only gets you so far, because you can't include the Python syntax from the Python syntax file itself. I suspect the problem here is that the highlighter tries to generate the entire (infinitely recursive) syntax in advance rather than as required.
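As a rough illustration of expanding a recursive syntax "as required", here is a toy Python sketch. It is not how micro's highlighter works; the token rules, names, and the simplified f-string handling (no escapes, no nested quotes or braces) are all assumptions for the example. The recursion only happens when a replacement field is actually found in the text, so the self-reference never has to be expanded up front.

```python
import re

# Toy token rules for a tiny Python-like language (assumed for illustration).
TOKEN = re.compile(r'''
    (?P<fstring>f"[^"]*")      # f-string without escapes or nested quotes
  | (?P<number>\d+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<other>\s+|.)           # whitespace or any other single character
''', re.VERBOSE)

FIELD = re.compile(r"\{([^{}]*)\}")  # one replacement field, no nesting


def highlight(text):
    """Return a flat list of (kind, text) tokens for text."""
    tokens = []
    for m in TOKEN.finditer(text):
        kind = m.lastgroup
        if kind == "fstring":
            tokens.extend(highlight_fstring(m.group()))
        else:
            tokens.append((kind, m.group()))
    return tokens


def highlight_fstring(literal):
    """Split an f-string into string parts and recursively highlighted fields."""
    tokens = []
    pos = 0
    for m in FIELD.finditer(literal):
        # String part up to and including the opening "{".
        tokens.append(("string", literal[pos:m.start() + 1]))
        # Re-enter the host syntax only for the field contents.
        tokens.extend(highlight(m.group(1)))
        pos = m.end() - 1  # resume at the closing "}"
    tokens.append(("string", literal[pos:]))
    return tokens


tokens = highlight('f"total: {count + 1}"')
for kind, text in tokens:
    print(kind, repr(text))
```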

Problem 4: Syntaxes need to be defined specifically for micro

The way micro defines syntaxes is not compatible with the more popular editors, so syntaxes often need to be manually translated. I am aware that the situation with language syntax formats is currently a bit of a mess (there's TextMate, VSCode, vim, emacs, monaco-editor... all with their own formats), but it would be nice to share a format with one of the more popular editors. In my opinion YAML is not the best file format for syntax definitions (or for anything else, to be honest).

Problem 5: No way to dynamically create tokens

To implement syntax highlighting for something like HEREDOCs, you would need to be able to recognize a token (like "<<-EOF") and create a new token ("EOF") based on it. Dynamic tokens would also make it possible, for example, to highlight variable and function names only if they have been defined, which might be useful in some cases.

Solutions?

The issues with ^ and $ could be fixed without changing how the highlighting system works. Look-behinds and look-aheads could also be implemented in the current system, but they introduce a lot of complexity and cause performance issues when used poorly.

For problem 3, it is likely possible to work around all the cases that realistically matter using include and a few separate syntax files, but I think a more robust implementation would be beneficial.

The 4th problem is not a big issue; it just takes some extra effort from the community to keep all the syntaxes up to date and working properly.

The last problem is trickier to solve without making big changes to how the highlighter works and how syntaxes are defined. For a simple case such as HEREDOCs, you could abuse backreferences and put everything inside a single token. I don't think backreferences exist in the current highlighter implementation (correct me if I'm wrong). For more complicated cases, a completely different approach would be necessary.
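A minimal sketch of the backreference idea, written in Python only because its re module supports backreferences (Go's standard regexp package, which micro as a Go program would most naturally use, does not). The pattern and group names are hypothetical and cover only a simplified shell-style heredoc.

```python
import re

# Hypothetical single-token rule: the delimiter captured after "<<-" must
# reappear alone on a later line. The (?P=tag) backreference is what ties
# the closing line to the opening one.
HEREDOC = re.compile(
    r"<<-(?P<tag>\w+)\n"      # opener, e.g. "<<-EOF"
    r"(?P<body>.*?)"          # body, matched lazily across lines
    r"^[ \t]*(?P=tag)$",      # closing line: optional indentation, then the tag
    re.DOTALL | re.MULTILINE,
)

script = "cat <<-EOF\nhello\nEOF is not the end\nEOF\necho done\n"
match = HEREDOC.search(script)

print(match.group("tag"))
print(repr(match.group("body")))
```

The line "EOF is not the end" is skipped because the closing rule requires the tag to be the only thing on its line. This still treats the whole heredoc as one token, which is why it only helps in the simple case.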

Summary / Questions

The syntax highlighter has some issues. I would like to help make it better. What are the future plans for syntax highlighting? Is the current implementation here to stay, or is it likely to be replaced in the foreseeable future? Would there be any interest in a coordinated effort to improve the situation? If so, do you think that effort would be better spent working on the current highlighter and syntaxes, or on making something from scratch?

zyedidia · 2022-06-18T14:29:31Z

Maintainer

Yes, the current syntax highlighting engine has a multitude of problems. I think the solution is to switch to a system that uses PEGs (parsing expression grammars) rather than regular expressions, and to use incremental PEG parsing for fast rehighlighting. PEGs are roughly as powerful as context-free grammars, so it will be possible (and easier) to describe complex grammars without hacks like regions and lookahead/lookbehind. I have a library that uses this approach at https://github.com/zyedidia/flare, but I still have to integrate it with micro.
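For readers unfamiliar with PEGs, here is a small Python sketch of the core idea (sequence, ordered choice, repetition, and recursion). It is not flare's API and says nothing about incremental parsing; every function name here is made up for illustration. The toy grammar recognizes balanced parentheses, a nested construct that a plain regular expression cannot describe.

```python
# Minimal PEG-style combinators. Each parser takes (text, pos) and returns
# the position after the match, or None on failure.

def lit(s):
    """Match the literal string s."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def char_not_in(chars):
    """Match any single character that is not in chars."""
    return lambda text, pos: (
        pos + 1 if pos < len(text) and text[pos] not in chars else None
    )


def seq(*parsers):
    """Match every parser in order."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def many(parser):
    """Zero or more greedy repetitions; never fails."""
    def parse(text, pos):
        while True:
            end = parser(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


# Toy grammar:  group <- "(" (group / [^()])* ")"
def group(text, pos):
    rule = seq(lit("("), many(choice(group, char_not_in("()"))), lit(")"))
    return rule(text, pos)


print(group("(a (b c) d) tail", 0))  # end offset of the balanced group
print(group("(a (b", 0))             # None: the group is never closed
```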

1 reply

Andriamanitra Jun 19, 2022
Author

Great to hear you already have plans and even an implementation waiting to be integrated! I agree PEGs are a better, more flexible solution. They should fix all of these problems except the one about having to define syntaxes specifically for micro. Fortunately, some programming languages (for example Python) already have their grammar defined as a PEG, which should be easy enough to convert into a format flare understands.

ManuLinares · 2022-08-31T05:01:15Z

I have a question: how can I change the detected "Syntax file" being used in a buffer?

I have a file opened, and I want to change the "file type".

2 replies

zyedidia Aug 31, 2022
Maintainer

There is a filetype option you can change. For example, press Ctrl+e and enter "set filetype c". You can also set the filetype based on the extension in your settings.json; I think "help options" has an example of this.

ManuLinares Aug 31, 2022

Thanks!

I saw that option but wasn't sure how to change it. I typed 'filetype ' and nothing happened (thanks again).

The 'set' command is mentioned first in help line 26, but it isn't clear that 'set' can be used for other things too.
