-
Notifications
You must be signed in to change notification settings - Fork 799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal to add textDocument/ast #1998
Comments
Note that one of the problems tree-sitter aims to address is to allow efficient updates of the syntax tree as changes are made to a document. neovim for example integrates both, tree-sitter and LSP and allows to use both for highlighting and with some languages it is very visible that the LSP highlighting lags behind if you type somewhat quickly I'd say that for editors that already integrate something like tree-sitter (e.g. neovim, helix, and emacs via plugin) this is not very interesting, unless it would also include type information which might open the door for more advanced structured-search-and-replace capabilities. |
That's not what I said in the proposal, this proposal isn't about parsing the full document. Languages that have an LSP implementation would implement this capability by leveraging their already existing and correct parsers to parse selections of code, rather than the entire document, making it efficient. Most languages are written in C, so the parsing step is already very fast. It would also be relatively easy for most languages to add support for parsing a selection of code, and many homoiconic languages even expose it as a function.
That's because your example would inherently involve parsing the entire document as you said, since there isn't an LSP capability like this for languages to implement.
This aims to resolve many of treesitters flaws. I've used and written parsers for treesitter enough times to notice that a lot of treesitter parsers either:
I've also tried to use parsers in treesitter to leverage them to build interesting programming tools, and was very exciting to use it, only to find that they're often imcomplete, incorrect, or buggy and that it wouldn't work. This proposal comes from my own experiences. I'll add a fourth point that parsers are hard to write, and hard to write correctly. By including this kind of capability, we will know that any error in the parsing is an error in the language itself, which makes it very likely that the parsing works properly. It really would be a shame for this proposal to just get dismissed and rejected, because people might just skim this thread without understanding the what and why of this being proposed and because tree-sitter "already exists". If that does happen, I would strongly suspect that this issue would re-emerge in a new ticket years or decades later again. |
Citation needed. (Hint: LanguageServer.jl.)
And are usually written (or generated) with a specific purpose in mind. Tree-sitter has one such purpose: incremental parsing and efficient evaluation of queries against the parsed concrete (not abstract!) syntax tree, mostly (but not exclusively) for the purpose of syntax highlighting. Latency is king here. This is a very different requirement than generating a static AST for the purpose of cross-project "code intelligence".
I would expect that this would happen because you make blanket statements without admitting to the complexity of the problem space. TL;DR: just because a tool is bad for your purpose doesn't mean it's bad for its own stated purpose. And behind this proposal lurks the false dichotomy that you can (or should) only use one parser technology for all purposes rather than the right tool for the right job.
Oh, and the hidden assumption that language servers always parse everything perfectly. That's some powerful wishful thinking there, that not at all aligns with my experience. (Also, a number of language servers actually use tree-sitter under the hood for parsing. How does that fit into your argument?) |
(All this is to say that your proposal for a new request would probably get a better reception if you didn't wrap it in an invective against other projects -- which isn't that even relevant for the proposal itself.) |
Highlighting is a case where incremental parsing can be used and where you can see today that the LSP version is often slower. And parsing selections of the code isn't always easy - but that depends a lot on the language's grammar. Reading that article should also make it somewhat apparent that any proposal in regards to exposing a AST would have to specify how to deal with documents that contain syntax errors and can't be parsed completely. Code while being edited is often incomplete.
Many compilers aim to be self-hosted. And language compiler != language-server
This is partly why I mentioned that for editors like neovim the proposal would be a lot more interesting if it included type information. Tree-sitter can't provide that on its own so instead of being an alternative with different trade-offs, it would enable new use-cases. |
This proposal is designed to address the limitations of treesitter, that's why I mentioned it's limitations. I don't feel like it would be productive to continue arguing with either of you, I don't feel like either of you are acting in good faith for some odd reason. I'll let other people chime in and decide what they want to do with this ticket. |
Funny, that's exactly the impression I got from your comments. I don't see what the limitations of tree-sitter have to do with the language server protocol as these address different use cases. |
This is admittedly a pretty bizarre exchange for me, usually this doesn't happen when I write or comment on tickets. I don't really get it, but whatever. I won't comment on the rest of this thread, I'll let others chime in and decide what they want to do with the ticket. I just don't want the actual idea of the ticket to get derailed by this since it's irrelevant to the actual idea itself. I'll just add one other thing that language servers would have to implement besides parsing regions, which is incremental parsing. Though I think incremental parsing could be optional because the capability would still be really useful with regions alone for various tasks. |
I don't have a strong opinion on standardizing an AST request, but for reference, clangd supports one as an extension. It is not incremental or anything fancy like that; it requires that the document be already opened with It was initially written to help debug the server by visualizing AST fragments in the client in a simple tree view, but it has since found use by some users wishing to extend the capabilities of the language server who find it easier to patch a client (e.g. because the client has an extensible plugin system) than to patch the server. |
I'm having trouble seeing how a client could make use of this hypothetical generic AST. You give "syntax highlighting", "code folding", and "auto indentation" as examples of algorithms that could be implemented using it, but each of these is already an operation in the core LSP---and each involves a lot of language-specific logic/policy/opinions on top of the raw AST. You also claim it can be used for "any sort of static analysis of the syntax of the code", but I'm not convinced: even the most trivial of static analyses (formatting, organize imports) is highly language specific. Abstracting away from the AST is why language servers exist. This proposal is in diametric opposition. |
Hi folks, I don't have any particular opinion about whether this proposal would be useful, but I'd like to share our experience in the coq-lsp server, where we have implemented a similar call. In particular, most users of the More advanced users don't need the AST, but often an abstracted version of it. For example, CoqPilot wants the list of theorems and holes on them, etc... So indeed, I am not sure any of our use cases are generalizable enough to warrant an LSP extension, but maybe a "split document into logical ranges" call could be considered, but IMHO only weakly. Hope this helps, cheers. |
"Split document into logical ranges" is essentially what the LSP folding range operation already does. |
Actually in our case folding ranges is close but doesn't correspond to the splitting users need. They have been using Happy to provide details of document models where folding ranges don't correspond to sentence ranges if you folks are curious, but in the Coq/Rocq case it is easy to see: Theorem easy_arith : 2 + 2 = 4.
Proof. simpl. reflexivity. Qed. In this case we have 5 sentences and 1 folding range. |
This is a proposal to add a
textDocument/ast
to the LSP protocol. The idea is to have a capability that returns an AST from a selection.The following exist but don't actually result an AST:
textDocument/documentSymbol
textDocument/semanticTokens
textDocument/definition
andtextDocument/references
The text was updated successfully, but these errors were encountered: