
Allow DSLs deployments to load a parser from the precompiled file without waiting for evaluators to load #297

Merged · 16 commits into main from preloaded-grammar-for-dsls · Sep 26, 2023

Conversation

@DavyLandman (Member)

Note, this is only for VS Code DSL deployments, not for normal interactive development.

What we've observed is that users take the absence of syntax highlighting as a sign that "it" is not working. When a VS Code extension that uses Rascal starts, it can take 5 to 10 seconds (even longer depending on the machine) before all the modules are loaded and the parser function is handed to the LSP server. Many improvements have already been made (see the excellent discussion in #267), but we wanted the opportunity to really close the gap.

This change builds upon the work of @jurgenvinju to offer an opaque file that contains precompiled parsers. It's only available when using the TypeScript (aka npm/deployment) path, but in that case it will load a parser without any need for an evaluator.

@DavyLandman changed the title from "Allow DSLs deployments to load a grammar from the precompiled file without waiting for evaluators to load" to "Allow DSLs deployments to load a parser from the precompiled file without waiting for evaluators to load" on Sep 18, 2023
@jurgenvinju (Member)

Looks good as code. I'm sceptical with respect to the design and its future maintainability.

Please explain why we actually need this additional support. Not the motivation from above, but the reason it still takes seconds to load the parser. I thought I had covered all the tracks. Is a parser generator still loading? Or are we now avoiding the loading of any module of the language implementation?

I'm hesitant about this design; if this additional API is avoidable, that would be better, since (a) it is a lot of code, (b) it is an extra externally visible API, (c) it more than doubles the number of fields in the current contract for Language, and (d) in the compiled world it probably makes no sense anymore.

It also breaks the consistency of the parametrized-lsp design, which normally goes through Rascal callbacks (IFunction instances). Instead, this ties the parametrized-lsp bridge completely to the Java implementation of the brand-new function loadParsers in Prelude.java, while my intention was that DSL designers would write a Rascal function that calls loadParsers themselves.

So I need some convincing; are there no alternatives for optimizing this problem that keep the design as-is? What is currently taking the time before the loaded parser is presented to the LSP side?

@DavyLandman (Member, Author) commented Sep 18, 2023

Thanks for reviewing.

Currently, the first parse starts the extension, which does the following (see the sketch below the list):

  1. registerLanguage
  2. start a new evaluator
  3. import the root module defined by registerLanguage
  4. call the main function
  5. which returns an IConstructor that contains the parser function
  6. call the parser function
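For illustration, here's a minimal Java sketch of what steps 2--6 amount to, assuming the interpreter's Evaluator API (doImport/call); the module name and the "parser" field are hypothetical stand-ins:

```java
import io.usethesource.vallang.IConstructor;
import io.usethesource.vallang.IValue;
import org.rascalmpl.interpreter.Evaluator;
import org.rascalmpl.values.functions.IFunction;

class SlowPathSketch {
    // Steps 2-3: a fresh evaluator imports the root module; this is where the
    // seconds go, because every transitive Rascal module is parsed and loaded.
    static IValue firstParse(Evaluator eval, IValue input, IValue origin) {
        eval.doImport(null, "demo::lang::MyLanguage"); // hypothetical root module
        // Steps 4-5: main returns a constructor that carries the parser function.
        IConstructor contribs = (IConstructor) eval.call("main");
        IFunction parser = (IFunction) contribs.get("parser"); // hypothetical field
        // Step 6: only now can the first parse, and hence highlighting, happen.
        return parser.call(input, origin);
    }
}
```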

Now, this already takes a lot less time with the precompiled module parsers and the precompiled stored parser. But it can still take up to 10s--20s (especially on older/slower laptops). Quite some of the time (as also seen in the profiles in #267) is spent parsing Rascal modules and in general import machinery, and I have not been able to identify further opportunities to reduce that time. So indeed, the final step I propose to close the gap is to skip all this module loading.

I agree it grows the interface on the TypeScript side, and I don't really like that. We could mark it as deprecated/internal/experimental? The benefit is that the npm dependencies don't follow VS Code upgrades, so if we deprecate it at a later point, people's extensions won't break, since they'll have to explicitly update.

An alternative that would get us a bit closer would be to use the multiple-registry feature to load a specific registry that only has the syntax module imported. This would reduce the import times, but it would increase memory pressure, as we would keep yet another evaluator alive just for that syntax module.

@DavyLandman (Member, Author) commented Sep 18, 2023

Here are some fresh profiles. I didn't catch it from the start, but it took roughly 15s--18s to load.

[screenshot: profiler overview]

Looking at the hot spots:

  1. most of the time (60%) is spent in SGTDBF::parse
  2. quite some time is spent in io.usethesource.vallang.impl.fields.ConstructorWithKeywordParametersFacade.accept(), which is called by the AST builder a lot
  3. also 40% of the time is in org.rascalmpl.values.RascalFunctionValueFactory.createHole(), which, surprise surprise, does start an evaluator to get the parser generator (I had not spotted this last time)

So maybe we could try to remove the parser generator dependency for createHole?
[screenshot: profiler detail]

That would take off at least a few seconds again. But in the end, parsing & AST building for a large set of files (our checker is not small) will take time as long as we are not compiled.

@jurgenvinju (Member)

Thanks for the analysis. 10 or 20 seconds to load a few modules does not make sense to me. Something is very weird in this entire profile.

  • Parsing a large module: purely the SGTDBF part should take 200ms for very long files like List.rsc. But that's only for the first large file; once the parser is warmed up by the JIT, the same module should take 70ms or less.
  • Say we are parsing 10 modules; that would be around 1 second.
  • Then we have the mapping to AST; that's a lot less work but does require memory allocation. Another second.
  • Then we may have concrete syntax fragments. Per file that should not take more than the original parse. So another second.

In theory, loading 10 modules might take around 3 seconds, worst case, but not 10 or 20 seconds. So I think we have to dig a little deeper here. If we really can't find the issue, well, then we have to come up with some temporary solution like you proposed. Note that the latest solution makes the concrete syntax parser cache dead, but perhaps it is still handy for other use cases.

I'm going to do some code staring based on the profiles you shared.

@jurgenvinju (Member) commented Sep 18, 2023

  • In import there are 3 visitors over parse trees, none of which skip layout. That probably accounts for a factor of 2 in that stage: visiting and re-building those trees for nothing.
  • In ASTBuilder we use TreeAdapter.getASTArgs and TreeAdapter.getListASTArgs. Together those allocate memory in the order of the size of the entire tree, and then throw it away right again. So that could account for a third (?) of the time building AST nodes.
    • It would be best to inline these loops that skip layout and terminals, so no temporary memory must be allocated (see the sketch after this list).
    • Note that it is possible to change these two functions from producing IList to producing Iterable<IValue> to achieve the same effect, but that would still allocate an Iterable for every node and keep an extra layer of memory allocation in the order of the size of the tree. Not sure if escape analysis could catch those, but the current IListWriters are definitely beyond the scope of escape analysis.
  • SGTDBF::parse also contains the mapping from parse graphs to parse trees. If there are hidden ambiguities (e.g. cycles in layout), this can cost a lot more than expected. What's under this method in the profile?
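A minimal sketch of that inlining idea, assuming the TreeAdapter helpers (getArgs, isLayout, isLiteral) keep their current names; this is illustrative, not the actual patch:

```java
import java.util.function.Consumer;
import io.usethesource.vallang.IValue;
import org.rascalmpl.values.parsetrees.ITree;
import org.rascalmpl.values.parsetrees.TreeAdapter;

class AstArgsSketch {
    // Unlike TreeAdapter.getASTArgs, which fills an IListWriter to build a
    // throwaway IList per node, this walks the children in place and hands only
    // the AST-relevant ones to the caller: no temporary tree-sized allocations.
    static void forEachAstArg(ITree appl, Consumer<IValue> action) {
        for (IValue child : TreeAdapter.getArgs(appl)) {
            ITree arg = (ITree) child;
            if (!TreeAdapter.isLayout(arg) && !TreeAdapter.isLiteral(arg)) {
                action.accept(arg);
            }
        }
    }
}
```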

@jurgenvinju (Member)

  • It's possible to map from the SGTDBF parse forest directly into the AST class hierarchy. That would skip an entire tree and memory allocation stage for Rascal programs. This is a bigger refactoring though, which I'd rather avoid since we would not need it anymore in a compiled world.

@DavyLandman (Member, Author)

Looking at the profile, most of the SGTDBF::parse calls take between 100--500ms; I even found two that took 900ms and 1200ms.

I've shared a trace file with you so you can see what's happening.

@DavyLandman (Member, Author)

I've tested this, and it works out as intended: the grammar loads fast and the user sees syntax highlighting well before the evaluator is loaded.

@jurgenvinju self-assigned this on Sep 22, 2023

@jurgenvinju (Member) left a comment


Great, and good that it works. We have the reservations regarding the temporariness of this interface in place. I just would like to see the stored parser reused behind the interface that I designed for this, and not the implementation copied. Do you want me to fix this? I think it'd take only 30 minutes or so.

```java
private final String name;
private final String extension;
private final @Nullable Exception loadingParserError;
private final @Nullable Class<IGTD<IConstructor, ITree, ISourceLocation>> parserClass;
```
@jurgenvinju (Member)

Why leak these implementation details if Prelude.loadParsers hides all of that? All you have to do is call that function with the right source location. You receive an IFunction that can be called directly with an input string or source location (first parameter) and then the location used for the @loc annotations. The options are provided with the loadParsers method. If you don't want to instantiate a Prelude, a RascalFunctionValueFactory instance constructed with an empty Evaluator will also do fine.

With this you save 130 lines of code, and you can also hide behind the interface in case future parser implementations change the way they work.
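For concreteness, a hedged sketch of the consumer side described above, assuming IFunction.call keeps its varargs IValue signature; how the IFunction is obtained (Prelude.loadParsers or the RascalFunctionValueFactory route) is elided, since the exact loadParsers parameters are not spelled out in this thread:

```java
import io.usethesource.vallang.ISourceLocation;
import io.usethesource.vallang.IString;
import io.usethesource.vallang.IValue;
import org.rascalmpl.values.functions.IFunction;

class StoredParserUse {
    // Once the IFunction is in hand, parsing needs neither a Prelude nor a live
    // evaluator: pass the input (string or location) first, then the location
    // used for the @loc annotations.
    static IValue parse(IFunction storedParser, IString input, ISourceLocation origin) {
        return storedParser.call(input, origin);
    }
}
```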

@jurgenvinju (Member)

After the IFunction is acquired, there is no more need for the RascalFunctionValueFactory and its empty Evaluator, or for a Prelude instance. The function instance will lock on the evaluator only when/if it is called from the evaluator, not if it is called manually via the IFunction.call interface.

@DavyLandman (Member, Author)

I had to copy that code since it had a hard dependency on Evaluator. I specifically wanted not to load an evaluator.

@DavyLandman (Member, Author)

Yes, I've tested it; this alternative approach works. I really wanted to avoid starting an evaluator, but I agree: just constructing it and not doing anything with it should be an acceptable tradeoff.

@sonarcloud (bot) commented Sep 26, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (A)
Vulnerabilities: 0 (A)
Security Hotspots: 0 (A)
Code Smells: 0 (A)
Coverage: 0.0%
Duplication: 0.0%

@jurgenvinju merged commit 3d70dae into main on Sep 26, 2023
9 checks passed
@DavyLandman deleted the preloaded-grammar-for-dsls branch on September 26, 2023