Prevent "bootstrapping" a full parser generator for deployed DSL implementations #267
There is a need to know what contributes significantly to this bottleneck:
Requirements on a solution:
It is highly likely that future parsers will be stored in formats other than .class files, either (a) because the parsing technology becomes more dynamic, or (b) because class loading becomes the bottleneck in parsing. The fastest Java parser I know of (JikesPG) is so fast because it does not use the class loading mechanism to instantiate a parser.

Grammars are not only used for parsing in a lot of Rascal code. Every grammar reification, whether used for parsing or not, will trigger a full load of the parser generator. It is very hard and also brittle to weed out all the

[Thinking-out-loud-begin] Looking at this from a programming language perspective: the parser is a compile-time constant that is derived from the grammar, which is also a compile-time constant. In concrete syntax expressions only this constant is used. And if we call the

So in a way, this is a special case of constant-folding.

Step 1:
Step 2:
So that amounts to:
Steps 2 and 3 both lead into step. I'd prefer 2 over 4, but I don't know how to index
By the way, this is not only a deployment concern: since Rascal programs with reified grammars load so slowly, they are also annoying when making incremental changes in the REPL. Starting a new REPL takes way too much time.
That said, even a solution that only works for deployed programs would be great.
I think there are many gains to be had in this area of Rascal. Your analysis is right, but it is also a big change you are proposing, and it does not solve the specific issue we run into. What I specifically want to solve is reducing the time a user is looking at syntax-highlighting-free code. VS Code will trigger Rascal the first time it sees a file of a certain language, and since we've moved the syntax highlighting to the semantic highlighter, that file will be shown without syntax highlighting. So what is happening before we get syntax highlighting?
So before we get to parse the file, we might have had to calculate the grammar, generate the parser, and compile the parser 4 to 5 times (depending on the number of modules with concrete syntax). I want to find a way to cut that down. I would also prefer a way that is less dependent on the current implementation of Rascal, but then again, the deployed scenarios use fixed versions of Rascal, so if that interface changes, we have a way to deal with it. My proposal was backwards compatible by making it a kw param.
With regard to storing the grammar in a file: we would indeed be faster, since we would only need to load a dedicated evaluator that runs the parser generator and compiles the Java files. I'll try to run some profiles to see which part we are spending most time in.
I suspect that in the given start-up scenario, just before syntax highlighting works, we are looking mainly at the loading of the parser generator and the reduction of the

Side note: it could also be an elegant idea to move forward with the current compiler and compile the parser generator to Java, and then use one (locked) instance shared between all the LSP instances (or have one for each instance). That way the loading time should be much lower, and we can then focus on making the reification much faster as well (which would be implemented by the same generated Java code library).
Profile of a new

Already for speed purposes we have the following pattern:

```rascal
module lang::DSL::Syntax

start syntax Module = ...;

// since Rascal caches the internal parser, but those caches can expire, we keep around an instance of our parser
// this helps us avoid regenerating the parser if a user stopped interacting with their IDE for a while
private start[Module] (value, loc) cachedParser = parser(#start[Module]);

public start[Module] (value, loc) getParser() = cachedParser;
```

It does mean that during import we do most of our calculation, and that the actual
So my observations (which align with your suspicions):
If we could do something about the parser generator load times and its performance, that would be nice indeed.
Excellent info.
I quickly added a timer around the parse in

FYI, I only measured the raw parsing time:

```java
// measure only the raw parse: no semantic actions, and no AST mapping afterwards
var start = System.nanoTime();
IActionExecutor<ITree> actions = new NoActionExecutor();
ITree tree = new RascalParser().parse(Parser.START_MODULE, location.getURI(), data, actions,
        new DefaultNodeFlattener<IConstructor, ITree, ISourceLocation>(), new UPTRNodeFactory(true));
var stop = System.nanoTime();

// report only the slow outliers
var time = java.util.concurrent.TimeUnit.NANOSECONDS.toMillis(stop - start);
if (time > 500) {
    System.err.println("Parsing: " + location.getURI().toString() + " took: " + time + "ms");
}
```
That's a large file, and there are many others like

Raw parsing time is not something that is easily optimized, as we have already optimized it to the bone (@arnoldlankamp); however, there is a UPTRNodeFactory included that maps the SPPF parse graph to the

Separately, the mapping to subclasses of AbstractAST could be expensive.
This is my trace for
With CPU profiling turned on for Rascal itself, you get a good view of the largest modules:
But the slowness does not seem to be related much to the size of the file:
If you parse the same module again, the parsing time drops sharply. Possibly this is the JIT kicking in:
Observations:
With the JIT turned off:
Indeed, the parser does not run faster the second or the third time, and it is also slower the first time. So we may attribute the difference between 100ms and 600ms entirely to the JIT compiler.
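The thread doesn't pursue it further, but if the JIT dominates the first parse, one common mitigation is a warm-up pass at startup. A minimal sketch, assuming some `parseModule` entry point (hypothetical) that drives the real parser:

```java
// Sketch of a JIT warm-up pass; `parseModule` is a stand-in for whatever entry
// point actually drives the Rascal parser. The parse results are discarded:
// the only goal is to let the JIT compile the hot parser paths before the
// first real file arrives.
class ParserWarmup {
    static void warmUp(java.util.function.Consumer<String> parseModule) {
        String tiny = "module Warmup\n"; // small but representative input
        for (int i = 0; i < 25; i++) {
            parseModule.accept(tiny);
        }
    }
}
```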
If you want to measure the execution time of some piece of code you can use

The parser implementation can be GC-heavy for highly ambiguous grammars, since in those cases it produces a large graph in which many branches will remain 'alive' for a long time. So there is a lot of marking/tracing/moving, but very little memory being freed up each cycle. But I doubt this is the main issue in this specific case, since there shouldn't be a large number of long-lived stacks running in parallel while parsing a file using the Rascal grammar.

If it turns out a large portion of the time is spent by the JVM, you could try a run with the old CMS GC instead of G1. While G1 is faster for something like 98% of the normal use cases, it can sometimes perform absolutely terribly on big and/or cyclic graph-like data structures (like the ones the parser produces). Since G1 segments the heap (to be able to deal with larger heap sizes efficiently), it needs to keep track of cross-segment pointers. Graphs are generally harder to segment, and it can run into trouble with these.
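Before switching collectors, one way to check whether the collectors are actually eating the time is the JVM's management beans. A small probe using the standard `java.lang.management` API (the `Runnable` would wrap a single parse):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcProbe {
    // Runs `work` (e.g. one parse) and reports the cumulative GC time that
    // elapsed while it ran, summed over all collectors in this JVM.
    static void measureGcDuring(Runnable work) {
        long before = totalGcMillis();
        work.run();
        System.err.println("GC time during run: " + (totalGcMillis() - before) + "ms");
    }

    private static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // getCollectionTime() is -1 if undefined
        }
        return total;
    }
}
```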
As for improving the performance of the parser algorithm/implementation: that would be a challenge, as it is already very highly optimized across all available dimensions. The only semi-significant performance improvement that comes to mind is to implement something like an 'algorithm stack': classify each part of the grammar (as possibly ambiguous, certainly non-ambiguous, regular, or unique match) and parse each segment using its most efficient implementation. However, this would come at a significant cost in implementation complexity and would add a non-trivial analysis burden to the parser generator. Figuring out how to do that last part efficiently could probably net you a doctoral degree 😄 . Other than that 🤷♂️ . Migrate the implementation to C or Rust 😛 ?
But in terms of this specific issue: why not pre-parse the standard library for Rascal, save the ASTs, and ship them with the distribution? Just use the name + hash of the

The process would, for example, look like the sketch below.
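A minimal sketch of such a name-plus-hash cache, with a hypothetical `parseAndSerialize` hook standing in for the real parse and binary-writer pipeline:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.function.Function;

class TreeCache {
    // Returns the serialized parse tree for `source`, parsing only on a cache
    // miss. The cache key is the file name plus a hash of its content, so a
    // changed file automatically misses and gets re-parsed.
    static byte[] cachedTree(Path source, Path cacheDir,
                             Function<byte[], byte[]> parseAndSerialize) throws Exception {
        byte[] content = Files.readAllBytes(source);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
        Path entry = cacheDir.resolve(source.getFileName() + "-" + hex(digest) + ".tree");
        if (Files.exists(entry)) {
            return Files.readAllBytes(entry); // hit: no parsing needed
        }
        byte[] tree = parseAndSerialize.apply(content); // miss: parse once, then store
        Files.createDirectories(cacheDir);
        Files.write(entry, tree);
        return tree;
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```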
When looking at the traces, nothing immediately stands out to me. It looks as I would expect; a pretty even split between
The recursion in the flattener is not that surprising, since it needs to traverse all the way down the binarized parse forest to be able to flatten it. The flattener actually contains a conditional tail recursion optimization to keep the stack depth in check when handling lists (otherwise you'd run out of stack fast); a sketch of the idea follows below. Doing it entirely without recursion would be even more complicated than it already is, because of all the code needed for keeping track of ambiguities and cycles in the parse forest. You're basically dealing with forking and recombining stacks; without using recursive functions you'd need to keep track of even more state.

As for the custom data structures: these are basically specialized implementations aimed at reducing memory usage and improving performance. Maybe you could play around with the initial capacity a bit to see if that has any noticeable impact. As for the implementations themselves: they were written in the JDK 1.6 era with the JIT's capabilities at that time in mind. Maybe some things could be written slightly differently today, but that would require some looking into, and the gains likely won't be huge. These data structures are heavily used, which is why a significant amount of time is spent in their methods in the trace.

As for the GC: looking at that graph, it doesn't seem to be an issue. While G1 can offload work to worker threads when worst-case behavior pops up (which may not be properly reflected in the GC stats), that doesn't seem to be what's happening here.

Can we maybe narrow down this issue by checking whether there are specific pieces of syntax or certain grammars it struggles with more than others? Also, maybe looking at the parser's trace logging can give us some clue as to what it's doing specifically. Maybe we can spot something out of the ordinary in there, so we can have a more targeted look at the implementation.
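For illustration, the conditional tail-recursion trick in the flattener mentioned above boils down to something like this (simplified, with a hypothetical `Node` type, not the real flattener code):

```java
import java.util.List;

// Simplified illustration of the conditional tail-recursion optimization:
// all but the last child are handled by ordinary recursion, while the last
// child is handled by looping, so long (binarized) lists do not grow the
// Java call stack.
interface Node {
    boolean isLeaf();
    String text();
    List<Node> children();
}

class Flattener {
    static void flatten(Node node, StringBuilder out) {
        while (true) {
            if (node.isLeaf()) {
                out.append(node.text());
                return;
            }
            List<Node> kids = node.children();
            if (kids.isEmpty()) {
                return;
            }
            for (int i = 0; i < kids.size() - 1; i++) {
                flatten(kids.get(i), out); // recursion for all but the last child
            }
            node = kids.get(kids.size() - 1); // tail call turned into iteration
        }
    }
}
```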
According to my analysis that would only help a bit. The parser generator code is loaded for the grammar reification operator anyway.
It may be the case that 50% of the time in parsing string templates is spent in temporary, soon-to-be-dead stacks. I'll have a look at the grammar. The big module has one very long string template. Nevertheless, it remains the module that trains the JIT, so it will always be the slower module with the current setup.
I think it all looks good to me. I was wondering (since the profile shows the
Another cause could be a cyclic grammar. If that happens in whitespace, we'd never see it when interpreting the tree, but we'd pay for it nevertheless.
Yes, the parse trees are quite a bit bigger, but reading their binary format from disk is a process that should scale fairly linearly with file size, and it should be quicker than parsing the source file, since parsing is a lot more compute-heavy. How much the difference will be I can only guess, but writing a benchmark for reading a parse tree from disk should be a relatively easy thing to do (maybe you can even time it in the REPL or something?). If the difference is large enough, implementing a caching solution like this could resolve the issue without too much time investment.
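A minimal benchmark along those lines, timing only the raw disk reads (deserialization would come on top, so treat the numbers as a lower bound):

```java
import java.nio.file.Files;
import java.nio.file.Path;

class TreeReadBenchmark {
    // Times raw reads of a serialized parse tree from disk. Repeated runs let
    // the JIT and the OS page cache settle, mirroring warm-start conditions.
    public static void main(String[] args) throws Exception {
        Path treeFile = Path.of(args[0]);
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            byte[] bytes = Files.readAllBytes(treeFile);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.err.println("run " + run + ": " + bytes.length + " bytes in " + millis + "ms");
        }
    }
}
```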
I'm not familiar with what the string template grammar looks like, but to give some background: calling
Passing a debug listener to the parser might give us some information about what's going on. It will probably invoke more
Having said that, it may also well be that it turns out nothing out of the ordinary is happening when we look at the logging, but if large
I just ran some stats on all
So in 369,324 of the total 1,204,224 gets we ended up in a node with 1 or more hash collisions; that is, roughly 30% of the gets are most likely multiple cache misses. I'm glad to see they aren't that deep, but it might still pay off to see if some open addressing could bring those colliding entries a bit closer together.
If I run
This is explainable, since the parser superclass has already been optimized by the JIT after having parsed
Reading the parse tree from disk:
So that's 80ms compared to 126ms (but that is also often lower). Even though reading the tree is about 40% faster, we will not win the battle by caching the parse trees on disk. It remains something to consider, but for now it seems that making sure the parser generator is not loaded at all is still the most viable option for solving this issue.
I've butchered the code to use linear probing, and also tried different existing IntObjectMaps; they all end up at the same performance numbers. So I think only by making a new dedicated data structure, one that tries its best to explain range checks etc. to the JIT, could we maybe squeeze out a bit more here. But the current code, even with the 30% hash collisions, is fast, and it will take some effort to improve upon.
This would not expose directly to the Rascal level what kind of encoding we are using to encode the parser, and it could be used to avoid the use of the
Well, it was worth a try. In theory, open addressing could have been a good fit for this specific use case. In the past I've also found that open addressing may not always perform better than a chaining implementation. The reasons I found back then were the following:
Yup, agreed, that was roughly the path I took: decrease the load factor to have less clustering.
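For reference, a stripped-down sketch of such a linear-probing variant (illustrative only, not the parser's actual data structure; resizing is omitted, and the capacity guard keeps the load factor below 0.5 to limit clustering, matching the remark above):

```java
// Stripped-down int-keyed map with linear probing. Colliding entries probe
// consecutive slots, so lookups tend to stay within a few cache lines.
class IntObjectProbeMap<V> {
    private final int[] keys;
    private final Object[] values;
    private final boolean[] used;
    private int size;

    IntObjectProbeMap(int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
    }

    void put(int key, V value) {
        if ((size + 1) * 2 > keys.length) {
            throw new IllegalStateException("resize omitted in this sketch");
        }
        int i = slotOf(key);
        if (!used[i]) { // new entry, not an overwrite
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = slotOf(key);
        return used[i] ? (V) values[i] : null;
    }

    // Walks consecutive slots until it finds `key` or the first empty slot.
    private int slotOf(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }
}
```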
usethesource/rascal#1815 adds the fundamental capability of saving and loading "parser functions" on disk.
Nice! Do you have an idea of how we could trigger these caches (the ones in the evaluator for concrete syntax)? Or is that still WIP?
WIP.
If it's automatic, that would be nice; then it's just an aspect of packaging :)
You may be able to use the Exec Maven Plugin for this. This plugin allows you to execute a Java program in a specific build phase. Like
usethesource/rascal#1815 is ready for review if anyone is interested. It should go a long way in preventing the bootstrapping of a parser generator, if applied to both the parsing of concrete syntax fragments and the parsing of DSL files. In a DSL parser, the parser contributions would not use

In the concrete syntax fragment parser we'd do a similar trick. That's for another PR. The docs of ParseTree.rsc contain an example usage of
The overhead of packaging all the class files from the in-memory file system into one binary file is significant, so I can't make it automatic inside the interpreter. It could be a default side effect of the type checker, though, to write a parser file next to each .tpl file.
:-)
Ok, next step: usethesource/rascal#1816
Given our metrics above, this should eliminate any prohibitive loading times for DSL implementations, if applied in concert with a bespoke call to
After some more iterations on this, it now works: you can avoid generating grammars at runtime. There is still, however, the loading of the ParserGenerator the first time a concrete syntax pattern is found, and this is for the

@jurgenvinju any ideas on whether we could reduce that?
I'll have a look. It's possible to make a fast and easy path for sym2symbol for the most-used cases.
Or we can load the module with sym2symbol in it, without loading the entire parser generator...
Closing this issue; the discussion has improved the performance in many ways. All the other improvements rest on the compiler's shoulders ;)
Thanks @DavyLandman; this will probably make a lot of users happy.
Is your feature request related to a problem? Please describe.
Currently a DSL can take up to 30s to load, due to Rascal's overhead of generating parsers for files with concrete syntax. Users are looking at a piece of text without syntax highlighting.
Describe the solution you'd like
I want to extend the registerLanguage call with a class name. If that class name is defined, we use that class for parsing files in that language. This would allow us to provide syntax highlighting as soon as the JVM has loaded the jars. (A developer is responsible for correctly generating a Rascal parser class.)
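The shape of the proposed extension might look something like this (hypothetical names and fields, not the actual rascal-lsp API):

```java
// Hypothetical shape of the proposed extension; names and fields are assumed
// for illustration and do not match the actual rascal-lsp API.
record LanguageRegistration(
        String name,        // e.g. "MyDSL"
        String extension,   // e.g. "mydsl"
        String mainModule,  // Rascal module providing the language contributions
        String parserClass  // optional: fully-qualified pre-generated parser class
) {
    // If a parser class is given, it can be loaded directly instead of
    // bootstrapping the parser generator; otherwise fall back to the
    // current behavior of generating the parser at runtime.
    boolean hasPrecompiledParser() {
        return parserClass != null && !parserClass.isEmpty();
    }
}
```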
Describe alternatives you've considered
Additional context
This is a deployment concern. During development these annotations should be ignored.