
Improve parser performance by removing old JDT/JIT tuning #1813

Merged: 1 commit from parser-performance into main on Jun 18, 2023

Conversation

@DavyLandman (Member)

This change made the parser run between 8 and 12% faster.

Since the discussion in usethesource/rascal-language-servers#267 pointed towards the parser, I took a look at the profiler, and this method popped up. I figured that maybe 11 years of JDK changes had caught up with the old tuning.

I think the primary reason is that the Arrays.copyOf method is implemented in native code, so that gives some benefit.

I wrote a small Java program that loads all Rascal files in the rascal project into memory and then runs the parser over them; a sketch of such a harness follows.
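The harness itself is not part of this PR; the following is a minimal sketch of what such a program could look like, assuming the project root is passed as the first argument. The class name ParseBenchmark and the Consumer<String> stand-in for the actual Rascal parser call are illustrative assumptions, not code from the repository.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParseBenchmark {
    public static void main(String[] args) throws IOException {
        // Load every .rsc file into memory up front so disk I/O does not
        // end up in the measurement.
        List<String> sources;
        try (Stream<Path> paths = Files.walk(Path.of(args[0]))) {
            sources = paths
                    .filter(p -> p.toString().endsWith(".rsc"))
                    .map(ParseBenchmark::read)
                    .collect(Collectors.toList());
        }

        // Stand-in for the actual Rascal parser invocation.
        Consumer<String> parse = source -> { /* call the parser here */ };

        System.out.println("Start parsing");
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            sources.forEach(parse);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Parsing took: " + millis + " ms");
        }
    }

    private static String read(Path p) {
        try {
            return Files.readString(p, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}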

The old code before this fix:

Parsing took: 10153 ms
Parsing took: 8624 ms
Parsing took: 8548 ms
Parsing took: 8518 ms
Parsing took: 8445 ms
Parsing took: 8448 ms
Parsing took: 8462 ms
Parsing took: 8458 ms
Parsing took: 8472 ms
Parsing took: 8527 ms

In the first iteration of the loop you can see the JIT still warming up.

After applying this change:

Start parsing
Parsing took: 9412 ms
Parsing took: 7633 ms
Parsing took: 7587 ms
Parsing took: 7634 ms
Parsing took: 7558 ms
Parsing took: 7608 ms
Parsing took: 7629 ms
Parsing took: 7753 ms
Parsing took: 7636 ms
Parsing took: 7631 ms

So depending on which values you compare, anywhere between 8 and 12% speedup.

codecov bot commented Jun 17, 2023

Codecov Report

Merging #1813 (12aed30) into main (f2e37a0) will increase coverage by 0%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##              main   #1813   +/-   ##
=======================================
  Coverage       48%     48%           
- Complexity    6031    6039    +8     
=======================================
  Files          670     670           
  Lines        58541   58539    -2     
  Branches      8536    8536           
=======================================
+ Hits         28570   28581   +11     
+ Misses       27808   27794   -14     
- Partials      2163    2164    +1     
Impacted Files | Coverage Δ
.../org/rascalmpl/parser/gtd/stack/edge/EdgesSet.java | 66% <100%> (-2%) ⬇️

... and 6 files with indirect coverage changes

@arnoldlankamp (Contributor)

I'm pretty sure the main gain here is from the removal of the bounds-check-elimination workaround (i.e. the weird while construct that you turned back into an if).
This was something that improved performance on earlier versions of JDK 1.6. The JIT compiler couldn't (or wouldn't) eliminate the unnecessary bounds check on the array access in the add method, because that access sits after a call to a native function (System.arraycopy) on the execution path where the enlarge method is called. The JIT compiler has no internal knowledge of native calls; it can only 'see' method-local code (including inlined code), so it could not guarantee that the array access would never cause an out-of-bounds exception in this case, and the check stayed in place.
If I recall correctly, this shortcoming was resolved in one of the later patch versions of JDK 1.6. They probably added special cases for things like System.arraycopy, or implemented a path-specific optimization where both possible execution paths are optimized separately, since being able to do these kinds of optimizations also greatly benefits the performance of the data structures in the standard library.
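For readers without the diff open, the pattern under discussion looks roughly like the sketch below. The class and field names (EdgeList, edges, size) are hypothetical; this is not the literal EdgesSet code, just the shape of the old while-based workaround versus the plain if.

// Hypothetical names; not the literal EdgesSet code.
final class EdgeList {
    private Object[] edges = new Object[8];
    private int size;

    // Old JDK 1.6-era shape: a while instead of an if, written that way so
    // the JIT of that era would drop the bounds check on the array write
    // that follows the (native) System.arraycopy in enlarge().
    void addOldStyle(Object edge) {
        while (size >= edges.length) {
            enlarge();
        }
        edges[size++] = edge;
    }

    // What the PR turns it back into: a plain if, which modern JITs handle
    // just as well.
    void addNewStyle(Object edge) {
        if (size >= edges.length) {
            enlarge();
        }
        edges[size++] = edge;
    }

    private void enlarge() {
        Object[] bigger = new Object[edges.length << 1];
        System.arraycopy(edges, 0, bigger, 0, edges.length);
        edges = bigger;
    }
}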

The use of Arrays.copyOf will most definitely have no impact (or a very minor negative one), since its implementation just invokes System.arraycopy for any non-zero-length array. So all this specific change does is add a superfluous method call.

Arrays.copyOf is in essence nothing more than a convenience wrapper around System.arraycopy that also returns the new array, so users can chain off of it 😉.
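As a reference point, the plain-Java behaviour of Arrays.copyOf can be sketched as follows. This is a simplified paraphrase, not the actual OpenJDK source; on HotSpot the whole call may additionally be replaced by an intrinsic, as discussed below.

final class ArraysSketch {
    // Roughly what the non-intrinsified fallback of Arrays.copyOf does:
    // allocate a new array of the requested length and let
    // System.arraycopy move the overlapping prefix.
    @SuppressWarnings("unchecked")
    static <T> T[] copyOfSketch(T[] original, int newLength) {
        T[] copy = (T[]) java.lang.reflect.Array.newInstance(
                original.getClass().getComponentType(), newLength);
        System.arraycopy(original, 0, copy, 0,
                Math.min(original.length, newLength));
        return copy;
    }
}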

Commit: This change made the parser run between 8 and 12% faster
@DavyLandman (Member, Author) commented Jun 18, 2023

I was also curious, and it turns out it's the other way around: the if check is actually still fine; the gain is all in the implementation of Arrays.copyOf.

After switching only to Arrays.copyOf, it actually was even a bit faster (note: I also changed the benchmark to run at the highest priority).

main branch:

Start parsing
Warmup took: 9544 ms
Warmup took: 8108 ms
Parsing took: 8043 ms
Parsing took: 8054 ms
Parsing took: 8062 ms
Parsing took: 8006 ms
Parsing took: 8077 ms
Parsing took: 8143 ms
Parsing took: 8078 ms
Parsing took: 8016 ms
Parsing took: 8103 ms
Parsing took: 8085 ms
Done: 26130420, average time: 8067 ms

this PR:

Warmup took: 8748 ms
Warmup took: 7055 ms
Parsing took: 6980 ms
Parsing took: 6967 ms
Parsing took: 7063 ms
Parsing took: 6919 ms
Parsing took: 6920 ms
Parsing took: 7011 ms
Parsing took: 6931 ms
Parsing took: 7012 ms
Parsing took: 7049 ms
Parsing took: 6935 ms
Done: 26130420, average time: 6979 ms

So roughly 13%.

Why is Arrays.copyOf faster? Because it's marked as a @HotSpotIntrinsicCandidate:

The @HotSpotIntrinsicCandidate annotation is specific to the HotSpot Virtual Machine. It indicates that an annotated method may be (but is not guaranteed to be) intrinsified by the HotSpot VM. A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve performance.

So I'm guessing it does the allocation and moving in a more compact way.
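(A way to verify this on a particular JDK: running the benchmark with HotSpot's diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics logs which calls the JIT replaces with intrinsics.)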

@jurgenvinju (Member)

Ok great. Thanks for the discussion @arnoldlankamp @DavyLandman . I'm merging this. Every 10% helps!

@jurgenvinju jurgenvinju merged commit 2331fec into main Jun 18, 2023
@jurgenvinju jurgenvinju deleted the parser-performance branch June 18, 2023 07:10
@arnoldlankamp (Contributor)

Interesting, so what seems to have happened is that they basically added a second native array copy implementation. The big difference is that the new one uses SIMD.

But why they shoehorned this optimized version into the Arrays wrapper implementation, instead of just optimizing System.arraycopy, so everyone benefits, is a bit of a mystery 🤔 .

@arnoldlankamp (Contributor)

Maybe a next step would be to replace all other occurrences of System.arraycopy, since Arrays.copyOf is currently the faster implementation?
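As a general note (not from the PR itself): Arrays.copyOf only covers copies into a freshly allocated array starting at offset 0, so only a subset of the System.arraycopy call sites would qualify for that substitution. A small sketch of the distinction:

import java.util.Arrays;

final class CopyExamples {
    // Fits Arrays.copyOf: grow into a brand-new array.
    static int[] grow(int[] values, int newLength) {
        return Arrays.copyOf(values, newLength);
    }

    // Still needs System.arraycopy: copying into an array that already
    // exists, possibly at a non-zero offset.
    static void copyInto(int[] source, int[] target, int targetOffset) {
        System.arraycopy(source, 0, target, targetOffset, source.length);
    }
}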

@DavyLandman (Member, Author) commented Jun 18, 2023 via email

@arnoldlankamp (Contributor)

Yep, they'd need an if statement to check that first 😄 .

@jurgenvinju (Member)

I'm very happy you guys made the parser more efficient.

Now I think it's time we close this and move back to usethesource/rascal-language-servers#267, because we should focus on not having to start the entire bootstrapping of the parser generator in deploy mode at all for the interpreted modules of DSLs.
