
Improve parser performance by removing old JDT/JIT tuning #1813

Merged: 1 commit from parser-performance into main on Jun 18, 2023

Conversation

@DavyLandman (Member)

This change made the parser run between 8 and 12% faster.

Since the discussion in usethesource/rascal-language-servers#267 pointed towards the parser, I took a look at the profiler, and this method popped up. I figured that maybe 11 years of JDK changes had caught up with the old tuning.

I think the primary reason is that the Arrays.copyOf method is implemented in native code, so that gives some benefit.

I wrote a small Java program that loads all Rascal files in the rascal project into memory and then runs the parser over them; a sketch of such a harness follows.
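The harness itself is not part of this PR; the following is a minimal sketch of what such a program could look like, assuming the project root is passed as the first argument. The class name ParseBenchmark and the Consumer<String> stand-in for the actual Rascal parser call are illustrative assumptions, not code from the repository.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParseBenchmark {
    public static void main(String[] args) throws IOException {
        // Load every .rsc file into memory up front so disk I/O does not
        // end up in the measurement.
        List<String> sources;
        try (Stream<Path> paths = Files.walk(Path.of(args[0]))) {
            sources = paths
                    .filter(p -> p.toString().endsWith(".rsc"))
                    .map(ParseBenchmark::read)
                    .collect(Collectors.toList());
        }

        // Stand-in for the actual Rascal parser invocation.
        Consumer<String> parse = source -> { /* call the parser here */ };

        System.out.println("Start parsing");
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            sources.forEach(parse);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Parsing took: " + millis + " ms");
        }
    }

    private static String read(Path p) {
        try {
            return Files.readString(p, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}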

The old code before this fix:

Parsing took: 10153 ms
Parsing took: 8624 ms
Parsing took: 8548 ms
Parsing took: 8518 ms
Parsing took: 8445 ms
Parsing took: 8448 ms
Parsing took: 8462 ms
Parsing took: 8458 ms
Parsing took: 8472 ms
Parsing took: 8527 ms

In the first iteration of the loop you can see the JIT still warming up.

After applying this change:

Start parsing
Parsing took: 9412 ms
Parsing took: 7633 ms
Parsing took: 7587 ms
Parsing took: 7634 ms
Parsing took: 7558 ms
Parsing took: 7608 ms
Parsing took: 7629 ms
Parsing took: 7753 ms
Parsing took: 7636 ms
Parsing took: 7631 ms

So depending on which values you compare, anywhere between 8 and 12% speedup.

codecov bot commented Jun 17, 2023

Codecov Report

Merging #1813 (12aed30) into main (f2e37a0) will increase coverage by 0%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##              main   #1813   +/-   ##
=======================================
  Coverage       48%     48%           
- Complexity    6031    6039    +8     
=======================================
  Files          670     670           
  Lines        58541   58539    -2     
  Branches      8536    8536           
=======================================
+ Hits         28570   28581   +11     
+ Misses       27808   27794   -14     
- Partials      2163    2164    +1     
Impacted Files | Coverage Δ
.../org/rascalmpl/parser/gtd/stack/edge/EdgesSet.java | 66% <100%> (-2%) ⬇️

... and 6 files with indirect coverage changes

@arnoldlankamp (Contributor)

I'm pretty sure the main gain here is from the removal of the bounds-check-elimination workaround (i.e. the weird while construct that you turned back into an if).
This was something that improved performance on earlier versions of JDK 1.6. The JIT compiler couldn't (or wouldn't) eliminate the unnecessary bounds check on the array access in the add method, because that access sits after a call to a native function (System.arraycopy) on the execution path where the enlarge method is called. The JIT compiler has no internal knowledge of native calls; it can only 'see' method-local code (including inlined code), so it could not guarantee that the array access would never cause an out-of-bounds exception in this case, and the check stayed in place.
If I recall correctly, this shortcoming was resolved in one of the later patch versions of JDK 1.6. They probably added special cases for things like System.arraycopy, or implemented a path-specific optimization where both possible execution paths are optimized separately, since being able to do these kinds of optimizations also greatly benefits the performance of the data structures in the standard library.
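For readers without the diff open, the pattern under discussion looks roughly like the sketch below. The class and field names (EdgeList, edges, size) are hypothetical; this is not the literal EdgesSet code, just the shape of the old while-based workaround versus the plain if.

// Hypothetical names; not the literal EdgesSet code.
final class EdgeList {
    private Object[] edges = new Object[8];
    private int size;

    // Old JDK 1.6-era shape: a while instead of an if, written that way so
    // the JIT of that era would drop the bounds check on the array write
    // that follows the (native) System.arraycopy in enlarge().
    void addOldStyle(Object edge) {
        while (size >= edges.length) {
            enlarge();
        }
        edges[size++] = edge;
    }

    // What the PR turns it back into: a plain if, which modern JITs handle
    // just as well.
    void addNewStyle(Object edge) {
        if (size >= edges.length) {
            enlarge();
        }
        edges[size++] = edge;
    }

    private void enlarge() {
        Object[] bigger = new Object[edges.length << 1];
        System.arraycopy(edges, 0, bigger, 0, edges.length);
        edges = bigger;
    }
}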

The use of Arrays.copyOf will most definitely have no impact (or a very minor negative one), since its implementation just invokes System.arraycopy for any non-zero-length array. So all this specific change does is add a superfluous method call.

Arrays.copyOf is in essence nothing more than a convenience wrapper around System.arraycopy that also returns the new array, so users can chain off of it 😉.
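As a reference point, the plain-Java behaviour of Arrays.copyOf can be sketched as follows. This is a simplified paraphrase, not the actual OpenJDK source; on HotSpot the whole call may additionally be replaced by an intrinsic, as discussed below.

final class ArraysSketch {
    // Roughly what the non-intrinsified fallback of Arrays.copyOf does:
    // allocate a new array of the requested length and let
    // System.arraycopy move the overlapping prefix.
    @SuppressWarnings("unchecked")
    static <T> T[] copyOfSketch(T[] original, int newLength) {
        T[] copy = (T[]) java.lang.reflect.Array.newInstance(
                original.getClass().getComponentType(), newLength);
        System.arraycopy(original, 0, copy, 0,
                Math.min(original.length, newLength));
        return copy;
    }
}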

Commit: This change made the parser run between 8 and 12% faster
@DavyLandman (Member, Author) commented Jun 18, 2023

I was also curious, and it turns out it's the other way around: the if check is actually still fine; the gain is all in the implementation of Arrays.copyOf.

After switching only to Arrays.copyOf, it actually was even a bit faster (note: I also changed the benchmark to run at the highest priority).

main branch:

Start parsing
Warmup took: 9544 ms
Warmup took: 8108 ms
Parsing took: 8043 ms
Parsing took: 8054 ms
Parsing took: 8062 ms
Parsing took: 8006 ms
Parsing took: 8077 ms
Parsing took: 8143 ms
Parsing took: 8078 ms
Parsing took: 8016 ms
Parsing took: 8103 ms
Parsing took: 8085 ms
Done: 26130420, average time: 8067 ms

this PR:

Warmup took: 8748 ms
Warmup took: 7055 ms
Parsing took: 6980 ms
Parsing took: 6967 ms
Parsing took: 7063 ms
Parsing took: 6919 ms
Parsing took: 6920 ms
Parsing took: 7011 ms
Parsing took: 6931 ms
Parsing took: 7012 ms
Parsing took: 7049 ms
Parsing took: 6935 ms
Done: 26130420, average time: 6979 ms

So roughly 13%.

Why is Arrays.copyOf faster? Because it's marked as a @HotSpotIntrinsicCandidate:

The @HotSpotIntrinsicCandidate annotation is specific to the HotSpot Virtual Machine. It indicates that an annotated method may be (but is not guaranteed to be) intrinsified by the HotSpot VM. A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve performance.

So I'm guessing it does the allocation and moving in a more compact way.
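(A way to verify this on a particular JDK: running the benchmark with HotSpot's diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics logs which calls the JIT replaces with intrinsics.)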

@jurgenvinju (Member)

Ok great. Thanks for the discussion @arnoldlankamp @DavyLandman . I'm merging this. Every 10% helps!

@jurgenvinju jurgenvinju merged commit 2331fec into main Jun 18, 2023
@jurgenvinju jurgenvinju deleted the parser-performance branch June 18, 2023 07:10
@arnoldlankamp (Contributor)

Interesting, so what seems to have happened is that they basically added a second native array copy implementation. The big difference is that the new one uses SIMD.

But why they shoehorned this optimized version into the Arrays wrapper implementation, instead of just optimizing System.arraycopy, so everyone benefits, is a bit of a mystery 🤔 .

@arnoldlankamp (Contributor)

Maybe a next step would be to replace all other occurrences of System.arraycopy, since Arrays.copyOf is currently the faster implementation?
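As a general note (not from the PR itself): Arrays.copyOf only covers copies into a freshly allocated array starting at offset 0, so only a subset of the System.arraycopy call sites would qualify for that substitution. A small sketch of the distinction:

import java.util.Arrays;

final class CopyExamples {
    // Fits Arrays.copyOf: grow into a brand-new array.
    static int[] grow(int[] values, int newLength) {
        return Arrays.copyOf(values, newLength);
    }

    // Still needs System.arraycopy: copying into an array that already
    // exists, possibly at a non-zero offset.
    static void copyInto(int[] source, int[] target, int targetOffset) {
        System.arraycopy(source, 0, target, targetOffset, source.length);
    }
}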

@DavyLandman (Member, Author) commented Jun 18, 2023 via email

@arnoldlankamp (Contributor)

Yep, they'd need an if statement to check that first 😄 .

@jurgenvinju (Member)

I'm very happy you guys made the parser more efficient.

Now I think it's time we close this and move back to usethesource/rascal-language-servers#267, because we should focus on not having to start the entire bootstrapping of the parser generator in deploy mode at all for the interpreted modules of DSLs.
