-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use regular jdk String.intern() #350
base: master
Are you sure you want to change the base?
Conversation
I'm not sure why we didn't use `String.intern()` from the start. This PR doesn't only simplify our codebase, but saves us the `internedStrings` ConcurrentHashMap. The heap size for a large CPG (linux 4 kernel) goes down from 35830m to 35260m, i.e. 570m, or 1.6%.
cf https://shipilev.net/jvm/anatomy-quarks/10-string-intern/ The tl;dr is that However, I'm open to being convinced by benchmarks. A critical question (in context of mesh) would be impact on GC times and memory consumption if we do multiple scans without restarting the JVM in the middle. |
Do you really think that How about some scans on joern instead? I don't want to make a scientific paper out of it, just a few observations re GC and memory usage for multiple scans on the same JVM instance - sounds good? |
Sounds like we want to keep things the way they are. Sadly there is no date on the linked article but i would also want to see some benchmarks before continuing with this. |
would you be happy with the one described in my comment above? |
Sounds good to me. Just run a couple different scans after another. My main point about mesh was that we must include multiple scans after each other without jvm restart, because string.intern() uses special stuff and we pay not just for the call to string.intern but also distributed over future GC pauses. However, it sounds like the general consensus is "never ever use I would not be opposed to writing a more appropriate hash-table for string interning, though. How did you identify the interning as a problem? ("more appropriate": We can relax some requirements inherent to |
When I opened up a large database heapdump (CPG with linux 4 kernel), the StringInterner stood out as one of the big blobs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think this over for some time.
I'm not sure why we didn't use
String.intern()
from the start.This PR doesn't only simplify our codebase, but saves us the
internedStrings
ConcurrentHashMap. The heap size for a large CPG(linux 4 kernel) goes down from 35830m to 35260m, i.e. 570m, or 1.6%.