Fix root scanning #165
Conversation
Wow! Nice work! A 40% improvement on time.stw!

I think the two PRs are different, although they share some common changes to the CodeCache. This PR tries to optimize CodeCache scanning and does not affect correctness, while #141 tries to fix a memory leak caused by incorrect root scanning. Also, when I was working on #141, I hadn't yet figured out how to scan the CodeCache efficiently. After this PR, I will continue working on #141. It still has some difficulty with MarkCompact at the moment...
Very impressive performance improvement. I have some questions about the code, especially about the thread-local `NMETHOD_SLOTS`. Also, with the merging of #163, this PR will need to adapt to those changes.
```rust
static CODE_CACHE_ROOTS: spin::Lazy<Mutex<HashMap<Address, Vec<Address>>>> =
    spin::Lazy::new(|| Mutex::new(HashMap::new()));
static CODE_CACHE_ROOTS_SIZE: AtomicUsize = AtomicUsize::new(0);
```
I don't think we need this as a separate global variable. Whenever you change this `SIZE`, you have to acquire a lock to change `CODE_CACHE_ROOTS`. And in the only place where you read the `SIZE`, you also acquire a lock to read `CODE_CACHE_ROOTS`.
Also, in `mmtk_register_nmethod()`, you modify the `SIZE` before acquiring a lock to the `ROOTS`, which will make the `SIZE` and the `ROOTS` inconsistent.
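For illustration, the problematic ordering looks roughly like this. This is a sketch, not the actual binding code: `usize` stands in for MMTk's `Address`, and the statics are passed as parameters to keep it self-contained.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Sketch of the ordering being criticized: the counter is bumped before
// the lock is taken, so a GC pause that lands between the two steps would
// observe a SIZE larger than the total number of slots in the map.
fn register_nmethod_racy(
    roots: &Mutex<HashMap<usize, Vec<usize>>>,
    size: &AtomicUsize,
    nm: usize,
    slots: Vec<usize>,
) {
    size.fetch_add(slots.len(), Ordering::SeqCst); // SIZE updated first...
    // <-- inconsistency window: SIZE counts slots the map doesn't hold yet
    roots.lock().unwrap().insert(nm, slots); // ...map updated second
}
```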
Yes, `SIZE` is not necessary; it's just an optimization. Because `CODE_CACHE_ROOTS` is a set of `Vec<Address>`s, at root-scanning time these vectors have to be merged into a single buffer. Recording the total `SIZE` ahead of time and using it later for the merged-buffer allocation may prevent frequent vector resizing.

Consistency is ensured by separating the writes and reads: `SIZE` is only updated during mutator phases, and reads only happen during pauses.

Apart from tracking `SIZE`, I also tried putting each vector in the cached roots (`HashMap<Address, Vec<Address>>`) into a separate work packet. There was a slight performance drop from doing so, since it yields too many vector clones and too many work packets, and most of the vectors are small.
> Yes, `SIZE` is not necessary; it's just an optimization. Because `CODE_CACHE_ROOTS` is a set of `Vec<Address>`s, at root-scanning time these vectors have to be merged into a single buffer. Recording the total `SIZE` ahead of time and using it later for the merged-buffer allocation may prevent frequent vector resizing.
You can just calculate the total size of the `Vec<Address>`s before creating the vec:
```rust
let code_cache_roots = CODE_CACHE_ROOTS.lock().unwrap();
let size = code_cache_roots.values().map(|vec| vec.len()).sum();
let mut vec = Vec::with_capacity(size);
for roots in code_cache_roots.values() {
    for r in roots {
        vec.push(*r)
    }
}
```
My point is that if you have to acquire the lock to `CODE_CACHE_ROOTS` whenever you access `CODE_CACHE_ROOTS_SIZE`, then there is little point in having a separate count.
I'm afraid calculating the size in a loop may slow down the GC. If I can compute the total size cheaply ahead of time, simply using the pre-calculated size here is the fastest way.

This makes me realize that if I move the `SIZE` increments to after the lock is acquired, I can use relaxed ordering instead of sequentially consistent increments. This solves the consistency issue and makes the increments faster, while still keeping a precise pre-calculated size counter.
If you always access `CODE_CACHE_ROOTS_SIZE` while holding the lock of `CODE_CACHE_ROOTS`, you can merge them into one `Mutex<T>`, like this:
```rust
struct CodeCacheRoots {
    hash_map: HashMap<Address, Vec<Address>>,
    size: usize,
}

lazy_static! {
    static ref CODE_CACHE_ROOTS: Mutex<CodeCacheRoots> = Mutex::new(CodeCacheRoots {
        hash_map: HashMap::new(),
        size: 0,
    });
}
```
When using it, you lock the mutex and get access to both `hash_map` and `size`:
```rust
{
    let mut lock_guard = CODE_CACHE_ROOTS.lock().unwrap();
    lock_guard.size += slots.len();
    lock_guard.hash_map.insert(nm, slots);
}
```
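For completeness, the read side under this single-mutex design could look like the following sketch. The field names mirror the struct above, but `usize` stands in for `Address` and the static is passed as a parameter to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

struct CodeCacheRoots {
    hash_map: HashMap<usize, Vec<usize>>, // usize stands in for Address
    size: usize,                          // total slots across all vectors
}

// At root-scanning time, merge all cached root slots into one buffer,
// pre-sized from the count that was maintained under the same lock.
fn merge_roots(roots: &Mutex<CodeCacheRoots>) -> Vec<usize> {
    let guard = roots.lock().unwrap();
    let mut buf = Vec::with_capacity(guard.size);
    for slots in guard.hash_map.values() {
        buf.extend_from_slice(slots);
    }
    buf
}
```

Because the size lives inside the mutex, the reader needs no second synchronization step and the buffer never reallocates during the pause.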
LGTM
- `MMTkRootScanWorkScope`
- `CodeCache` roots scanning
- `compute_*_roots` functions

Performance comparison
Machine: Shrew (Core i9-9900K Coffee Lake, 8 cores / 16 threads)
Immix
http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/3k2Ucr
MarkCompact
http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/WjS3eX