-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8338534: GenShen: Handle alloc failure differently when immediate garbage is pending #479
base: master
Are you sure you want to change the base?
8338534: GenShen: Handle alloc failure differently when immediate garbage is pending #479
Conversation
This reverts commit 7bb1d38.
When we round up, we introduce the risk that the new size exceeds the maximum LAB size, resulting in an assertion error.
This reverts commit 99cce53.
This reverts commit d881300.
This may allow us to reclaim immediate garbage more quickly.
We just reported freeset status after rebuilding freeset in final mark. There will be heavy contention on the heaplock at the start of evacuation as many mutator and worker threads prep their GCLAB and PLAB buffers for evacuation. We avoid some lock contention by not reporting freset status here.
If we notify waiting mutators after immediate garbage is reclaimed, do not clear the alloc-failure flag. Otherwise, this hides the fact that alloc failure occurred during this GC.
👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
BTW, I can't figure out what jcheck whitespace is complaining about. I think it is reporting the wrong line number. jcheck on my local copy does not report a whitespace problem. |
Yes, for some reason However, there is indeed whitespace (in fact, 2 as in the error message) in that file on an otherwise blank line. However, the line number is indeed incorrect as you stated. It should be line 2467.
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several changes are implemented here:
- Re-order the phases that execute immediately after final-mark so that we do concurrent-cleanup quicker (but still after concurrent weak references)
- After immediate garbage has been reclaimed by concurrent cleanup, notify waiting allocators
- If an allocation failure occurs while immediate garbage recycling is pending, stall the allocation but do not cancel the concurrent gc.
As you suggested offline, I agree that it might make sense to do (1) separately, and then do (2+3).
Left a few comments mainly on (2+3), but I'm still missing the solution to the problem described in the ticket. I'll chat with you offline to get clarity.
@@ -71,6 +73,8 @@ class ShenandoahController: public ConcurrentGCThread { | |||
// until another cycle runs and clears the alloc failure gc flag. | |||
void handle_alloc_failure(ShenandoahAllocRequest& req, bool block); | |||
|
|||
void anticipate_immediate_garbage(size_t anticipated_immediate_garbage_words); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a 1-line documentation comment on the role of the field (and that the method sets it -- why not simply call it set_foo(value)
for field _foo
? I realize you want readers to get the most recently written value, hence the atomic store & load.
if (clear_alloc_failure) { | ||
_alloc_failure_gc.unset(); | ||
_humongous_alloc_failure_gc.unset(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For good hygiene, I'd move the variable value changes into the monitor which is held when waiting or notifying. I realize this doesn't matter for correctness, but makes debugging easier.
Further, if you protect the updates and reads of the variables with the lock, you don't need to do the extra atomic ops.
You'd need to examine all sets/gets and waits/notifys to make sure this works, but I am guessing it will, and it'll also improve performance.
However, that can be done in a separate effort, if you prefer, for which I'm happy to file a separate ticket for that investigation/change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize now that this idiom is quite pervasive in Shenandoah code, so just fixing this instance of it doesn't accomplish much at this time. I am not convinced it's a good idiom. I'll investigate this separately. I vaguely recall a discussion along these lines in an older PR.
I'll file a separate ticket for this; you can ignore this remark for the purposes of this PR.
ShenandoahHeap::heap()->free_set()->recycle_trash(); | ||
ShenandoahHeap* heap = ShenandoahHeap::heap(); | ||
if (heap->free_set()->recycle_trash()) { | ||
heap->control_thread()->notify_alloc_failure_waiters(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you motivate this notification? As far as I can tell, all waiters will react to the notification by waking up, finding that the variables are still set, and consequently go back to wait.
I am sure I am missing something here, or you didn't make an intended change to allow waiters to retry allocation after waking up and go back to sleep if they didn't succeed?
A documentation comment would definitely help cross the t's and dot the i's for the reader.
@@ -2462,6 +2462,9 @@ void ShenandoahHeap::rebuild_free_set(bool concurrent) { | |||
size_t young_cset_regions, old_cset_regions; | |||
size_t first_old_region, last_old_region, old_region_count; | |||
_free_set->prepare_to_rebuild(young_cset_regions, old_cset_regions, first_old_region, last_old_region, old_region_count); | |||
size_t anticipated_immediate_garbage = (old_cset_regions + young_cset_regions) * ShenandoahHeapRegion::region_size_words(); | |||
control_thread()->anticipate_immediate_garbage(anticipated_immediate_garbage); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the line that has two whitespaces, vide the jcheck whitespace error above.
} | ||
|
||
void ShenandoahFreeSet::recycle_trash() { | ||
bool ShenandoahFreeSet::recycle_trash() { | ||
bool result = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd take the opportunity to do some counting verification here.
int n_trash_regions = 0;
Read on ...
_heap->control_thread()->anticipate_immediate_garbage((size_t) 0); | ||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...
clear_anticipated_immediate_garage(n_trash_regions*HeapRegionSize);
return n_trash_regions > 0;
with
void ...::clear_anticipated_immediate_garbage(size_t aig) {
assert(_anticipated_immediate_garbage == aig, "Mismatch?");
_anticipated_immediate_garbage = 0;
}
Is this an intended invariant? I think it is, but don't understand enough of the details to be certain.
@@ -53,6 +53,10 @@ size_t ShenandoahController::get_gc_id() { | |||
return Atomic::load(&_gc_id); | |||
} | |||
|
|||
void ShenandoahController::anticipate_immediate_garbage(size_t anticipated_immediate_garbage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested rename see further above. set_<field>
.
bool ShenandoahFreeSet::try_recycle_trashed(ShenandoahHeapRegion* r) { | ||
bool result = false; | ||
if (r->is_trash()) { | ||
r->recycle(); | ||
result = true; | ||
} | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood your intent, I think this has a bug because it always returns true here. I believe you just wanted:
if (r->is_trash()) {
r->recycle();
return true;
}
return false;
@@ -753,6 +753,9 @@ void ShenandoahGeneration::prepare_regions_and_collection_set(bool concurrent) { | |||
// We are preparing for evacuation. At this time, we ignore cset region tallies. | |||
size_t first_old, last_old, num_old; | |||
heap->free_set()->prepare_to_rebuild(young_cset_regions, old_cset_regions, first_old, last_old, num_old); | |||
size_t anticipated_immediate_garbage = (old_cset_regions + young_cset_regions) * ShenandoahHeapRegion::region_size_words(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it sound like old_cset_regions & young_cset_regions hold all regions that will be part of immediate garbage
-- which indeed is the case. The name prepare_to_rebuild()
does not give one much clue as to the fact that that's happening when we return from the call, and the API spec of the method does not explicitly specify it. One needs to read into the code of the method find_regions_with_alloc_capacity()
to realize that this is what is happening.
In summary, firstly, I feel some of these methods are in need of tighter header file documentation of post-conditions that callers are relying on and, secondly, I feel given the extremely fat APIs (lots of reference variables that are modified by these methods) that some amount of refactoring is needed in the longer term. The refactoring should be a separate effort, but in the shorter term I think the API/spec documentation of prepare_to_rebuild
and find_regions_with_alloc_capacity
could be improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names old_cset_regions
and young_cset_regions
is very confusing as well. These regions are already trash before the collection set is chosen during final mark (and so, will not themselves be part of the collection set). Suggest calling them old_trashed_regions
and young_trashed_regions
here.
@@ -474,6 +474,9 @@ void ShenandoahOldGeneration::prepare_regions_and_collection_set(bool concurrent | |||
size_t cset_young_regions, cset_old_regions; | |||
size_t first_old, last_old, num_old; | |||
heap->free_set()->prepare_to_rebuild(cset_young_regions, cset_old_regions, first_old, last_old, num_old); | |||
size_t anticipated_immediate_garbage = (cset_young_regions + cset_old_regions) * ShenandoahHeapRegion::region_size_words(); | |||
heap->control_thread()->anticipate_immediate_garbage(anticipated_immediate_garbage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest set_<foo>
for changing field value <foo>
.
@@ -753,6 +753,9 @@ void ShenandoahGeneration::prepare_regions_and_collection_set(bool concurrent) { | |||
// We are preparing for evacuation. At this time, we ignore cset region tallies. | |||
size_t first_old, last_old, num_old; | |||
heap->free_set()->prepare_to_rebuild(young_cset_regions, old_cset_regions, first_old, last_old, num_old); | |||
size_t anticipated_immediate_garbage = (old_cset_regions + young_cset_regions) * ShenandoahHeapRegion::region_size_words(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names old_cset_regions
and young_cset_regions
is very confusing as well. These regions are already trash before the collection set is chosen during final mark (and so, will not themselves be part of the collection set). Suggest calling them old_trashed_regions
and young_trashed_regions
here.
@@ -65,11 +69,12 @@ void ShenandoahController::handle_alloc_failure(ShenandoahAllocRequest& req, boo | |||
req.type_string(), | |||
byte_size_in_proper_unit(req.size() * HeapWordSize), proper_unit_for_byte_size(req.size() * HeapWordSize)); | |||
|
|||
// Now that alloc failure GC is scheduled, we can abort everything else | |||
heap->cancel_gc(GCCause::_allocation_failure); | |||
if (Atomic::load(&_anticipated_immediate_garbage) < req.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand... here we are saying that if final mark anticipates this much immediate garbage (computed when it rebuilt the freeset after choosing the collection set), then we aren't going to cancel the GC if this particular request could be satisfied. Instead we will block as though the gc has already been cancelled. This thread will be notified when concurrent cleanup completes.
_alloc_failure_gc.unset(); | ||
_humongous_alloc_failure_gc.unset(); | ||
void ShenandoahController::notify_alloc_failure_waiters(bool clear_alloc_failure) { | ||
if (clear_alloc_failure) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we not clear the alloc failure? This seems like it would confuse the control thread. Isn't this going to have the control thread attempt to notify alloc failure waiters again when the cycle is finished?
@kdnilsen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@kdnilsen : Should this become a draft for now? |
@kdnilsen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
Several changes are implemented here:
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/shenandoah.git pull/479/head:pull/479
$ git checkout pull/479
Update a local copy of the PR:
$ git checkout pull/479
$ git pull https://git.openjdk.org/shenandoah.git pull/479/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 479
View PR using the GUI difftool:
$ git pr show -t 479
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/shenandoah/pull/479.diff
Webrev
Link to Webrev Comment