-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small optimizations on iggen buffer handling #317
base: master
Are you sure you want to change the base?
Conversation
e59eacf
to
9785173
Compare
9785173
to
0bd705c
Compare
This also avoids an unordered_map by transposing the perform_task_buffer_accesses loop.
0bd705c
to
a46a214
Compare
Check-perf-impact results: (c8fb992b35322012b54e351345fdf71a) ✔️ No significant performance change in the microbenchmark set. You are good to go! Relative execution time per category: (mean of relative medians)
|
Pull Request Test Coverage Report for Build 12158729604Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done.
LGTM! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I've suggested two comment changes that I've added for my understanding while investigating how to implement replicated writes!
} | ||
} | ||
|
||
// 3. Clear tracking structures for all regions that are being written to. We gracefully handle overlapping writes by treating the set of all conflicting | ||
// writers as last writers of an allocation. | ||
// 3. To gracefully handle overlapping writes, clear tracking structures for all regions that are being written to, so we can treat the set of all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// 3. To gracefully handle overlapping writes, clear tracking structures for all regions that are being written to, so we can treat the set of all | |
// 3. To gracefully handle overlapping writes if detection is disabled via the error policy, clear tracking structures for all regions that are being written to, so we can treat the set of all |
} | ||
buffer.track_original_write(concurrent_writes[i], command_instructions[i], concurrent_chunks[i].memory_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buffer.track_original_write(concurrent_writes[i], command_instructions[i], concurrent_chunks[i].memory_id); | |
// On the buffer level, there is no special handling of overlapping writes: The last chunk that writes to a given | |
// region becomes its last writer, and subsequent reads on other devices will require a copy. | |
buffer.track_original_write(concurrent_writes[i], command_instructions[i], concurrent_chunks[i].memory_id); |
perform_task_buffer_accesses
updates last-writers twice to gracefully handle overlapping writes, which is an edge case. This PR quickly checks if overlapping writes are present, and sticks to a single update if there are not. By transposing the loop nest fromchunk -> bid
tobid -> chunk
, we can also save avoid constructing anotherunordered_map
.Results are not looking too impressive in the benchmark report, but I do get a consistent 4% speedup for RSim
room_small
, which is scheduler bound on gpuc3.