Change TaskDeps to start preallocated with 128 capacity #137563
Conversation
r? @Noratrieb

rustbot has assigned @Noratrieb.

@bors try @rust-timer queue
[Perf experiment] Change TaskDeps to start preallocated with 1024 capacity

This is a tiny change that makes `TaskDeps::read_set` start preallocated with capacity for 1024 elements (4096 bytes). Somewhat annoyingly, `HashSet::with_capacity` is only available when the hasher is the default `RandomState`. Since `FxHashSet` uses a different hasher, I had to use `HashSet::with_capacity_and_hasher`, which required re-exporting `FxBuildHasher` in `rustc_data_structures`.

From local profiling, it looks like `TaskDeps::read_set` is one of the most often resized hash sets in `rustc`. Local perf runs indicate this is a small perf improvement (without any regressions). There is also no significant RSS increase (the RSS changes are noisy and cancel each other out almost entirely). Still, since local and CI results can differ, I'll do a CI perf run before this PR is reviewed.

`@bors` try `@rust-timer` queue
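The `with_capacity_and_hasher` workaround described above can be sketched with std types alone. This is a minimal illustration, not the PR's actual code: `DefaultHasher` stands in for rustc's `FxHasher`, since `FxBuildHasher` lives in the `rustc_data_structures`/`rustc-hash` crates.

```rust
use std::collections::HashSet;
use std::collections::hash_map::DefaultHasher;
use std::hash::BuildHasherDefault;

fn main() {
    // `HashSet::with_capacity` is only defined for the default RandomState
    // hasher, so a set with a custom hasher must be built with
    // `with_capacity_and_hasher`. DefaultHasher stands in for FxHasher here.
    let set: HashSet<u32, BuildHasherDefault<DefaultHasher>> =
        HashSet::with_capacity_and_hasher(1024, BuildHasherDefault::default());
    // The allocation happens up front, so no resizes occur until the set
    // outgrows this capacity.
    assert!(set.capacity() >= 1024);
    println!("preallocated capacity: {}", set.capacity());
}
```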
☀️ Try build successful - checks-actions
Finished benchmarking commit (0fd98fe): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage): Results (primary 0.0%, secondary -0.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: Results (primary -0.3%, secondary 0.3%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: Results (primary -0.7%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 770.447s -> 768.842s (-0.21%)
This seems like a perf improvement to me, since the average is down and the regressions are much smaller than the improvements. I'll have to look a bit more into the regressions, since, at least at first glance, some of them seem a bit odd, e.g.
r? @nnethercote

Since this is a micro-optimization, I think you are the right person to review this. What is the general policy for PRs that have both regressions and improvements? Does each regression need to be separately justified, or do they just need to be outweighed by the perf gains?
What is the distribution of sizes of the set? You can probably add a Drop impl and just eprintln! the len() of the set and then pass that through. IOW, is 1024 the right point? Or perhaps 2048, or some other point on the curve -- I could also imagine wanting to jump up (e.g., most sets are 10 elements but then immediately grow to 2048 elements).
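The instrumentation suggested above could look something like the sketch below. The wrapper type and field names are hypothetical (the real field lives in `TaskDeps`); the idea is just to log each set's final length to stderr on drop, then aggregate the lines into a distribution.

```rust
use std::collections::HashSet;

// Hypothetical wrapper illustrating the suggested instrumentation: log the
// final size of the read set when it is dropped, so the size distribution
// can be collected across a compilation session.
struct InstrumentedReadSet {
    read_set: HashSet<u32>,
}

impl Drop for InstrumentedReadSet {
    fn drop(&mut self) {
        // One line per dropped set; pipe stderr through a frequency counter
        // to get the distribution of final sizes.
        eprintln!("read_set final len: {}", self.read_set.len());
    }
}

fn main() {
    let mut s = InstrumentedReadSet { read_set: HashSet::new() };
    for i in 0..10 {
        s.read_set.insert(i);
    }
    // `s` is dropped at end of scope, printing "read_set final len: 10".
}
```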
I agree with @Mark-Simulacrum, some
```diff
@@ -1299,7 +1299,7 @@ impl Default for TaskDeps {
     #[cfg(debug_assertions)]
     node: None,
     reads: EdgesVec::new(),
-    read_set: FxHashSet::default(),
+    read_set: FxHashSet::with_capacity_and_hasher(1024, FxBuildHasher::default()),
```
Suggested change:

```diff
-read_set: FxHashSet::with_capacity_and_hasher(1024, FxBuildHasher::default()),
+read_set: FxHashSet::with_capacity_and_hasher(1024, Default::default()),
```
would work too I think.
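The reason the shorter `Default::default()` spelling works can be shown with std types (again using `DefaultHasher` as a stand-in for `FxHasher`): the hasher type is already fixed by the set's type, so inference picks the right `BuildHasher` without naming it.

```rust
use std::collections::HashSet;
use std::collections::hash_map::DefaultHasher;
use std::hash::BuildHasherDefault;

fn main() {
    // The hasher type is pinned by the annotation on the binding (or, in
    // the PR, by the type of the TaskDeps field), so `Default::default()`
    // infers the concrete BuildHasher; no re-export needs to be named.
    let set: HashSet<u32, BuildHasherDefault<DefaultHasher>> =
        HashSet::with_capacity_and_hasher(1024, Default::default());
    assert!(set.capacity() >= 1024);
    println!("inferred-hasher capacity: {}", set.capacity());
}
```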
It works - thanks for the suggestion.
I did not know about that. This is the result of counts:
So, it seems like this
I have done some more testing. Additionally, I have discovered (by printing the caller on each resize) that ~30% of all resizes of
Do you want to do another perf run with the limit set to 128 or even 64? |
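The trade-off behind picking a preallocation point like 128 can be illustrated with a std `HashSet` (a sketch, not rustc's code; hashbrown's exact growth steps may differ between versions): growing incrementally from empty triggers several reallocations, while a set preallocated at the target size triggers none.

```rust
use std::collections::HashSet;

fn main() {
    // Count how many times a HashSet reallocates while growing from empty
    // to 128 elements.
    let mut grown: HashSet<u32> = HashSet::new();
    let mut last_cap = grown.capacity();
    let mut resizes = 0;
    for i in 0..128 {
        grown.insert(i);
        if grown.capacity() != last_cap {
            resizes += 1;
            last_cap = grown.capacity();
        }
    }
    // A set preallocated with capacity 128 never resizes over the same run.
    let pre: HashSet<u32> = HashSet::with_capacity(128);
    assert!(pre.capacity() >= 128);
    assert!(resizes > 1);
    println!("resizes while growing to 128 elements: {}", resizes);
}
```

Each resize rehashes every existing element, which is why the most frequently resized sets show up in profiles.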
Force-pushed from 02aa397 to e2a8bc0 (compare)
@bors try @rust-timer queue
☀️ Try build successful - checks-actions
@rust-timer build a21e95e
Finished benchmarking commit (a21e95e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count: This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage): Results (secondary 1.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: Results (primary -2.1%, secondary 7.7%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 770.531s -> 770.268s (-0.03%)
I think this is ready to merge, since it is a perf improvement and a one-line change.
@bors r+ |
🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened. |
…rcote

Change TaskDeps to start preallocated with 128 capacity

This is a tiny change that makes `TaskDeps::read_set` start preallocated with capacity for 128 elements. From local profiling, it looks like `TaskDeps::read_set` is one of the most often resized hash sets in `rustc`.
☀️ Test successful - checks-actions
A job failed! Check out the build log: (web) (plain). Click to see the possible cause of the failure (guessed by this bot)
Finished benchmarking commit (4f52199): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count: This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage): Results (primary 2.7%, secondary -1.6%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 765.189s -> 765.417s (0.03%)