perf: Speed up list operations that use amortized_iter() #20964
base: main
Conversation
I suspect switching to (…)
Can you remove the race condition in:

```rust
match Arc::get_mut(&mut self.0) {
    Some(r) => r,
    None => {
        self.0 = self.0.clone_inner();
        unsafe { Arc::get_mut_unchecked(&mut self.0) }
    }
}
```
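For reference, a minimal sketch of the same clone-on-write idea without `unsafe`, using std's `Arc::make_mut` (with `Vec<i64>` as an illustrative stand-in for the real inner type; the real code presumably can't use this because it clones through a custom `clone_inner()` rather than a plain `Clone` bound):

```rust
use std::sync::Arc;

// Illustrative stand-in type, not the real Series internals.
fn get_inner_mut(slot: &mut Arc<Vec<i64>>) -> &mut Vec<i64> {
    // Clone-on-write without unsafe: clones the inner value only if other
    // owners exist, then hands back a unique &mut.
    Arc::make_mut(slot)
}
```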
Oh, …
We can feature-gate it, and use that if we are on the nightly compiler. This is always the case for our Python releases.
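A minimal sketch of what that gating could look like (the `nightly` cargo feature name is an assumption, not necessarily what Polars uses; `get_mut_unchecked` is the unstable feature gate behind `Arc::get_mut_unchecked`):

```rust
// Crate root (sketch): enable the unstable API only for nightly builds, so
// stable builds keep compiling with the checked fallback.
#![cfg_attr(feature = "nightly", feature(get_mut_unchecked))]

use std::sync::Arc;

/// Contract for both variants: the caller guarantees the Arc is unique and
/// that no Weak pointers to the allocation are in use.
#[cfg(feature = "nightly")]
unsafe fn unique_mut<T>(arc: &mut Arc<T>) -> &mut T {
    // SAFETY: upheld by the caller, per the contract above.
    Arc::get_mut_unchecked(arc)
}

#[cfg(not(feature = "nightly"))]
unsafe fn unique_mut<T>(arc: &mut Arc<T>) -> &mut T {
    // Stable fallback: the checked call, which pays the extra atomic work.
    Arc::get_mut(arc).expect("caller promised a unique Arc")
}
```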
The proposed change won't pass the borrow checker, I think. And the really expensive atomic operations are the ones in ….

After reading in more detail, it seems like the only way to avoid the expensive load-update operation is to switch to something like triomphe::Arc, which lacks weakrefs; then you're doing an …. Or there may be some way to rearchitect or tweak this abstraction at some higher level of the code.
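For concreteness, a sketch of the `triomphe` idea floated here, with illustrative types (the follow-up below rules this out, since weak references turn out to be in use):

```rust
use triomphe::Arc; // external crate: an Arc with no Weak support

fn get_inner_mut(slot: &mut Arc<Vec<i64>>) -> &mut Vec<i64> {
    // With no weak count to lock, triomphe's get_mut only inspects the strong
    // count; std::sync::Arc::get_mut additionally does a compare-exchange to
    // lock out concurrent Weak upgrades.
    if Arc::get_mut(slot).is_none() {
        *slot = Arc::new((**slot).clone());
    }
    Arc::get_mut(slot).expect("just replaced with a unique Arc")
}
```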
Oh, I thought no weakrefs were used, but further digging suggests they are. So triomphe won't work.
Thinking about this some more: the race condition won't result in a panic, it will just result in extra cloning. And that suggests using …
Ugh... This is a classic case that Polonius would solve; I hope we get something similar soon.
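For readers who don't have the borrow-checker issue paged in, a sketch with illustrative stand-in types (not this PR's code): the direct safe rewrite of the snippet above hits NLL's "conditional control flow across functions" limitation, because the borrow returned from the `Some` arm is treated as live in the `None` arm too; Polonius-style analysis would accept it. What compiles on stable today has to probe and then re-borrow:

```rust
use std::sync::Arc;

fn get_inner_mut(slot: &mut Arc<Vec<i64>>) -> &mut Vec<i64> {
    // Probe uniqueness without holding on to the returned borrow...
    if Arc::get_mut(slot).is_none() {
        *slot = Arc::new((**slot).clone());
    }
    // ...then borrow again, repeating Arc::get_mut's atomic bookkeeping --
    // exactly the overhead this PR is trying to avoid on a hot path.
    Arc::get_mut(slot).expect("unique after clone")
}
```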
So after writing this whole comment... if my reasoning is correct, this optimization could actually go into the standard library. So I suggest keeping this PR as is; I'll open a PR against Rust and see what they say, and if it gets merged then Polars will just get faster by upgrading to a newer Rust.
Actually, we're OK with uniqueness giving the wrong answer sometimes, so long as it's rare and it's a false negative, because worst case we just do extra work. The stdlib won't be OK with that. So I'm just going to update this PR.
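A sketch of the kind of cheap pre-check being described, with illustrative types (not necessarily what this PR ends up doing):

```rust
use std::sync::Arc;

// Plain atomic loads, no read-modify-write. The argument above: if a race
// makes this report "shared" for a Series that is actually unique, the worst
// case is one unnecessary clone (a false negative). The standard library
// can't accept approximate answers; a query engine paying with a little
// extra work can.
fn probably_shared(slot: &Arc<Vec<i64>>) -> bool {
    Arc::strong_count(slot) != 1 || Arc::weak_count(slot) != 0
}
```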
This takes the benchmark down to ~650µs per run, so even faster. But someone needs to think very hard about whether my justification is correct.
Another small bottleneck I haven't fixed: …
Woke up in the middle of the night and realized the `unsafe` is unsound. I'll remove it later; just wanted to note that so it doesn't get merged.
Filed #21004 as an explanation of why the proposed change would've been unsound. It's not actually a problem in the current codebase, but as soon as someone used a weakref it would've been.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #20964      +/-   ##
==========================================
- Coverage   79.34%   79.21%   -0.13%
==========================================
  Files        1579     1583       +4
  Lines      224319   225085     +766
  Branches     2573     2581       +8
==========================================
+ Hits       177976   178301     +325
- Misses      45755    46194     +439
- Partials      588      590       +2
```

☔ View full report in Codecov by Sentry.
I expect this will speed up most list/array operations that rely on `amortized_iter()` (except in cases where the actual operation has significant issues, e.g. #19106; it won't hurt there, though).

Before the change, the new benchmark has a mean of around ~780µs. After the change to `iterator.rs`, the benchmark has a mean of ~710µs. After an additional optimization to `Series::_get_inner_mut()`, the mean is as low as ~650µs, but I then reverted this.

Why it helps: typically, expensive (atomic, in this case) operations aren't a problem in normal Series usage, because we call them rarely. But this is a different situation, because we call them a lot.
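To make the "we call them a lot" point concrete, a toy sketch (not Polars code) of the cost shape: a per-element atomic check on a reused buffer, negligible for one call but significant when a loop like `amortized_iter()` runs it once per row:

```rust
use std::sync::Arc;

fn sum_lists(lists: &[Vec<i64>]) -> i64 {
    // One shared wrapper reused across all elements, like the amortized Series.
    let mut scratch = Arc::new(Vec::<i64>::new());
    let mut total = 0i64;
    for list in lists {
        // Paid once per element: an atomic uniqueness check that would be
        // invisible if it ran once per column.
        let buf = Arc::get_mut(&mut scratch).expect("scratch is never shared");
        buf.clear();
        buf.extend_from_slice(list);
        total += buf.iter().sum::<i64>();
    }
    total
}
```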