Make `Rc<T>::deref` and `Arc<T>::deref` zero-cost #132553
base: master
b283c44 to ae36f44
Would it potentially enable those types to have an FFI-compatible ABI, so that they could be passed to and returned from FFI functions directly, like |
I think in theory it is possible, at least for sized types, but I am not familiar with how to formally make it so.
ae36f44 to 0d6165f
0d6165f to 98edd5b
r? libs
98edd5b to 8beb51d
8beb51d to d7879fa
d7879fa to 317aa0e
@EFanZh Is this ready for review? If so, please un-draft the PR.
@joboet: The source code part is mostly done, but I haven’t finished updating LLDB and CDB pretty printers. The CI doesn’t seem to run those tests.
No worries! I just didn't want to keep you waiting in case you had forgotten to change the state.
f243654 to 1308bf6
@bors try @rust-timer queue
Make `Rc<T>::deref` and `Arc<T>::deref` zero-cost
Finished benchmarking commit (38a5d4f): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary 1.0%, secondary -0.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 5.1%, secondary -0.7%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 0.5%, secondary 0.6%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 463.478s -> 463.036s (-0.10%)
The results here seem to be pretty mixed. Any idea why this seems to hurt some incremental builds? I suppose it could cause more work to do at build time, but the 14% increase building clif seems like something more substantial. Unless, is that code super ( |
@tgross35: There isn’t any significant change since the last perf run (https://perf.rust-lang.org/compare.html?start=b88076097751f7677b850b94b20faf5679fca321&end=1a76f3df0b6373e760df2514a5af2587f3e01aff&stat=instructions:u). But my local development environment is currently broken, and I’ll need some time to analyze the perf result.
Oh yeah, the wins definitely outweigh the losses here, no disagreement. I’m just wondering what makes clif such an outlier. The cargo changes have to be inlining; maybe this just tips the scale for some commonly called medium-sized function to be inlined.
d54c877 to 3a7fd56
This PR was rebased onto a different master commit! Check out the changes with our |
3a7fd56 to fc33db3
pub(crate) fn new_array<T>(length: usize) -> Self {
    #[inline]
    fn inner(value_layout: Layout, length: usize) -> RcLayout {
        // We can use `repeat_packet` here because the outer function passes `T::LAYOUT` as the
- // We can use `repeat_packet` here because the outer function passes `T::LAYOUT` as the
+ // We can use `repeat_packed` here because the outer function passes `T::LAYOUT` as the
@tgross35: I have marked some functions |
@bors try @rust-timer queue
Make `Rc<T>::deref` and `Arc<T>::deref` zero-cost
Finished benchmarking commit (585da62): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary -0.1%, secondary 0.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary -2.6%, secondary -1.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary 0.3%, secondary -0.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 469.963s -> 472.368s (0.51%)
Currently, `Rc<T>` and `Arc<T>` store pointers to `RcInner<T>` and `ArcInner<T>`. This PR changes the pointers so that they point to `T` directly instead.

This is based on the assumption that we access the `T` value more frequently than we access the reference counts. With this change, accessing the data can be done without offsetting pointers from `RcInner<T>` and `ArcInner<T>` to their contained data. This change might also enable some possibly useful future optimizations, such as:

- Convert `&[Rc<T>]` into `&[&T]` within O(1) time.
- Convert `&[Rc<T>]` into `Vec<&T>` utilizing `memcpy`.
- Convert `&Option<Rc<T>>` into `Option<&T>` without branching.
- Make `Rc<T>` and `Arc<T>` FFI compatible types where `T: Sized`.
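
To make the layout idea concrete, here is a minimal sketch. It is not the code in this PR: `ThinRc`, `Header`, `as_refs`, and `demo` are hypothetical names made up for illustration, and the toy only handles `T: Sized`, has no weak count, and is not thread-safe. It stores a pointer to `T` itself, keeps the count in a header placed before the value in the same allocation, and shows why a slice of such handles could in principle be viewed as a slice of references without copying.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical header holding the reference count (assumption, not the PR's type).
struct Header {
    strong: Cell<usize>,
}

/// Toy handle that stores a pointer to `T` directly; the header sits before the value.
#[repr(transparent)]
struct ThinRc<T> {
    value: NonNull<T>,
}

impl<T> ThinRc<T> {
    /// Layout of the whole allocation and the byte offset of the value inside it.
    fn layout() -> (Layout, usize) {
        Layout::new::<Header>().extend(Layout::new::<T>()).unwrap()
    }

    fn new(value: T) -> Self {
        let (layout, offset) = Self::layout();
        unsafe {
            let base = alloc(layout);
            if base.is_null() {
                handle_alloc_error(layout);
            }
            base.cast::<Header>().write(Header { strong: Cell::new(1) });
            let value_ptr = base.add(offset).cast::<T>();
            value_ptr.write(value);
            ThinRc { value: NonNull::new_unchecked(value_ptr) }
        }
    }

    /// Reaching the count is where the pointer offset is paid in this layout.
    fn header(&self) -> &Header {
        let (_, offset) = Self::layout();
        unsafe { &*self.value.as_ptr().cast::<u8>().sub(offset).cast::<Header>() }
    }

    fn strong_count(&self) -> usize {
        self.header().strong.get()
    }
}

impl<T> Deref for ThinRc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // No offset arithmetic: the stored pointer already points at `T`.
        unsafe { self.value.as_ref() }
    }
}

impl<T> Clone for ThinRc<T> {
    fn clone(&self) -> Self {
        let header = self.header();
        header.strong.set(header.strong.get() + 1);
        ThinRc { value: self.value }
    }
}

impl<T> Drop for ThinRc<T> {
    fn drop(&mut self) {
        let header = self.header();
        let remaining = header.strong.get() - 1;
        header.strong.set(remaining);
        if remaining == 0 {
            let (layout, offset) = Self::layout();
            unsafe {
                std::ptr::drop_in_place(self.value.as_ptr());
                dealloc(self.value.as_ptr().cast::<u8>().sub(offset), layout);
            }
        }
    }
}

/// O(1) view of a slice of handles as a slice of references. This relies on
/// `ThinRc<T>` being `repr(transparent)` over a valid, aligned pointer to `T`.
fn as_refs<'a, T>(rcs: &'a [ThinRc<T>]) -> &'a [&'a T] {
    unsafe { std::slice::from_raw_parts(rcs.as_ptr().cast::<&'a T>(), rcs.len()) }
}

/// Small usage example for the toy type.
fn demo() {
    let a = ThinRc::new(String::from("hello"));
    let b = a.clone();
    assert_eq!(a.strong_count(), 2);
    assert_eq!(b.len(), 5); // method call goes through `Deref`
    let rcs = [a, b];
    let refs: &[&String] = as_refs(&rcs);
    assert!(std::ptr::eq(refs[0], refs[1])); // both handles share one value
}
```

In this sketch the cost moves from `deref` to the reference-count operations (`clone`, `drop`, `strong_count`), which matches the assumption stated above that the value is accessed more often than the counts.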