Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of triomphe::Arc is sometimes worse than std::sync::Arc #74

Open
orium opened this issue Nov 2, 2023 · 9 comments
Open

Comments

@orium
Copy link

orium commented Nov 2, 2023

Hi,

@michaelsproul and I ran some benchmarks in rpds . Most results improved when switching from std::sync::Arc to triomphe::Arc, but I was surprised that some benchmarks regressed significantly (> 30%). I'm wondering if there are optimizations on std::sync::Arc that were never ported to triomphe and that could improve triomphe and all projects that use it.

This is the PR with the benchmark results: orium/rpds#88

@Manishearth
Copy link
Owner

No idea; this crate was written quite a while ago.

@orium
Copy link
Author

orium commented Nov 3, 2023

If I take a look at the std::sync::Arc and port any potential optimizations to triomphe will you be available to review and merge them? (Not promising anything as I am busier that usual lately.)

@michaelsproul
Copy link

I was browsing std::sync::Arc's recent changes and this one stood out as something we could maybe backport: rust-lang/rust#115546. Although I'm not familiar enough with the memory model to know if this would make a big difference.

@michaelsproul
Copy link

This comment suggests is_unique relies on the Acquire semantics:

triomphe/src/arc.rs

Lines 546 to 549 in f78899c

// See the extensive discussion in [1] for why this needs to be Acquire.
//
// [1] https://github.com/servo/servo/issues/21186
Self::count(self) == 1

@Manishearth
Copy link
Owner

I would be happy to get patches, I can't guarantee I'll be able to review and merge them in a timely manner but if they map closely to upstream code that would make it easier. Would also love help reviewing them

@orium orium changed the title Performance of triomphe::Arc is sometime worse than std::sync::Arc Performance of triomphe::Arc is sometimes worse than std::sync::Arc Nov 9, 2023
@Dherse
Copy link

Dherse commented Nov 28, 2023

I concur, I am working on Typst (as a contributor not owner/creator) and specifically on performance, we don't use weak refs to I figured swapping out std arcs for triomphe would help but performance is around 30% worse using triomphe so there is something here, perhaps an optimization that exists in the std lib and not here? As someone said, they have (apparently) relaxed the ordering in the std lib, perhaps we could do the same in Triomphe?
Thanks, Dherse

@jmspiewak
Copy link

The two differences I can see are:

  1. In make_mut std uses Acquire only when the ref count is 1, triomphe always acquires. This shouldn't matter on x86.
  2. Triomphe does an acquire load in drop, std an acquire fence. This is very minor.

Are you benchmarking heavily contended accesses or mostly single threaded?

@michaelsproul
Copy link

@jmspiewak The rpds benchmarks are mostly single-threaded

@GnomedDev
Copy link
Contributor

Something to keep in mind: although orderings don't lead to different ASM on x86 in isolation, LLVM will still use them when optimising when functions are inlined, so weaker orderings can allow LLVM to reorder and generate better ASM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants