-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
runtime: eliminate unnecessary lfence while operating on queue::Local<T>
#7340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
cea005b
to
ad6f9ea
Compare
I just merged #7339, so merging master should fix the ci issue 👍 |
pub(crate) fn remaining_slots(&self) -> usize { | ||
self.inner.remaining_slots() | ||
let (steal, _) = unpack(self.inner.head.load(Acquire)); | ||
// safety: this is the **only** thread that updates this cell. | ||
let tail = unsafe { self.inner.tail.unsync_load() }; | ||
|
||
LOCAL_QUEUE_CAPACITY - len(steal, tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just LOCAL_QUEUE_CAPACITY - self.len()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just
LOCAL_QUEUE_CAPACITY - self.len()
?
@Darksonn Thanks for your review, I might be mistaken, but I thought the slots that are being stolen are not available for pushing new task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see now. I did not notice that it was using steal
instead of head
for the first parameter to len
.
Blocked by #7339 which fixes the CI failures caused by the new compilation error message of the new stable toolchain.Motivation
queue::Local<T>
is never been send to another thread (except while shutting down), thetail
is only updated by the worker thread who owned this local queue, so any change totail
is immediately visible to the current thread.This means the
Acquire
ordering is not required for loadingtail
while operating on the local queue.Solution
Unsync load
Since the lfence is not required for this scenario, the atomic loading was replaced by simple pointer deref.
Remove all methods of the
queue::Inner<T>
Because of the following reasons:
Inner<T>
can be implemented in differently way forLocal<T>
andSteal<T>
len_local
orunsync_len
creates possibilities for incorrect usage in the future which causes ABA problems in critical path.I removed all methods on the
Inner<T>
and re-implement it for bothLocal<T>
andSteal<T>
.Remove the feature gate of
Steal<T>::len()
The feature gate
cfg_unstable_metrics
is removed forSteal<T>::len()
to avoid duplicating the same code.This doesn't seem to break anything, but if it does please let me know and I'll switch to something else.
Anything else?
Ideally, the
queue::Local<T>
should be marked as!Send
to lower the risk that unintentionally send it to another thread, but it required the changes of the starting and shutdown logic, I will try to open another PR for this improvement.