-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Could there be a use-after-free bug on the atomic shared_ptr implementation? #4
Comments
The trick is hidden at step 3. You cannot simply update control block to new location, you must take a holder (it's basically a guard) before. Holder is created here: https://github.com/vtyulb/AtomicSharedPtr/blob/master/src/atomic_shared_ptr.h#L309 Now we have at least one alive pointer instance. However, it says nothing about global refcount (moreover, FastSharedPtr never increases global refcount contrary to SharedPtr). That's why we can never update our control block, if it has non-zero local refcount. I call it "pumping phase": https://github.com/vtyulb/AtomicSharedPtr/blob/master/src/atomic_shared_ptr.h#L316 When local refcount is pumped out, we can finally exchange our control block here: https://github.com/vtyulb/AtomicSharedPtr/blob/master/src/atomic_shared_ptr.h#L325 So, steps 3 and 4 actually occur in reversed order. And there is no direct store() method, it's always compareExchange() for this implementation. |
Thanks for the question, I just had a lot of fun sorting out the algorithm |
Could something interfere in the middle between "pumping phase" and the exchange control block step? Could you describe it in some kind of pseudo code (without the FastSharedPtr). |
I also created my own way to resolve this bug. |
No, if we didn't succeed then we subtract our counter back: https://github.com/vtyulb/AtomicSharedPtr/blob/master/src/atomic_shared_ptr.h#L320 We try to do actual control block exchange when we know, that it should have zero local refcount now: https://github.com/vtyulb/AtomicSharedPtr/blob/master/src/atomic_shared_ptr.h#L325 read:
compareExchange:
store:
Invariant is:
You should actually try. This project has stress tests, I had a hard time debugging ABA problem on lock-free structures that use my AtomicSharedPtr. Build commands are here: https://github.com/vtyulb/AtomicSharedPtr/tree/master#build You can modify cycle length to bigger values, and enable/disable all sanitizers (but they are quite slow). Project also has native logging. Standard logging won't do, because as it turns out any broken code starts to work when you add mutex /syscall / other lock-free queue. |
Because you increase the global refcount, the try to the local refcount. So this condition added the check for zero local refcount? |
The idea for me is different:
So on the store() thread, the steps should be:
The store() thread and the load() step will exchange information depends on who comes to change the packed global refcount first.
I pretty sure this approach works. |
I did create a TLA+ spec of this algorithm.
I think I will also try to model your approach with TLA+. |
No, that specific condition is needed for ABA-handling. Someone could change control block twice. He must have also pumped local refcount to global, so we can safely decrease global refcount back at line 243. |
I think I agree with approach, it should work. Did I understand correctly, PUMPED flag is not set at start, can be set only once and never reset again? |
That's a great idea! I found out about TLA+ only a year ago, and it would probably saved me some time, lost a lot on debugging random aba problem specific to current algorithm. Folly is similar and different simultaneously. I dived in folly with @timuraudio 3 years ago, my info can be a little bit outdated. At that point we had same packing idea, local and global refcounts. Their global refcount is compatible with std::shared_ptr, but they used some dirty hacks to do that, so they lost compatibility with macos. Folly is production, and AtomicSharedPtr is still just proof-of-concept. Timur described difference at cppcon and some other conferences I think. Now that you mention folly, it's funny that I finally understand what problem you described in your first message. We have 3 different solutions to it:
|
I added this in my TLA+ Spec. So I can check whether the ABA problem can cause problem. When after you make your own TLA+ model of your ideal (or folly's idea), could you send me yours for me to study / check. Thanks you first. For the case whether there's ABA problem when set the PUMPED flag. The question here is could this control block be reused to another AtomicSharedPtr before it's been destroyed Should there be a use case for this? If yes, so my algorithm may be not sufficient again. For the folly lib, how folly can magically decrease the global refcount to zero after all of that? Genuinely curiously though |
I will have new year holiday soon, but I'm not sure if I could use it. No promises here.
You never know. Someone can set dequeue, and use control block in 2 pointers. Someone can reuse control blocks all the time, because it's too slow to allocate memory. Memory allocation is never lock-free, so you should expect anything.
When they create AtomicSharedPtr, they reserve 8192 on global refcount: https://github.com/facebook/folly/blob/main/folly/concurrency/AtomicSharedPtr.h#L246 I can't verify their code, but I trust it works. I believe mine works too 😄 |
I think the implementation here (and possibly many other implementation) has this bug:
For example, assume we have 2 threads:
With following interleaving actions:
ptr
Did I get something wrong somewhere.
Or the algorithm itself is incorrect?
The text was updated successfully, but these errors were encountered: