On a wip branch, there are some subtle soundness issues with Lazy #30
Ahhhhhh yesss! My secret plan to have someone review this code worked. :D Could I convince you to review
Yeah, I've read your blog post a few times and I find it pretty compelling. Buuuuuuuuuuut, I don't really know what else to do. For

With that said, I'm also using a spinlock in

Now, in regex-automata, there is an escape hatch for all of this. You can choose to use a lower level API that lets you pass your own mutable memory, which means the regex doesn't need to talk to a pool at all.

But, I had also been hoping to get to the section on spinlocks in @m-ou-se's book (pinging you on the off chance you want to chime in here) to see if there was any extra insight to be gathered there. I am traveling tomorrow, so I might actually get that chance!

Mara: big picture thing here is that, at some point in the next few months, the
@matklad Also, I've pushed up fixes for your feedback. Thank you. I've also added big warnings to
This is the current code that implements the spinlock mentioned above for a

regex-automata/src/util/pool.rs, lines 729 to 815 in 392b156
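For readers without the repository at hand, the general shape under debate is roughly the following. This is a deliberately simplified sketch, not the code at the lines linked above (the real pool does considerably more):

```rust
use std::cell::UnsafeCell;
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, Ordering};

// Simplified sketch only; `create` builds a fresh value when the stack is empty.
pub struct Pool<T> {
    locked: AtomicBool,
    stack: UnsafeCell<Vec<Box<T>>>,
    create: fn() -> T,
}

// SAFETY (sketch-level): the Vec is only touched while `locked` is held.
unsafe impl<T: Send> Sync for Pool<T> {}

impl<T> Pool<T> {
    pub fn new(create: fn() -> T) -> Pool<T> {
        Pool {
            locked: AtomicBool::new(false),
            stack: UnsafeCell::new(Vec::new()),
            create,
        }
    }

    pub fn get(&self) -> Box<T> {
        // The contested part: spin until we hold the lock, never blocking via the OS.
        while self.locked.swap(true, Ordering::Acquire) {
            spin_loop();
        }
        let reused = unsafe { (*self.stack.get()).pop() };
        self.locked.store(false, Ordering::Release);
        // Allocation of a fresh value happens outside the lock.
        reused.unwrap_or_else(|| Box::new((self.create)()))
    }

    pub fn put(&self, value: Box<T>) {
        while self.locked.swap(true, Ordering::Acquire) {
            spin_loop();
        }
        unsafe { (*self.stack.get()).push(value) };
        self.locked.store(false, Ordering::Release);
    }
}
```

The `while … swap …` loops are the spinlock being discussed: nothing here ever asks the operating system to block the thread.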
If that Vec is just used as a stack, to which you push and pop one element at a time, each of which will only be used by one thread, it might be simple to do that lock free. :) (Lock-free lists are usually quite tricky, but I think your use case might be one that doesn't run into most of the common tricky issues.)
My general stance on spinlocks is driven primarily by aesthetic reasons: it's not that I debugged some curious latency spikes in a server, it's just that I understand the mechanics of how spin locks could lead to issues, and that shifts me to "it's not up to the library to make this choice". Maybe in practice it is actually fine! But maybe not: here's someone hitting an actual deadlock in crossbeam. I guess that issue can be used to create a litmus test here:
That does seem quite unlikely, but it is also not impossible, and the result (deadlock) is about as bad as it gets. Is it ok for a library to potentially open the user up to this scenario? On aesthetic grounds, definitely not! On practical grounds, it might actually be yes?

One thing I've realized here is that no_std+alloc is in some sense an incoherent world. If you have a global allocator, you must have a proper implementation of Mutex to protect it (that impl can be a no-op for threadless environments). This is somewhat ironic in this situation --- you are building a global memory pool, a specialized version of a global allocator, and you need the same kinds of tools to build it; the tools are actually in your address space (as they are used by the global allocator), but you can't reach them!

I think for the pool, what Mara suggests makes the most sense. Lock-free pool + racy lazy should be totally fine. I am not sure what to do with the no_std, no_alloc lazy... I see that it is used to initialize an &[u32]. As the initialization is deterministic, I am wondering if it would be correct to do something like:

```rust
if !initialized.load(Acquire) {
    for byte in bytes.iter() { byte.store(deterministic_value, Relaxed) }
    initialized.store(true, Release)
}
for byte in bytes.iter() {
    byte.load(NonAtomic);
}
```

That is, threads race to overwrite the byte array, but, because they are guaranteed to write the same value, it is actually ok? My reading of the C++ memory model tells me that no, this won't be ok. On the high level, I believe the memory model doesn't have anything specific about "same value", so reasoning relying on that would be wrong. More specifically, as we ultimately want to do non-atomic reads, our atomic relaxed writes do nothing. And, if I remove that relaxed store, it seems clear that this is UB --- you have a non-atomic read and a write racing with each other, and that's definitely UB, even if the write overwrites the same value. Curiously though, if you also make sure that reads are atomic relaxed, that should be fine! (but then you'd have to return a

```rust
pub static {name}: Lazy<DFA<&'static [u32]>> = Lazy::new(|| {{
    {deserialize}
}});
```

I'd maybe write this as

```rust
pub fn {name}() -> &'static DFA<&'static [u32]> {
    static LAZY: Lazy<DFA<&'static [u32]>> = Lazy::new(|| {{
        {deserialize}
    }});
    &*LAZY
}
```

to not expose specific synchronization in the API.

As to what to do here in general, I think there are these choices:
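Here is a minimal sketch of the "reads are atomic relaxed too" variant described above as fine; the table length, the `deterministic_value` function, and the accessor are all invented for illustration:

```rust
use core::sync::atomic::{AtomicBool, AtomicU32, Ordering};

// Both the racing writes and the later reads are atomic, so there is no data
// race, and every writer stores the same deterministic values, so losing the
// race is harmless. The cost: the data now lives in atomics, so callers no
// longer get a plain &[u32].
const LEN: usize = 4;
static INIT: AtomicBool = AtomicBool::new(false);
static TABLE: [AtomicU32; LEN] = [
    AtomicU32::new(0),
    AtomicU32::new(0),
    AtomicU32::new(0),
    AtomicU32::new(0),
];

fn deterministic_value(i: usize) -> u32 {
    // Stand-in for whatever deterministic deserialization would compute.
    i as u32 * 7
}

fn get(i: usize) -> u32 {
    if !INIT.load(Ordering::Acquire) {
        for (j, slot) in TABLE.iter().enumerate() {
            slot.store(deterministic_value(j), Ordering::Relaxed);
        }
        INIT.store(true, Ordering::Release);
    }
    TABLE[i].load(Ordering::Relaxed)
}
```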
Here's a lock-free-ish pool that I think suits your needs: https://gist.github.com/m-ou-se/5fdcbdf7dcf4585199ce2de697f367a4

It does still (very briefly) lock the stack when popping an element, but it does not lock to push an element, nor does it keep anything locked while allocating and creating a new element.
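This isn't the gist's actual code, but a hedged sketch of the shape described: pushes are lock-free, pops take a spin lock only for the instant it takes to swap the head (which also sidesteps the classic ABA problem, since pops can't race each other), and an empty stack simply returns None so callers can allocate outside any lock:

```rust
use std::hint::spin_loop;
use std::marker::PhantomData;
use std::ptr;
use std::sync::atomic::{AtomicBool, AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
    pop_lock: AtomicBool,
    // Records that the stack owns T values (freeing leftover nodes on drop
    // is omitted for brevity).
    _owns: PhantomData<Box<T>>,
}

// SAFETY (sketch-level): values are handed between threads, so require Send.
unsafe impl<T: Send> Sync for Stack<T> {}

impl<T> Stack<T> {
    pub const fn new() -> Stack<T> {
        Stack {
            head: AtomicPtr::new(ptr::null_mut()),
            pop_lock: AtomicBool::new(false),
            _owns: PhantomData,
        }
    }

    /// Lock-free push: retry the CAS until our node becomes the new head.
    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        let mut head = self.head.load(Ordering::Relaxed);
        loop {
            unsafe { (*node).next = head };
            match self.head.compare_exchange_weak(
                head, node, Ordering::Release, Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(current) => head = current,
            }
        }
    }

    /// Pop holds a spin lock only while swapping the head. With pops
    /// serialized, a node observed at `head` cannot be popped, freed, and
    /// re-pushed underneath us, so following `next` is safe (no ABA).
    pub fn pop(&self) -> Option<T> {
        while self.pop_lock.swap(true, Ordering::Acquire) {
            spin_loop();
        }
        let mut head = self.head.load(Ordering::Acquire);
        let popped = loop {
            if head.is_null() {
                break None;
            }
            let next = unsafe { (*head).next };
            match self.head.compare_exchange_weak(
                head, next, Ordering::Acquire, Ordering::Acquire,
            ) {
                Ok(_) => break Some(head),
                Err(current) => head = current,
            }
        };
        self.pop_lock.store(false, Ordering::Release);
        popped.map(|node| unsafe { Box::from_raw(node) }.value)
    }
}
```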
@m-ou-se thank you! The reason why I hadn't used the approach of a lock free structure was indeed because of the ABA problem. It looks like you get around that by still using a spin lock. Does that still fall afoul of the problems with spin locks in general though? (Yours has much smaller scope than mine, of course.)
A spinlock isn't generally bad. A regular
(But as usual, only a benchmark will provide real answers. ^^)
(I would say "by using spinning", not "by using a spin lock". A |
Hmmm interesting. I guess what I had in mind is whether @matklad's deadlock scenario can happen in your pool.

@matklad Thinking about it more, why doesn't the
A spin loop hint doesn't communicate with the operating system. It's basically just a powerful "200 ×

(The operating system equivalent is
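To make that distinction concrete, here is a small illustrative sketch (the flag and the loops are hypothetical, not code from the thread):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// spin_loop() is only a CPU hint and never enters the kernel, while
// yield_now() is a syscall that lets the OS scheduler run another thread
// (the "operating system equivalent").
fn wait_spinning(ready: &AtomicBool) {
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop(); // stays on this core; the scheduler is unaware
    }
}

fn wait_yielding(ready: &AtomicBool) {
    while !ready.load(Ordering::Acquire) {
        std::thread::yield_now(); // syscall: "schedule someone else if you like"
    }
}
```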
Yeah, I believe it could happen if:
https://github.com/matklad/spin-of-death/blob/master/src/main.rs (from my post) might be a good starting point to create the above situation.
I think it's important to make a distinction between "spinning a bit before syscall" (awesome perf optimization!) and an actual spinlock, where we burn the CPU, blocked on the progress of some other thread (can create a deadlock if thread priorities are unfair).
That deadlock situation is one that could happen when a thread gets preempted after it has locked but before it has unlocked the list, the newly running thread starts waiting for the list to be unlocked, and the scheduler will never preempt the second thread to let the first one make progress. This won't happen with purely cooperative scheduling (only switching threads on yield/syscall), since there is no syscall or any user code between the lock and unlock. (Unlike with a

This is avoided by putting a maximum on the number of spin cycles and falling back to something else. If you fall back to blocking using the operating system, you've implemented a mutex. But in this case you could also just fall back to not using the pool and allocating/deallocating directly.

Note that using malloc/Box::new in a signal handler on Unix is actually not allowed, because it can just result in the exact same deadlock with the lock(s) in the allocator implementation, so that doesn't really solve anything for this case. In other words, for Unix signal handlers, not using the

So what's left is (non-preempted/"real time") threads with different priorities, and custom/embedded systems that do not use libc but have their own allocator that is safe to be used in such situations. In those cases, the

To solve that issue we need to interact with the operating system in some way, which is basically what we're trying to avoid in
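A sketch of that bounded-spin-plus-allocate fallback, reusing the shape of the spin-locked pool sketched earlier; the limit and the buffer type are arbitrary stand-ins:

```rust
use std::cell::UnsafeCell;
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, Ordering};

// Try the pool's spin lock a bounded number of times; if it stays contended
// (preempted holder, signal handler, higher-priority thread, ...), skip reuse
// and allocate a fresh value rather than spinning forever.
const SPIN_LIMIT: usize = 1_000;

pub struct ScratchPool {
    locked: AtomicBool,
    stack: UnsafeCell<Vec<Vec<u8>>>,
}

// SAFETY (sketch-level): the Vec is only touched while `locked` is held.
unsafe impl Sync for ScratchPool {}

impl ScratchPool {
    pub fn new() -> ScratchPool {
        ScratchPool { locked: AtomicBool::new(false), stack: UnsafeCell::new(Vec::new()) }
    }

    pub fn get(&self) -> Vec<u8> {
        for _ in 0..SPIN_LIMIT {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                let reused = unsafe { (*self.stack.get()).pop() };
                self.locked.store(false, Ordering::Release);
                return reused.unwrap_or_else(|| vec![0; 1024]);
            }
            spin_loop();
        }
        // Spin budget exhausted: don't block, just allocate a new buffer.
        vec![0; 1024]
    }
}
```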
Note that
Got this to deadlock without signals, just using realtime thread priorities:

But yeah, it indeed seems like
Aye, thanks for this everyone! I think I'm just going to swap out my alloc-only implementation of
Just couldn't resist taking a look at a couple of safety issues that have burnt me in the past in this area :-)
This type

regex-automata/src/util/lazy.rs (line 110 in 1d8f4d6)

has two problems.

First, it misses

```rust
unsafe impl<T: Sync + Send, F: Sync> Sync for Lazy<T, F> {}
```

(the inferred impl misses + Send). This can be weaponized to turn Lazy into a channel for sending Sync + !Send types across threads:

Second, the type owns T (it can drop it), but this isn't visible to the compiler. That is, PhantomData<Box<T>> is missing. In principle, this should be weaponizable by subverting dropck. If the compiler believes that dropping Lazy can't drop T, then it might allow code where the T in the Lazy has some dangling pointers when the Lazy is dropped, something along the lines of

No, this exact code compiles, because, although the compiler doesn't see that we can drop T, it sees that we can drop F, and those share the problematic lifetime. I am maybe 70% confident that, because of this, there's no way to weaponize it, but adding the phantom data would still seem like a good idea.

(Newer versions of the once_cell crate also have this kind of synchronization primitive: https://docs.rs/once_cell/latest/once_cell/race/struct.OnceBox.html. We don't have a spinning no-alloc version though, as I somewhat firmly believe that, for all code that could spin lock without yielding to the OS/blocking interrupts/etc, the caller should explicitly type out .spin somewhere.)
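To make the two suggested fixes concrete, here is a hedged sketch; the fields are invented stand-ins rather than the real Lazy layout:

```rust
use core::cell::UnsafeCell;
use core::marker::PhantomData;
use core::sync::atomic::AtomicBool;

// Illustrative stand-in; the real Lazy in regex-automata has different fields.
struct Lazy<T, F = fn() -> T> {
    initialized: AtomicBool,
    data: UnsafeCell<Option<T>>,
    create: F,
    // Fix 2: tell dropck that dropping a Lazy<T, F> may drop a T.
    _owns_t: PhantomData<Box<T>>,
}

// Fix 1: require T: Send as well as T: Sync. A shared &Lazy<T, F> lets any
// thread end up being the one that creates or drops the T, so a
// Sync-but-!Send T must not be allowed through.
unsafe impl<T: Sync + Send, F: Sync> Sync for Lazy<T, F> {}
```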