Fix races around pthread exit and join #409
Conversation
There are other locations, such as here, which alias `__tl_{lock,unlock}` to a dummy function (and a few others as well). Should those be removed?

Another location I see is this one, which, by avoiding `__tl_sync`, is not properly synchronizing with `__pthread_exit`: as soon as this line is executed, the thread is eligible for freeing, which would be a use-after-free with later `self->foo` writes (or stack writes).

Additionally, I'll still point out that despite the example in #405 not using detached threads, I don't think that `free` is sufficient to deallocate a thread's stack. There's no guarantee that the stack isn't in use at the end of `free`, or at the end of `__pthread_exit` when everything is cleaned up, meaning it could still be a use-after-free.

It seems like wasi-libc was originally patched to not use threads, but when re-adding threads support it seems like it might be worthwhile to start from scratch or otherwise comprehensively review all threading-related patches, although I'm not sure if that's possible.
```asm
# Unlock thread list. (as CLONE_CHILD_CLEARTID would do for Linux)
#
# Note: once we unlock the thread list, our "map_base" can be freed
# by a joining thread. It's safe as we are in ASM and no longer use
# our C stack or pthread_t.
i32.const __thread_list_lock
i32.const 0
i32.atomic.store 0
# As an optimization, we can check tl_lock_waiters here.
# But for now, simply wake up unconditionally as
# CLONE_CHILD_CLEARTID does.
i32.const __thread_list_lock
i32.const 1
memory.atomic.notify 0
drop
```
Reading over the definition of `__tl_unlock`, it additionally decrements `tl_lock_count`. It might be good to assert that it is zero in C before returning here to perform this. Additionally, it might be good to leave a comment in `__tl_unlock` that this needs updating if the implementation changes.
Why not just call `__tl_unlock`?
> Reading over the definition of `__tl_unlock`, it additionally decrements `tl_lock_count`. It might be good to assert that it is zero in C before returning here to perform this. Additionally, it might be good to leave a comment in `__tl_unlock` that this needs updating if the implementation changes.
i feel it belongs in upstream musl, not here.
> Why not just call `__tl_unlock`?

because `__tl_unlock` is a C function, which potentially requires a C stack.
I'm not sure what you mean by it belongs in upstream musl? Wasi-libc is already patching musl a fair bit, so what I'm saying is that, as part of the patching, the cases where `__pthread_exit` returns should all assert that `tl_lock_count` is zero.
> I'm not sure what you mean by it belongs in upstream musl? Wasi-libc is already patching musl a fair bit, so what I'm saying is that, as part of the patching, the cases where `__pthread_exit` returns should all assert that `tl_lock_count` is zero.

i meant that "`tl_lock_count` should be zero when a thread exits" is not wasi-specific. if you feel strongly, i can add such an assertion.
my understanding is that these weak symbols are resolved to the real `__tl_lock`/`__tl_unlock`.
i agree it's broken. i have a few ideas to fix it. i intentionally didn't include a fix in this PR because it's a separate issue.
```c
 * See also set_tid_address(2)
 *
 * In WebAssembly, we leave it to wasi_thread_start instead.
 */
```
Could we instead call `__tl_unlock` right before `a_store(&self->detach_state, DT_EXITED);`, like we do right before `free(self->map_base);`?

In the first case we are unlocking right before notifying the joiner that they can call free, and in the latter case we are unlocking right before we call `free` ourselves, so it seems symmetrical.

(I'm suggesting this so that we can avoid re-implementing `__tl_unlock` in asm if we can.)
Actually, what was wrong with the single `__tl_unlock` on line 172 that we had previously?
> Could we instead call `__tl_unlock` right before `a_store(&self->detach_state, DT_EXITED);`, like we do right before `free(self->map_base);`? In the first case we are unlocking right before notifying the joiner that they can call free, and in the latter case we are unlocking right before we call `free` ourselves, so it seems symmetrical.

- the detached case is broken, as alex said.
- detached threads never have joiners.

> (I'm suggesting this so that we can avoid re-implementing `__tl_unlock` in asm if we can.)

this PR doesn't re-implement `__tl_unlock`. it emulates `CLONE_CHILD_CLEARTID`, which the `__tl_lock`/`__tl_unlock`/`__tl_sync` protocol relies on.
> Actually, what was wrong with the single `__tl_unlock` on line 172 that we had previously?

- double unlock.
- it unlocks too early and allows the joiner to free our stack before we are finished with it.
I agree the double unlock thing looks like a bug.

But the joiner is not waiting on the `tl_lock`, is it? The joiner seems to be waiting on `t->detach_state`. In fact, I don't see the `tl_lock` referenced at all in `pthread_join.c`. Maybe I'm missing something?
the joiner uses `__tl_sync` to sync with the exit.
I wonder if we should instead come up with a different solution that doesn't involve a thread freeing its own stack. How about this:
Aha you are correct, I was misunderstanding what a weak alias did. Disregard these comments of mine then!
Is the "ready for collection" flag necessary in that scheme, given that the linked list is protected by `__thread_list_lock`?

It seems a bit of a bummer though in that if you have a bunch of threads all exit at once and you never create another thread, then that's a bunch of memory stuck which can't be reused. Another possible alternative though would be to open-code the mutex around
The weird part though is that
for the detached case, i'm thinking: a. implement

right now i'm inclined towards b, because
Neat idea. Sounds like that could work.
does anyone still have any concerns?

ping
```asm
i32.const __thread_list_lock
i32.const 1
memory.atomic.notify 0
drop
```
How about putting all this new code in a new local function called `__tl_unlock_asm`? And perhaps mention in the comment why we need the asm re-implementation here.

Otherwise this lgtm now.
i added a comment.
I spent some time today staring at this, trying to convince myself that it is correct. I'm not too familiar with the details of `CLONE_CHILD_CLEARTID` and the "`__tl_lock`/`__tl_unlock`/`__tl_sync` protocol" referred to, so it could be that I'm too concerned about something that is actually fine here. What is concerning, though, is that it is not easy to figure out what the change is doing, and I suspect that will make troubleshooting difficult in the future.

Tell me if this is a correct reading: we need to control concurrent access to `__thread_list_lock`, which previously was done via `__tl_lock`, `__tl_unlock` and `__tl_sync`. Because the current thread's stack might be deallocated, we can't call `__tl_unlock` directly and instead do something "equivalent" in assembly code, by just atomically setting `__thread_list_lock` to zero and waking the next waiter. This means that there are two places munging with `__thread_list_lock`: the assembly code and `__tl_unlock`. (Why isn't the function removed then, e.g., to avoid future errors? Presumably it is used somewhere else when the stack is valid?) But the "equivalent" assembly code is also not completely equivalent: it doesn't decrement `tl_lock_count` as `__tl_unlock` does. So it appears as if there are cases where the assembly code could result in different behavior than the original `__tl_unlock`, namely when `__tl_lock` has been called multiple times previously. It is unclear why this different behavior is OK here.

None of this is explained in the code comments or commit message. It feels like some kind of bug is already present (re: `tl_lock_count`); if it is not, it still "appears" that way because there is no explanation as to why we can disregard `tl_lock_count`. And, since there is little explanation, it seems likely that if there isn't a bug already present, there could be one soon if someone again inserts a `__tl_unlock` in the wrong place (since they should instead rely on the assembly code).
the asm code mimics `CLONE_CHILD_CLEARTID`, not `__tl_unlock`. `CLONE_CHILD_CLEARTID` is Linux kernel functionality that the original musl code relies on.

i feel it's actually clearly explained in the comment.
I think this is the part that took me a while to understand. At this point I think the comment explains it, but early on I was confused about this point (and `CLONE_CHILD_CLEARTID` in general).
https://www.man7.org/linux/man-pages/man2/clone.2.html seems to have a good enough explanation to read this patch:
I want to include WebAssembly/wasi-libc#409, which is rather easy to hit when people are experimenting with wasi-threads.