Fix races around pthread exit and join #409

Merged
merged 2 commits into from
Jun 7, 2023
1 change: 1 addition & 0 deletions Makefile
@@ -325,6 +325,7 @@ ifeq ($(THREAD_MODEL), posix)
# Specify the tls-model until LLVM 15 is released (which should contain
# https://reviews.llvm.org/D130053).
CFLAGS += -mthread-model posix -pthread -ftls-model=local-exec
ASMFLAGS += -matomics

# Include cloudlib's directory to access the structure definition of clockid_t
CFLAGS += -I$(LIBC_BOTTOM_HALF_CLOUDLIBC_SRC)
22 changes: 8 additions & 14 deletions libc-top-half/musl/src/thread/pthread_create.c
@@ -164,14 +164,6 @@ static void __pthread_exit(void *result)
self->prev->next = self->next;
self->prev = self->next = self;

#ifndef __wasilibc_unmodified_upstream
/* On Linux, the thread is created with CLONE_CHILD_CLEARTID,
* and this lock will unlock by kernel when this thread terminates.
* So we should unlock it here in WebAssembly.
* See also set_tid_address(2) */
__tl_unlock();
#endif

#ifdef __wasilibc_unmodified_upstream
if (state==DT_DETACHED && self->map_base) {
/* Detached threads must block even implementation-internal
@@ -190,9 +182,6 @@ static void __pthread_exit(void *result)
}
#else
if (state==DT_DETACHED && self->map_base) {
// __syscall(SYS_exit) would unlock the thread, list
// do it manually here
__tl_unlock();
free(self->map_base);
// Can't use `exit()` here, because it is too high level
return;
@@ -212,10 +201,15 @@ static void __pthread_exit(void *result)
#ifdef __wasilibc_unmodified_upstream
for (;;) __syscall(SYS_exit, 0);
#else
// __syscall(SYS_exit) would unlock the thread, list
// do it manually here
__tl_unlock();
// Can't use `exit()` here, because it is too high level

/* On Linux, the thread is created with CLONE_CHILD_CLEARTID,
* and the lock (__thread_list_lock) will be unlocked by kernel when
* this thread terminates.
* See also set_tid_address(2)
*
* In WebAssembly, we leave it to wasi_thread_start instead.
*/
Member

Could we instead call __tl_unlock right before a_store(&self->detach_state, DT_EXITED); like we do right before free(self->map_base)?

In the first case we are unlocking right before notifying the joiner that they can call free, and in the latter case we are unlocking right before we call free ourselves, so it seems symmetrical.

(I'm suggesting this so that we can avoid re-implementing __tl_unlock in asm if we can).

Member

Actually, what was wrong with the single __tl_unlock on line 172 that we had previously?

Contributor Author

> Could we instead call __tl_unlock right before a_store(&self->detach_state, DT_EXITED); like we do right before free(self->map_base)?
>
> In the first case we are unlocking right before notifying the joiner that they can call free, and in the latter case we are unlocking right before we call free ourselves, so it seems symmetrical.

  • the detached case is broken, as alex said.
  • detached threads never have joiners.

> (I'm suggesting this so that we can avoid re-implementing __tl_unlock in asm if we can).

this PR doesn't re-implement __tl_unlock.
it emulates CLONE_CHILD_CLEARTID, which the __tl_lock/unlock/sync protocol relies on.

Contributor Author

> Actually, what was wrong with the single __tl_unlock on line 172 that we had previously?

  • double unlock.
  • it unlocks too early and allows the joiner to free our stack before we are done with it.
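
The double-unlock hazard in the first bullet above can be replayed deterministically with a simplified model. This is only a sketch under stated assumptions: `lock_word`, `model_lock`, and `model_unlock` are hypothetical stand-ins, not the musl implementation (which also keeps a recursion count and waits on a futex). It only shows why a second, unconditional store of 0 can release a lock that another thread has taken in the meantime.

```c
#include <assert.h>

/* Hypothetical, simplified model of a thread-list lock word:
 * 0 means unlocked, any other value is the owner's tid. */
static int lock_word;

/* Try to take the lock for `tid`; returns 1 on success, 0 if it is held. */
static int model_lock(int tid) {
    if (lock_word != 0) return 0;
    lock_word = tid;
    return 1;
}

/* Unconditional release: a plain store of 0 that never checks the owner. */
static void model_unlock(void) {
    lock_word = 0;
}

/* Replays the interleaving step by step on a single host thread.
 * Returns 1 if thread C acquired the lock while thread B had not released it. */
int double_unlock_breaks_exclusion(void) {
    enum { TID_A = 1, TID_B = 2, TID_C = 3 };
    lock_word = 0;
    assert(lock_word == 0);           /* start from the unlocked state      */

    int a_ok = model_lock(TID_A);     /* A takes the lock on its exit path  */
    model_unlock();                   /* A's first unlock: legitimate       */
    int b_ok = model_lock(TID_B);     /* B now owns the lock                */
    model_unlock();                   /* A's second unlock: drops B's lock  */
    int c_ok = model_lock(TID_C);     /* C gets in although B never unlocked */

    return a_ok && b_ok && c_ok;
}
```

Calling `double_unlock_breaks_exclusion()` returns 1 in this model, meaning mutual exclusion was violated for thread B.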

Member

@sbc100 sbc100 Apr 15, 2023

I agree the double unlock thing looks like a bug.

But the joiner is not waiting on the tl_lock is it? The joiner seems to be waiting on t->detach_state. In fact, I don't see the tl_lock referenced at all in pthread_join.c. Maybe I'm missing something?

Contributor Author

the joiner uses __tl_sync to sync with the exit.
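
The ordering discussed in this thread can be sketched on a host system with C11 atomics and POSIX threads. This is an illustrative model only: `list_lock`, `detach_state`, and `fake_stack` are hypothetical stand-ins for `__thread_list_lock`, `self->detach_state`, and the memory behind `map_base`, and the busy-wait loops replace the futex-based waiting that a real libc would use. The point is that the joiner first observes the exit state, then waits for the lock word to clear, and only then frees the memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

enum { STATE_RUNNING = 0, STATE_EXITED = 1 };

/* Hypothetical stand-ins, not the libc's real variables. */
static atomic_int list_lock;      /* 0 = unlocked, otherwise the owner's tid */
static atomic_int detach_state;   /* STATE_RUNNING or STATE_EXITED           */
static char *fake_stack;          /* models the exiting thread's stack       */

static void *exiting_thread(void *arg) {
    (void)arg;
    atomic_store(&list_lock, 1);                /* analog of taking the list lock */
    atomic_store(&detach_state, STATE_EXITED);  /* tell the joiner we are exiting */
    strcpy(fake_stack, "done");                 /* still using "our stack"        */
    atomic_store(&list_lock, 0);                /* very last step: release        */
    return NULL;
}

/* Returns 1 if the joiner saw the exiting thread's final write before freeing. */
int join_sees_finished_stack(void) {
    pthread_t t;
    fake_stack = calloc(1, 16);
    assert(fake_stack != NULL);
    atomic_store(&list_lock, 0);
    atomic_store(&detach_state, STATE_RUNNING);

    int rc = pthread_create(&t, NULL, exiting_thread, NULL);
    assert(rc == 0);

    /* Wait for the exit notification, as a joiner waits on detach_state. */
    while (atomic_load(&detach_state) != STATE_EXITED) { }
    /* Analog of the sync step: wait until the lock word has been cleared. */
    while (atomic_load(&list_lock) != 0) { }

    int ok = strcmp(fake_stack, "done") == 0;
    free(fake_stack);        /* safe only because the lock word is clear */
    fake_stack = NULL;
    pthread_join(t, NULL);   /* host-level cleanup of the demo thread    */
    return ok;
}
```

If the release were moved before the `strcpy`, the joiner could free `fake_stack` while the exiting thread still writes to it, which is the "unlocks too early" problem described above. On older toolchains this sketch may need to be built with `-pthread`.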

#endif
}

17 changes: 17 additions & 0 deletions libc-top-half/musl/src/thread/wasm32/wasi_thread_start.s
@@ -28,4 +28,21 @@ wasi_thread_start:
local.get 1 # start_arg
call __wasi_thread_start_C

# Unlock thread list. (as CLONE_CHILD_CLEARTID would do for Linux)
#
# Note: once we unlock the thread list, our "map_base" can be freed
# by a joining thread. It's safe as we are in ASM and no longer use
# our C stack or pthread_t. It's impossible to do this safely in C
# because there is no way to tell the C compiler not to use C stack.
i32.const __thread_list_lock
i32.const 0
i32.atomic.store 0
# As an optimization, we can check tl_lock_waiters here.
# But for now, simply wake up unconditionally as
# CLONE_CHILD_CLEARTID does.
i32.const __thread_list_lock
i32.const 1
memory.atomic.notify 0
drop
Member

How about putting all this new code in a new local function called __tl_unlock_asm? And perhaps mention in the comment why we need the asm re-implementation here.

Otherwise this lgtm now.

Contributor Author

i added a comment.


end_function