Skip to content

Commit

Permalink
executor: fix deadlock in service exit from netns workers (#7610)
Browse files Browse the repository at this point in the history
There was a race condition where contexts get canceled after a job is
sent to the netns worker but before the result got read. This caused
runInNetNS to exit (due to canceled context) but the result chan to
never be read from. It was crucially an unbuffered chan, which resulted
in the worker never being able to exit and the whole container cleanup
to block indefinitely.

The fix here is just to make that chan buffered with a size of 1 so that
the worker doesn't ever get blocked trying to write to it.

There's a few other related changes of making some other chans buffered
and explicitly closing them with a defer (to handle panic cases) which
aren't needed to fix this issue but seemed worth tidying up now.

Signed-off-by: Erik Sipsma <[email protected]>
  • Loading branch information
sipsma authored Jun 11, 2024
1 parent 573e65d commit 4a44a52
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions engine/buildkit/linux_namespace.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,14 +29,15 @@ func runInNetNS[T any](
value T
err error
}
resultCh := make(chan result)
resultCh := make(chan result, 1)

select {
case <-ctx.Done():
return zero, context.Cause(ctx)
case <-state.done:
return zero, fmt.Errorf("container exited")
case state.netNSJobs <- func() {
defer close(resultCh)
v, err := fn()
resultCh <- result{value: v, err: err}
}:
Expand Down Expand Up @@ -108,8 +109,9 @@ func (w *Worker) runNetNSWorkers(ctx context.Context, state *execState) error {
}}}

// must run in it's own isolated goroutine since it will lock to threads
errCh := make(chan error)
errCh := make(chan error, 1)
go func() {
defer close(errCh)
errCh <- nsw.run(ctx, state.netNSJobs)
}()
err := <-errCh
Expand Down

0 comments on commit 4a44a52

Please sign in to comment.