Commit 4a44a52

executor: fix deadlock in service exit from netns workers (#7610)
There was a race condition where a context could be canceled after a job was sent to the netns worker but before the result was read. This caused runInNetNS to exit (due to the canceled context) without the result chan ever being read from. Crucially, it was an unbuffered chan, so the worker could never exit and the whole container cleanup blocked indefinitely.

The fix here is just to make that chan buffered with a size of 1 so that the worker never gets blocked trying to write to it.

There are a few other related changes that make some other chans buffered and explicitly close them with a defer (to handle panic cases). These aren't needed to fix this issue but seemed worth tidying up now.

Signed-off-by: Erik Sipsma <[email protected]>
1 parent 573e65d commit 4a44a52
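
The race described in the commit message can be reproduced in a small standalone program. The following is a minimal sketch, not the engine's code: `runOnWorker`, `workerExitsAfterCancel`, and the `started`/`release` channels are hypothetical names, and the second `select` (waiting on either cancellation or the result) is an assumption about how the caller waits, since that part of `runInNetNS` is not shown in the diff. It cancels the caller while a job is still running and then checks that the worker goroutine can still exit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runOnWorker is a hypothetical stand-in for runInNetNS: it hands fn to a
// worker goroutine via jobs and waits for either the result or cancellation.
func runOnWorker[T any](ctx context.Context, jobs chan<- func(), fn func() (T, error)) (T, error) {
	var zero T
	type result struct {
		value T
		err   error
	}
	// Capacity 1: the job can always deliver its single result, even if
	// the caller has already returned and nobody will receive it.
	resultCh := make(chan result, 1)

	select {
	case <-ctx.Done():
		return zero, context.Cause(ctx)
	case jobs <- func() {
		v, err := fn()
		resultCh <- result{value: v, err: err}
	}:
	}

	// Assumed shape of the wait: give up on cancellation, otherwise read.
	select {
	case <-ctx.Done():
		return zero, context.Cause(ctx)
	case res := <-resultCh:
		return res.value, res.err
	}
}

// workerExitsAfterCancel reports whether the worker goroutine terminates
// when the caller is canceled while its job is still running.
func workerExitsAfterCancel() bool {
	jobs := make(chan func())
	workerDone := make(chan struct{})
	go func() {
		defer close(workerDone)
		for job := range jobs {
			job()
		}
	}()

	ctx, cancel := context.WithCancel(context.Background())
	started := make(chan struct{})
	release := make(chan struct{})
	go func() {
		<-started // the job is now running on the worker
		cancel()  // the caller gives up before the result is written
	}()

	_, err := runOnWorker(ctx, jobs, func() (int, error) {
		close(started)
		<-release // hold the job until the caller has returned
		return 42, nil
	})
	if !errors.Is(err, context.Canceled) {
		return false
	}

	close(release) // let the job finish and write its result
	close(jobs)    // ask the worker to shut down
	select {
	case <-workerDone:
		return true
	case <-time.After(2 * time.Second):
		return false // worker is stuck sending on resultCh
	}
}

func main() {
	fmt.Println("worker exited:", workerExitsAfterCancel())
}
```

Changing `make(chan result, 1)` to `make(chan result)` in this sketch leaves the job blocked on its send, so the worker never returns to its loop and the check times out.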

1 file changed: +4 -2 lines changed

engine/buildkit/linux_namespace.go

Lines changed: 4 additions & 2 deletions
@@ -29,14 +29,15 @@ func runInNetNS[T any](
 		value T
 		err   error
 	}
-	resultCh := make(chan result)
+	resultCh := make(chan result, 1)
 
 	select {
 	case <-ctx.Done():
 		return zero, context.Cause(ctx)
 	case <-state.done:
 		return zero, fmt.Errorf("container exited")
 	case state.netNSJobs <- func() {
+		defer close(resultCh)
 		v, err := fn()
 		resultCh <- result{value: v, err: err}
 	}:
@@ -108,8 +109,9 @@ func (w *Worker) runNetNSWorkers(ctx context.Context, state *execState) error {
 	}}}
 
 	// must run in it's own isolated goroutine since it will lock to threads
-	errCh := make(chan error)
+	errCh := make(chan error, 1)
 	go func() {
+		defer close(errCh)
 		errCh <- nsw.run(ctx, state.netNSJobs)
 	}()
 	err := <-errCh