Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race on watcher update check #260

Merged
merged 1 commit into from
Jan 17, 2024
Merged

Fix race on watcher update check #260

merged 1 commit into from
Jan 17, 2024

Conversation

azdagron
Copy link
Member

When watchers are initialized, they wait until the Workload API has streamed back the initial response before returning from NewXXXWatcher. The semantics are intended such that a call to WaitForUpdate afterwards will only complete when the next response has arrived.

When an update is received, the flow is to:

  1. record the update
  2. close the "got first response" channel (only happens once)
  3. send on the "updated" channel (to signal callers of WaitForUpdate)

However, this sequence is racy, since closing the "got first response" channel first unblocks the newWatcher call then drains the "updated" channel. If the drain happens after step (3) then everything is ok, but if it happens before, then step (3) will send on the channel, which is buffered. This causes the call to WaitForUpdate to unblock even though no update was received.

This change fixes the race by swapping steps (2) and (3). The "updated" channel is sent on and THEN the "got first response channel" is closed so that the drain can take place afterwards.

When watchers are initialized, they wait until the Workload API has
streamed back the initial response before returning from NewXXXWatcher.
The semantics are intended such that a call to WaitForUpdate afterwards
will only complete when the next response has arrived.

When an update is received, the flow is to:
1. record the update
2. close the "got first response" channel (only happens once)
3. send on the "updated" channel (to signal callers of WaitForUpdate)

However, this sequence is racy, since closing the "got first response"
channel first unblocks the newWatcher call then drains the "updated"
channel. If the drain happens after step (3) then everything is ok, but
if it happens before, then step (3) will send on the channel, which is
buffered. This causes the call to WaitForUpdate to unblock even though
no update was received.

This change fixes the race by swapping steps (2) and (3). The "updated"
channel is sent on and THEN the "got first response channel" is closed
so that the drain can take place afterwards.

Signed-off-by: Andrew Harding <[email protected]>
@azdagron azdagron force-pushed the azdagron/fix-wait-race branch from c5d02a2 to 61ead82 Compare January 17, 2024 19:11
Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @azdagron, LGTM!

@azdagron azdagron merged commit 35e62f6 into main Jan 17, 2024
6 checks passed
@azdagron azdagron deleted the azdagron/fix-wait-race branch January 17, 2024 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants