Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Improve shutdown logic, add more context done checks #685

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nickpetrovic
Copy link
Contributor

  • Improve shutdown logic by running server (gRPC/HTTP) shutdowns and global context cancellation in parallel
  • Merge global and local contexts to avoid modifying downstream code that doesn't need to know about the global context
  • In the runc_server.go RunCStreamLogs func, make sure we can cancel the loop if the gateway is being shutdown
  • Cancel RequestBuffer heartBeat func if gateway is shutting down

@nickpetrovic
Copy link
Contributor Author

From my testing, these changes allows the gateway to shutdown. Its worth testing this in staging with a remote gpu node to observe how the gateways (multiple) will respond when they are all rollout restarted.

pkg/abstractions/common/logs.go Outdated Show resolved Hide resolved
"sync"
)

func MergeContexts(ctxs ...context.Context) (context.Context, context.CancelFunc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure we always want child contexts to be cancelled immediately if the parents is cancelled? Just want to make sure we don't cancel a DB transaction or something.

Copy link
Contributor Author

@nickpetrovic nickpetrovic Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use this for every time theres two contexts. This is mostly a helper for when merging two contexts is okay. For example, we're merging contexts before we call the LogStream.Stream function. In particular, we can break out of the for loop in handleStream() with just a single/wrapped context vs adding a global context to LogStream and then adding a second case statement in handleStream.

We're also using the merged context in goroutines where we don't need to add a second case statement for a global context - at least I don't think we do. 1. because they're goroutines and errors aren't handled/returned there and 2. I didn't see a reason to have different logic for when the global context was cancelled.

Hopefully that makes sense. LMK if there's something that you're thinking of specifically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants