🥅 Kill servers on engine death #63
Conversation
Signed-off-by: Joe Runde <[email protected]>
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
- Coverage   61.19%   60.52%   -0.67%
==========================================
  Files          20       20
  Lines        1193     1183      -10
  Branches      211      208       -3
==========================================
- Hits          730      716      -14
- Misses        387      391       +4
  Partials       76       76

View full report in Codecov by Sentry.
Waiting on vllm-project/vllm#6594 to make sure that we have a common way of stopping the server.
vllm-project/vllm#6594 now merged, finally.
vllm-project/vllm#6594 is now merged. I'll see if I can take another pass at this to not use os.kill.
Thanks @joerunde!
Description
See vllm-project/vllm#6594
This PR modifies the grpc server's exception handler to check if the async llm engine is still running.
If it's not then the handler sends a SIGTERM to kill the serving process, which should trigger the graceful shutdown handlers.
This approach is a bit heavy-handed, but await server.stop(0) seems very unhappy if called within the context of a request handler. Maybe we could use some grpc context termination callbacks for this, but I didn't think it was worth the time to investigate that too deeply. I think the only real drawback here is that unit testing this feature would be pretty difficult with an os.kill in the mix.
This PR currently does not add this handling to the http server, but once vllm-project/vllm#6594 and vllm-project/vllm#6740 are merged, we should be able to remove our copy-pasted http server so both will handle engine death properly.
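A minimal sketch of the approach described above, assuming an engine object with an errored flag in the spirit of vLLM's AsyncLLMEngine. The names (engine_is_dead, handle_engine_death, the kill parameter) are illustrative, not the exact adapter code; the kill hook exists only so the SIGTERM path can be exercised without actually killing a process.

```python
import os
import signal


def engine_is_dead(engine) -> bool:
    """Return True when the async engine can no longer serve requests.

    Assumes the engine exposes `errored` and `is_running` attributes,
    mirroring vLLM's AsyncLLMEngine; these names are assumptions here.
    """
    return getattr(engine, "errored", False) and not getattr(engine, "is_running", False)


def handle_engine_death(engine, kill=os.kill) -> bool:
    """Called from the gRPC exception handler: if the engine has died,
    SIGTERM our own process so the graceful-shutdown handlers run.

    Returns True if a kill was issued, False otherwise. `kill` is
    injectable purely to make this testable despite os.kill being in
    the mix.
    """
    if engine_is_dead(engine):
        kill(os.getpid(), signal.SIGTERM)
        return True
    return False
```

In the real server this check would run inside the handler's except block, after logging the original request error, so a dead engine takes the whole process down instead of leaving a zombie server that fails every request.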
How Has This Been Tested?
Tested by installing vllm==0.5.3.post1 and the adapter from this branch onto a dev pod, injecting a runtime failure into the llm engine, and running a grpc request.
Merge criteria: