add MT_BENCH to pipeline #34
Conversation
# This seems like excessive effort to stop the vllm process, but merely saving & killing the pid doesn't work
# Also, the base image does not include `pkill` cmd, so can't pkill -f vllm.entrypoints.openai.api_server either
def stop_vllm_server_by_name():
    import psutil

    for process in psutil.process_iter(attrs=["pid", "name", "cmdline"]):
        cmdline = process.info.get("cmdline")
        if cmdline and "vllm.entrypoints.openai.api_server" in cmdline:
            print(f"Found vLLM server process with PID: {process.info['pid']}, terminating...")
            try:
                process.terminate()  # Try graceful termination
                process.wait(timeout=5)  # Wait a bit for it to terminate
                if process.is_running():
                    print(f"Forcefully killing vLLM server process with PID: {process.info['pid']}")
                    process.kill()  # Force kill if it's still running
                print(f"Successfully stopped vLLM server with PID: {process.info['pid']}")
            except psutil.NoSuchProcess:
                print(f"Process with PID {process.info['pid']} no longer exists.")
            except psutil.AccessDenied:
                print(f"Access denied when trying to terminate process with PID {process.info['pid']}.")
            except Exception as e:
                print(f"Failed to terminate process with PID {process.info['pid']}. Error: {e}")
It was not working in the first place because vLLM does not handle signals properly. It expects to be run from the CLI and thus stopped via SIGINT.
If you use a process group in the instantiation above, you can get rid of the current PID search logic and simply do:
process_group_id = os.getpgid(process.pid)
process.send_signal(signal.SIGINT)
try:
    print("Waiting for vLLM server to shut down gracefully")
    process.wait(timeout)
except subprocess.TimeoutExpired:
    print(f"Sending SIGKILL to vLLM server since timeout ({timeout}s) expired")
    process.kill()

# Attempt to cleanup any remaining child processes
# Make sure process_group is legit (> 1) before trying
if process_group_id and process_group_id > 1:
    try:
        os.killpg(process_group_id, signal.SIGKILL)
        print("Sent SIGKILL to vLLM process group")
    except ProcessLookupError:
        print("Nothing left to clean up with the vLLM process group")
else:
    print("vLLM process group id not found")
Thanks! Please add in a follow-up, I need to be done with this PR 🤣
@leseb how about if we merge this, then you add the PID/Kill changes in a follow-up?
Sounds good :).
eval/mt_bench/components.py (outdated)
import requests

command = f"nohup python3.11 -m vllm.entrypoints.openai.api_server --model {model_path} &"
subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
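The requests import suggests the component polls the server until it is up. A hedged sketch of such a readiness check, assuming vLLM's default port and its /health endpoint; the function name and timeout values are assumptions:

```python
import time

import requests

# Poll the OpenAI-compatible server until it answers, or give up.
def wait_for_vllm(base_url: str = "http://localhost:8000", timeout: int = 300) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # vLLM's OpenAI server exposes a /health endpoint (assumption here).
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                return
        except requests.exceptions.ConnectionError:
            pass  # server not listening yet
        time.sleep(5)
    raise TimeoutError("vLLM server did not become ready in time")
```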
Suggested change:
- subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+ subprocess.Popen(args=command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Why subprocess.PIPE? Neither stdout nor stderr is used anywhere.
relic from another try, another time - I'll remove those
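One caveat with the suggested change above: without shell=True, a string command is not parsed by a shell, and nohup plus the trailing & are shell constructs anyway, so the command would need to become an argv list. A sketch of that, with a placeholder model path:

```python
import shlex
import subprocess

# Without shell=True the command must be an argv list; `nohup` and the
# trailing `&` are shell features and are dropped here.
command = "python3.11 -m vllm.entrypoints.openai.api_server --model /path/to/model"
proc = subprocess.Popen(shlex.split(command))
```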
eval/mt_bench/components.py (outdated)

EVAL_IMAGE = "quay.io/sallyom/instructlab-ocp:eval"

@component(base_image=EVAL_IMAGE, packages_to_install=["vllm"])
can we enforce a particular version? @git+https://github.com/opendatahub-io/[email protected]
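A hedged sketch of what that pin could look like as a PEP 508 direct reference in packages_to_install; the repo and tag come from the comment above, while the component name is illustrative:

```python
# Pin the runtime install to a specific tag instead of whatever pip resolves.
@component(
    base_image=EVAL_IMAGE,
    packages_to_install=["vllm @ git+https://github.com/opendatahub-io/[email protected]"],
)
def run_mt_bench_op():
    ...
```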
Or maybe bake it into the image? It's our custom image anyway... Installing at runtime without a lockfile is not safe.
that was always my plan, but wasn't convenient while iterating on the 1000 different ways I tried to get vllm working properly - I'll add that to the Containerfile
@tumido I'll update the Containerfile in a follow-up & remove the runtime install
yes- will do that in a follow-up - will move the pip install to the Containerfile
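Once the dependency is baked into the image, the component can presumably drop the runtime install entirely; a sketch under that assumption, again with an illustrative component name:

```python
# With vllm pre-installed in EVAL_IMAGE (e.g. by a pip install in the
# Containerfile), packages_to_install is no longer needed.
@component(base_image=EVAL_IMAGE)
def run_mt_bench_op():
    ...
```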
LGTM
This adds MT_Bench, and uses the ilab image with `ilab model evaluate --benchmark mt_bench`.
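For reference, a hedged sketch of how a pipeline step might invoke that benchmark from Python; only the subcommand and --benchmark flag appear in this PR, the rest is an assumption:

```python
import subprocess

# Run the MT-Bench evaluation via the ilab CLI inside the eval image;
# check=True raises if the command exits non-zero.
subprocess.run(
    ["ilab", "model", "evaluate", "--benchmark", "mt_bench"],
    check=True,
)
```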