
How to unload and change models for local offline inferencing with Aphrodite? #510

Thanks for the suggestion! I actually got it working. Here's the gist of how I'm starting and killing models now, including support for multiple endpoints to distribute the work across GPUs:

import subprocess
import os
import psutil
import time
import openai
import concurrent.futures
from tqdm import tqdm

port = 5000
cmd_path = os.path.expanduser("~/work/ml/aphrodite-engine/runtime.sh")
num_actual_gpus = 4

def start_model(model_path, model_dtype, num_gpus, gpu_offset=0, port=5000):
    if num_gpus == 4:
        cmd = f"{cmd_path} python -m aphrodite.endpoints.openai.api_server --model '{model_path}' --dtype 'half' -q {model_dtype} --tensor-parallel-size {num_gpus} --port {port} --host 0…

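For distributing requests across the multiple endpoints, one hedged sketch of the round-robin fan-out pattern using `concurrent.futures` (the `distribute` helper and `worker` callable are hypothetical; in practice `worker` would point an OpenAI-compatible client at the given endpoint's port):

```python
import concurrent.futures
import itertools

def distribute(prompts, endpoints, worker):
    # Pair each prompt with an endpoint round-robin, then run the
    # worker calls concurrently, one thread per endpoint.
    pairs = list(zip(itertools.cycle(endpoints), prompts))
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(endpoints)) as ex:
        # ex.map preserves input order, so results line up with prompts.
        return list(ex.map(lambda pair: worker(*pair), pairs))
```

With four single-GPU servers on ports 5000-5003, `endpoints` would be the four base URLs and `worker` a small function that sets the client's base URL to its endpoint before sending the completion request.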
Answer selected by AlpinDale