Add Runpod Provider #157

pandyamarut · 2024-09-30T11:08:44Z

Why this PR
We want to add Runpod as a remote inference provider for llama-stack. Runpod endpoints are OpenAI-compatible, so this provider is intended to be used with Runpod model-serving endpoints.

What this PR includes

Integration with the distribution.
Uses the OpenAI client to call the Runpod endpoint.
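As a rough illustration of what using an OpenAI client involves, the adapter has to translate a llama-stack chat_completion request into an OpenAI-style chat completions body. The sketch below assumes the request shape from the curl examples in this PR. The function name `to_openai_body` and the `MODEL_ALIASES` table (including the Hugging Face model id on its right-hand side) are hypothetical, not the code in this PR.

```python
"""Hypothetical sketch: map a llama-stack-style chat request onto an
OpenAI-compatible chat completions body."""

from typing import Any, Dict

# Hypothetical alias table from llama-stack model names to the model id a
# Runpod endpoint might serve. The right-hand value is a placeholder.
MODEL_ALIASES = {
    "Llama3.1-8B-Instruct": "meta-llama/Meta-Llama-3.1-8B-Instruct",
}


def to_openai_body(request: Dict[str, Any]) -> Dict[str, Any]:
    """Build an OpenAI-style chat body from a llama-stack-style request."""
    model = request["model"]
    return {
        # Fall back to the original name when no alias is registered.
        "model": MODEL_ALIASES.get(model, model),
        # OpenAI expects {"role", "content"} dicts; keep only those keys.
        "messages": [
            {"role": m["role"], "content": m["content"]}
            for m in request["messages"]
        ],
        # Default to non-streaming when the caller does not say.
        "stream": bool(request.get("stream", False)),
    }
```

The resulting dict is what an OpenAI client would send as keyword arguments (model, messages, stream) to the endpoint's chat completions route.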

How did we test?
After configuring the provider with endpoint_url and api_key and leaving the other settings at their defaults, we launched a server with:

llama stack run remote_runpod --port 8080
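The two required settings could be modeled and validated along these lines. This is a minimal sketch: `RunpodConfigSketch` and its validation rules are hypothetical and are not llama-stack's actual config class; only the field names `endpoint_url` and `api_key` come from this PR.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class RunpodConfigSketch:
    """Hypothetical provider config holding the two fields named in this PR."""

    endpoint_url: str
    api_key: str

    def __post_init__(self) -> None:
        parsed = urlparse(self.endpoint_url)
        # Require an absolute http(s) URL so the client gets a usable base URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"endpoint_url must be an http(s) URL: {self.endpoint_url!r}")
        if not self.api_key:
            raise ValueError("api_key must be non-empty")
        # Drop a trailing slash so later path joins stay predictable.
        self.endpoint_url = self.endpoint_url.rstrip("/")
```

Failing fast on a malformed URL or empty key surfaces configuration mistakes at startup rather than on the first inference call.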

Invoke the call (streaming):
curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":true}'

Response:

data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"Here","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"'s","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}
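For scripted testing, the streaming call can be reproduced from Python with only the standard library. This sketch assumes the server is on localhost:8080 as launched above and that every `delta` is a plain string, as in the sample output. `build_chat_request`, `read_lines`, and `collect_stream` are illustrative helpers, not part of llama-stack.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple

# Assumes the server launched above is listening on localhost:8080.
URL = "http://localhost:8080/inference/chat_completion"


def build_chat_request(prompt: str, stream: bool = True) -> urllib.request.Request:
    """Build (without sending) the same POST the curl command issues."""
    payload = {
        "model": "Llama3.1-8B-Instruct",
        "messages": [{"content": prompt, "role": "user"}],
        "stream": stream,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_lines(req: urllib.request.Request) -> Iterator[str]:
    """Send the request and yield response lines one at a time."""
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            yield raw.decode("utf-8").rstrip("\r\n")


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join the delta strings from `data: {...}` events.

    Returns the reassembled text and the stop_reason of the `complete`
    event. Blank separator lines and non-data lines are skipped.
    """
    parts = []
    stop_reason = None
    for line in lines:
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])["event"]
        parts.append(event["delta"])
        if event["event_type"] == "complete":
            stop_reason = event["stop_reason"]
    return "".join(parts), stop_reason
```

Against a running server, `collect_stream(read_lines(build_chat_request("write me a 2 sentence poem about the moon")))` would return the full reply text together with the final stop_reason.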

Invoke the call (non-streaming):
curl -X POST http://localhost:8080/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"hello world, write me a 2 sentence poem about the moon", "role": "user"}],"stream":false}'

Response:

data: {"completion_message":{"role":"assistant","content":"Here's a 2-sentence poem about the moon:\n\nThe moon glows softly in the midnight sky, \nA beacon of peace, as it drifts gently by.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
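The non-streaming body can be unpacked in a similar way. A minimal sketch, assuming the response shape shown above; `parse_completion` is a hypothetical helper, and it tolerates the leading `data:` prefix that appears in this output.

```python
import json
from typing import Any, Dict


def parse_completion(body: str) -> Dict[str, Any]:
    """Extract the assistant reply from a non-streaming response body."""
    body = body.strip()
    # The captured output above carries an SSE-style prefix; strip it if present.
    if body.startswith("data:"):
        body = body[len("data:"):].strip()
    message = json.loads(body)["completion_message"]
    return {
        "content": message["content"],
        "stop_reason": message["stop_reason"],
        "tool_calls": message["tool_calls"],
    }
```

Checking `tool_calls` alongside `stop_reason` is a cheap way to notice when an OpenAI-compatible endpoint returns a tool call that the caller would otherwise ignore.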

Signed-off-by: Marut Pandya <[email protected]>


pandyamarut · 2024-10-02T02:11:35Z

@ashwinb @yanxi0830 @hardikjshah when can I expect a review? Thanks.

ashwinb · 2024-10-03T18:39:46Z

Thanks for the PR @pandyamarut! We are putting together a few tests in the repository now so we can make sure inference works reliably (especially w.r.t. tool calling, etc.) wherever we are dealing with OpenAI-compatible endpoints. Usually we vastly prefer a raw token API (e.g., HuggingFace's text-generation one https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/tgi/tgi.py#L136). Expect some changes around here in a couple of days; I will post an update when this happens. A couple of other inference-related PRs are also languishing without review because of this issue.
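To make the distinction concrete: with a raw token (completions) API, the stack renders the chat prompt itself, including the special tokens that tool calling relies on, whereas an OpenAI-compatible chat endpoint applies whatever template the server is configured with. The sketch below renders a simplified version of the Llama 3 chat template; `format_llama3_prompt` is a hypothetical helper, and it omits details such as system prompts, whitespace trimming, and tool definitions.

```python
from typing import Dict, List


def format_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages with a simplified Llama 3 special-token template."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the reply from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Because this string is built client-side, the stack can guarantee identical formatting across providers; with a chat endpoint that guarantee depends on each server's template.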

pandyamarut · 2024-10-07T19:15:45Z

@ashwinb Sure. Thanks for the update. Looking forward to getting this merged soon.

Marut Pandya and others added 2 commits September 30, 2024 03:52

Add Runpod Provider

02d3ffd

Signed-off-by: Marut Pandya <[email protected]>

Merge pull request #1 from pandyamarut/add-provider-runpod

aeaa982

Add Runpod Provider

pandyamarut requested review from ashwinb, yanxi0830, hardikjshah, dltn and raghotham as code owners September 30, 2024 11:08

facebook-github-bot added the CLA Signed label Sep 30, 2024

russellb mentioned this pull request Oct 5, 2024

Create shared openai-compatible inference adapter #193

Open

terrytangyuan mentioned this pull request Oct 6, 2024

Add generic OpenAI compatible inference provider #195

Open
