[#432] Add Groq Provider - chat completions #609
Conversation
elif finish_reason == "length": | ||
return StopReason.end_of_message | ||
elif finish_reason == "tool_calls": | ||
raise NotImplementedError("tool_calls is not supported yet") |
Users won't be able to hit this error yet since they can't pass tools as a parameter
"remote::groq", | ||
): | ||
pytest.skip(provider.__provider_spec__.provider_type + " doesn't support tool calling yet") | ||
|
As per the comment above: https://github.com/meta-llama/llama-stack/pull/609/files#r1881443869
Will remove this skip after I implement tool calling.
warnings.warn("repetition_penalty is not supported") | ||
|
||
if request.tools: | ||
warnings.warn("tools are not supported yet") |
I’m planning to handle tool calls in a separate PR since there are edge cases I want to cover properly.
But let me know if you want me to include it in this PR.
looking good!
@json_schema_type
class GroqConfig(BaseModel):
    api_key: Optional[str] = Field(
        default=None,
nit - the groq library will read the GROQ_API_KEY env var (https://github.com/groq/groq-python/blob/main/src/groq/_client.py#L86); consider adding a comment here so people in the LS codebase know about this expectation
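For illustration, a minimal sketch of what such a comment could look like on the config field (the @json_schema_type decorator is omitted and the description text is a placeholder, not the PR's final wording):

from typing import Optional

from pydantic import BaseModel, Field


class GroqConfig(BaseModel):
    api_key: Optional[str] = Field(
        # Note: if this is left as None, the groq client library falls back to
        # reading the GROQ_API_KEY environment variable.
        default=None,
        description="The Groq API key",
    )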
Done
Let's not rely on environment variables for code that we expect to run in llama-stack server. We would want to take in the api key as a config variable in run.yaml when we spin up the server
We would want to take in the api key as a config variable in run.yaml when we spin up the server
@raghotham, I believe that's the behaviour at the moment. This is how fireworks and together define their configs:
api_key: Optional[str] = Field(
    default=None,
    description="The Fireworks.ai API Key",
)
llama-stack/llama_stack/providers/remote/inference/together/config.py
Lines 19 to 22 in 6765fd7
api_key: Optional[str] = Field(
    default=None,
    description="The Together AI API Key",
)
And it's in the run.yaml that you define the environment variable:
llama-stack/llama_stack/templates/together/run.yaml
Lines 18 to 20 in 516e1a3
config:
  url: https://api.together.xyz/v1
  api_key: ${env.TOGETHER_API_KEY}
wdyt?
yes @aidando73 this is correct. Note that both Together and Fireworks also support grabbing the api key from headers via the NeedsProviderData mixin. You can add that if you feel like it.
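For context, a rough sketch of the key-resolution order this implies, with the run.yaml config taking priority over per-request provider data; the helper name and the groq_api_key field are illustrative assumptions, not the exact code in this PR:

def _get_api_key(config, provider_data) -> str:
    # Prefer the key from the run.yaml config; otherwise fall back to the
    # per-request provider data forwarded by the NeedsProviderData mixin.
    if config.api_key:
        return config.api_key
    if provider_data is not None and getattr(provider_data, "groq_api_key", None):
        return provider_data.groq_api_key
    raise ValueError("Provide a Groq API key via config or request provider data")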
Done - added the mixin
Also added some client code to the test plan to exercise this
]:

if model_id == "llama-3.2-3b-preview":
    warnings.warn(
very user friendly +1
logprobs=None,
frequency_penalty=None,
stream=request.stream,
# Groq only supports n=1 at the time of writing
the LS structures only support responses w/ 1 choice, so the fact that groq only supports n=1 is moot. you should even skip passing the param around.
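As an illustration of that suggestion, a direct Groq call that simply omits n rather than passing n=1 (the model name and message are placeholders, not taken from the PR):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment when no api_key is passed

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello"}],
    stream=False,
    # n is left out entirely; llama-stack response types only carry one choice anyway
)
print(response.choices[0].message.content)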
Done
raise ValueError(f"Invalid finish reason: {finish_reason}")


async def convert_chat_completion_response_stream(
in some other PR, we should merge this into the general util module.
Do you think this function is too coupled to Groq types to be used as a general util function? E.g., this one takes in a ChatCompletionChunk from groq.types.chat.chat_completion_chunk
if request.logprobs:
    # Groq doesn't support logprobs at the time of writing
    warnings.warn("logprobs are not supported yet")
if request.response_format:
    # Groq's JSON mode is beta at the time of writing
    warnings.warn("response_format is not supported yet")
async def get_adapter_impl(config: GroqConfig, _deps) -> Inference:
    # import dynamically so `llama stack build` does not fail due to missing dependencies
the correct comment should be "import dynamically so the import is used only when it is needed" :D
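For reference, a sketch of the factory with the reworded comment; the adapter class name and the initialize() call follow the pattern used by other remote providers and are assumptions here, not necessarily this PR's exact code:

async def get_adapter_impl(config: GroqConfig, _deps) -> Inference:
    # import dynamically so the import is used only when it is needed
    from .groq import GroqInferenceAdapter  # assumed class/module name

    impl = GroqInferenceAdapter(config)
    await impl.initialize()
    return impl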
Done
    CoreModelId.llama3_70b_instruct.value,
),
build_model_alias(
    "llama-3.3-70b-versatile",
do you know what this suffix indicates?
I couldn't find anything online. @ricklamers @philass - could you provide any additional context here?
yield ChatCompletionResponseStreamChunk(
    event=ChatCompletionResponseEvent(
        event_type=next(event_types),
it seems unnecessary to refactor the event type generator separately. just maintain an index here or even more simply just repeat a small amount of code to send back a start chunk.
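A small sketch of the simpler shape being suggested: emit one explicit start chunk, then mark every later chunk as progress. The event type names, the string delta field, and the surrounding loop are assumptions based on llama-stack's streaming types, and the final completion chunk is omitted:

async def convert_chat_completion_response_stream(stream):
    # ChatCompletionResponse* types are assumed to come from llama-stack's inference API.
    # Send the start event exactly once, up front.
    yield ChatCompletionResponseStreamChunk(
        event=ChatCompletionResponseEvent(
            event_type=ChatCompletionResponseEventType.start,
            delta="",
        )
    )
    # Every subsequent chunk is a progress event.
    async for chunk in stream:
        yield ChatCompletionResponseStreamChunk(
            event=ChatCompletionResponseEvent(
                event_type=ChatCompletionResponseEventType.progress,
                delta=chunk.choices[0].delta.content or "",
            )
        )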
Done
awesome, thank you very much!
(will merge after a couple small comments are addressed)
Ok @ashwinb I've addressed your comments
What does this PR do?
Contributes towards issue (#432)
A lot of inspiration taken from @mattf's good work at #355
What this PR does not do
PR Train
Test Plan
Environment
Manual tests
Using this Jupyter notebook to test manually: https://github.com/aidando73/llama-stack/blob/2140976d76ee7ef46025c862b26ee87585381d2a/hello.ipynb
Use this code to test passing the API key in via provider_data:
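The referenced snippet isn't reproduced here; below is a rough stand-in for that kind of check using a raw HTTP request. The server URL, endpoint path, X-LlamaStack-ProviderData header name, and the groq_api_key payload field are all illustrative assumptions:

import json
import os

import requests

# Hypothetical call against a locally running llama-stack server; the names below
# are assumptions for illustration, not copied from the PR's test plan.
response = requests.post(
    "http://localhost:5000/inference/chat_completion",
    headers={
        "Content-Type": "application/json",
        "X-LlamaStack-ProviderData": json.dumps({"groq_api_key": os.environ["GROQ_API_KEY"]}),
    },
    json={
        "model_id": "Llama3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
)
print(response.json())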
Integration
pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k groq
(run in same environment)
Unit tests
pytest llama_stack/providers/tests/inference/groq/ -v
Before submitting
Did you read the contributor guideline, Pull Request section?