Skip to content

Commit

Permalink
Feature/new models added pixtral and qwen 2.5 models (#63)
Browse files Browse the repository at this point in the history
* added support for qwen2.5 and Pixtral models
* fix runner and store models in better location
* fixed model settings and added documentation on registering the run as part of the gateway endpoint
* make the test entrypoint print the base url directly in the job output.
* updated model documentation
---------
Co-authored-by: Sri Tikkireddy <[email protected]>
  • Loading branch information
stikkireddy authored Sep 24, 2024
1 parent f1fe289 commit 56ea627
Show file tree
Hide file tree
Showing 12 changed files with 303 additions and 51 deletions.
54 changes: 31 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,29 +75,37 @@ Out of the box Ez Deploy Models:

**Note this framework supports much larger set of models these are the ones that have been curated and validated.**

| model_type | cfg_path | huggingface_link | context_length | min_azure_ep_type_gpu | min_aws_ep_type_gpu |
|:-------------|:---------------------------------------------------------|:-------------------------------------------------------------|:-----------------|:-------------------------------|:------------------------------|
| text | prebuilt.text.sglang.GEMMA_2_9B_IT | https://huggingface.co/google/gemma-2-9b-it | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.sglang.META_LLAMA_3_1_8B_INSTRUCT_CONFIG | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.NUEXTRACT | https://huggingface.co/numind/NuExtract | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| text | prebuilt.text.vllm.NUEXTRACT_TINY | https://huggingface.co/numind/NuExtract-tiny | Default | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| text | prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_64K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_128K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| text | prebuilt.text.vllm.COHERE_FOR_AYA_23_35B | https://huggingface.co/CohereForAI/aya-23-35B | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.sglang.LLAVA_NEXT_LLAMA3_8B | https://huggingface.co/lmms-lab/llama3-llava-next-8b | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.sglang.LLAVA_NEXT_QWEN_1_5_72B_CONFIG | https://huggingface.co/lmms-lab/llama3-llava-next-8b | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| vision | prebuilt.vision.sglang.LLAVA_ONEVISION_QWEN_2_7B_CONFIG | https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.sglang.LLAVA_ONEVISION_QWEN_2_72B_CONFIG | https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_4K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 4096 | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_8K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 8192 | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_12K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 12000 | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_32K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 32000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_64K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_128K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| vision | prebuilt.vision.vllm.QWEN2_VL_2B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| vision | prebuilt.vision.vllm.QWEN2_VL_7B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| audio | prebuilt.audio.vllm.FIXIE_ULTRA_VOX_0_4_64K_CONFIG | https://huggingface.co/fixie-ai/ultravox-v0_4 | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| audio | prebuilt.audio.vllm.FIXIE_ULTRA_VOX_0_4_128K_CONFIG | https://huggingface.co/fixie-ai/ultravox-v0_4 | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| model_type | cfg_path | huggingface_link | context_length | min_azure_ep_type_gpu | min_aws_ep_type_gpu |
|:-------------|:---------------------------------------------------------|:-------------------------------------------------------------|:-----------------|:-------------------------------|:-------------------------------|
| text | prebuilt.text.sglang.GEMMA_2_9B_IT | https://huggingface.co/google/gemma-2-9b-it | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.sglang.META_LLAMA_3_1_8B_INSTRUCT_CONFIG | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.NUEXTRACT | https://huggingface.co/numind/NuExtract | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| text | prebuilt.text.vllm.NUEXTRACT_TINY | https://huggingface.co/numind/NuExtract-tiny | Default | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| text | prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_64K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_128K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| text | prebuilt.text.vllm.COHERE_FOR_AYA_23_35B | https://huggingface.co/CohereForAI/aya-23-35B | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.QWEN2_5_7B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-7B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.QWEN2_5_14B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-14B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.QWEN2_5_32B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-32B-Instruct | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| text | prebuilt.text.vllm.QWEN2_5_72B_8K_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct | 8192 | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| text | prebuilt.text.vllm.QWEN2_5_72B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct | Default | GPU_LARGE_8 [A100_80Gx8 640GB] | GPU_LARGE_8 [A100_80Gx8 640GB] |
| vision | prebuilt.vision.sglang.LLAVA_NEXT_LLAMA3_8B | https://huggingface.co/lmms-lab/llama3-llava-next-8b | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.sglang.LLAVA_NEXT_QWEN_1_5_72B_CONFIG | https://huggingface.co/lmms-lab/llama3-llava-next-8b | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| vision | prebuilt.vision.sglang.LLAVA_ONEVISION_QWEN_2_7B_CONFIG | https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.sglang.LLAVA_ONEVISION_QWEN_2_72B_CONFIG | https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_4K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 4096 | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_8K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 8192 | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_12K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 12000 | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_32K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 32000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_64K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_128K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| vision | prebuilt.vision.vllm.QWEN2_VL_2B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| vision | prebuilt.vision.vllm.QWEN2_VL_7B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PIXTRAL_12B_32K_INSTRUCT | https://huggingface.co/mistralai/Pixtral-12B-2409 | 32768 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PIXTRAL_12B_64K_INSTRUCT | https://huggingface.co/mistralai/Pixtral-12B-2409 | 65536 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| vision | prebuilt.vision.vllm.PIXTRAL_12B_128K_INSTRUCT | https://huggingface.co/mistralai/Pixtral-12B-2409 | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| audio | prebuilt.audio.vllm.FIXIE_ULTRA_VOX_0_4_64K_CONFIG | https://huggingface.co/fixie-ai/ultravox-v0_4 | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| audio | prebuilt.audio.vllm.FIXIE_ULTRA_VOX_0_4_128K_CONFIG | https://huggingface.co/fixie-ai/ultravox-v0_4 | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |

### Deploying a model using EZ Deploy

Expand Down
23 changes: 22 additions & 1 deletion docs/getting-started/ezdeploylite.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,9 @@ model.invoke("what color is the sky?")

## Registering into Mosaic AI Gateway


### Requirements

To register into Mosaic AI Gateway you need the following things:

1. Base URL of the deployment
Expand All @@ -120,4 +123,22 @@ deployment_name = "my_qwen_model"
base_url = get_ezdeploy_lite_openai_url(deployment_name)
```

To retrieve the token its basically the databricks token of the user who deployed the model.
To retrieve the token its basically the databricks token of the user who deployed the model.

The following steps will show you what it looks like in the Databricks UI

### Setting up a new Mosaic AI Gateway Endpoint

![create_serving_endpoint_1.png](../static/create_serving_endpoint_1.png)

### Setting up the external OpenAI endpoint

![create_serving_endpoint_2.png](../static/create_serving_endpoint_2.png)

### Configure the settings

1. Make sure you set the OpenAI API Base (look at the [requirements](#requirements_1) for the base url)
2. Make sure you set the external model name to default (you can just type in the input)
3. Ensure that you set the OpenAI API key secret to the databricks token

![create_serving_endpoint_3.png](../static/create_serving_endpoint_3.png)
Binary file added docs/static/create_serving_endpoint_1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/static/create_serving_endpoint_2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/static/create_serving_endpoint_3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
23 changes: 14 additions & 9 deletions docs/supported-models/text-models.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,16 @@
# Text Models

| Config | Huggingface Link | context_length | min_azure_ep_type_gpu | min_aws_ep_type_gpu |
|:-------------------------------------------------------|:-------------------------------------------------------------|:---------------|:-------------------------------|:------------------------------|
| prebuilt.text.sglang.GEMMA_2_9B_IT | https://huggingface.co/google/gemma-2-9b-it | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.sglang.META_LLAMA_3_1_8B_INSTRUCT_CONFIG | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.NUEXTRACT | https://huggingface.co/numind/NuExtract | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| prebuilt.text.vllm.NUEXTRACT_TINY | https://huggingface.co/numind/NuExtract-tiny | Default | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_64K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_128K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| prebuilt.text.vllm.COHERE_FOR_AYA_23_35B | https://huggingface.co/CohereForAI/aya-23-35B | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| Config | Huggingface Link | context_length | min_azure_ep_type_gpu | min_aws_ep_type_gpu |
|:-------------------------------------------------------|:-------------------------------------------------------------|:---------------|:-------------------------------|:-------------------------------|
| prebuilt.text.sglang.GEMMA_2_9B_IT | https://huggingface.co/google/gemma-2-9b-it | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.sglang.META_LLAMA_3_1_8B_INSTRUCT_CONFIG | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.NUEXTRACT | https://huggingface.co/numind/NuExtract | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| prebuilt.text.vllm.NUEXTRACT_TINY | https://huggingface.co/numind/NuExtract-tiny | Default | GPU_SMALL [T4x1 16GB] | GPU_SMALL [T4x1 16GB] |
| prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_64K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.NOUS_HERMES_3_LLAMA_3_1_8B_128K | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| prebuilt.text.vllm.COHERE_FOR_AYA_23_35B | https://huggingface.co/CohereForAI/aya-23-35B | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.QWEN2_5_7B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-7B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.QWEN2_5_14B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-14B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.QWEN2_5_32B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-32B-Instruct | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.text.vllm.QWEN2_5_72B_8K_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct | 8192 | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| prebuilt.text.vllm.QWEN2_5_72B_INSTRUCT | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct | Default | GPU_LARGE_8 [A100_80Gx8 640GB] | GPU_LARGE_8 [A100_80Gx8 640GB] |
5 changes: 4 additions & 1 deletion docs/supported-models/vision-models.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,7 @@
| prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_64K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | 64000 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.vision.vllm.PHI_3_5_VISION_INSTRUCT_128K | https://huggingface.co/microsoft/Phi-3.5-vision-instruct | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
| prebuilt.vision.vllm.QWEN2_VL_2B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | GPU_MEDIUM [A10Gx1 24GB] |
| prebuilt.vision.vllm.QWEN2_VL_7B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.vision.vllm.QWEN2_VL_7B_INSTRUCT | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct | Default | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.vision.vllm.PIXTRAL_12B_32K_INSTRUCT | https://huggingface.co/mistralai/Pixtral-12B-2409 | 32768 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.vision.vllm.PIXTRAL_12B_64K_INSTRUCT | https://huggingface.co/mistralai/Pixtral-12B-2409 | 65536 | GPU_LARGE [A100_80Gx1 80GB] | MULTIGPU_MEDIUM [A10Gx4 96GB] |
| prebuilt.vision.vllm.PIXTRAL_12B_128K_INSTRUCT | https://huggingface.co/mistralai/Pixtral-12B-2409 | Default | GPU_LARGE_2 [A100_80Gx2 160GB] | GPU_MEDIUM_8 [A10Gx8 192GB] |
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,15 @@

# COMMAND ----------

url = f'https://{spark.conf.get("spark.databricks.workspaceUrl")}'
url = url.rstrip("/")
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
print(
"Base URL for this endpoint:", f"{url}/driver-proxy-api/o/0/{cluster_id}/9989/v1/"
)

# COMMAND ----------

import time

while True:
Expand Down
Loading

0 comments on commit 56ea627

Please sign in to comment.