The following two files showcase how to run inference on models (default: Unsloth Llama-3.2-3B Instruct) using the `@metaflow_ray` decorator with `@kubernetes`.
- `flow.py` showcases the usage of `@huggingface_hub` to download the model, along with `@model` to ensure all workers have the model artifact loaded at a pre-defined path. The messages to be sent are a parameter to the flow, and `vllm` is used for inference (see the sketch after this list).
- `main.py` runs `flow.py` on the Outerbounds platform with fast-bakery using the Runner API. This makes it easy to pass JSON-based parameters without using the CLI (see the Runner sketch further below).
- The flow can be run with `python main.py`, or directly with `python flow.py --no-pylint --environment=fast-bakery run`.
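
To make the pieces above concrete, here is a minimal sketch of what `flow.py` could look like. It assumes the Outerbounds `@huggingface_hub` / `@model` extensions and the `metaflow-ray` package are installed; the flow name, resource sizes, worker count, and the `/metaflow_temp/llm` load path are illustrative assumptions, not the repository's exact code:

```python
from metaflow import (
    FlowSpec,
    JSONType,
    Parameter,
    current,
    huggingface_hub,
    kubernetes,
    metaflow_ray,
    model,
    step,
)


class RayVLLMInference(FlowSpec):
    # Hypothetical flow and parameter defaults, for illustration only.
    model_id = Parameter(
        "model_id",
        default="unsloth/Llama-3.2-3B-Instruct",
        help="HuggingFace repo id of the model to run.",
    )
    messages = Parameter(
        "messages",
        type=JSONType,
        default='[{"role": "user", "content": "Hello!"}]',
        help="Chat messages to send to the model, as JSON.",
    )

    @huggingface_hub
    @step
    def start(self):
        # Download the model from HuggingFace and record it as an
        # artifact reference that @model can materialize on each worker.
        self.llm = current.huggingface_hub.snapshot_download(
            repo_id=self.model_id,
        )
        self.next(self.generate, num_parallel=2)

    @model(load=[("llm", "/metaflow_temp/llm")])
    @metaflow_ray
    @kubernetes(gpu=1, memory=32000)
    @step
    def generate(self):
        from vllm import LLM, SamplingParams

        # @metaflow_ray wires the parallel tasks into one Ray cluster;
        # @model has already placed the weights at the pre-defined path
        # on every worker before this code runs.
        llm = LLM(model="/metaflow_temp/llm")
        outputs = llm.chat(list(self.messages), SamplingParams(max_tokens=256))
        self.responses = [o.outputs[0].text for o in outputs]
        self.next(self.join)

    @step
    def join(self, inputs):
        # Only the Ray head task is expected to produce `responses`;
        # worker tasks just join the cluster.
        self.responses = sum(
            (getattr(inp, "responses", []) for inp in inputs), []
        )
        self.next(self.end)

    @step
    def end(self):
        for r in self.responses:
            print(r)


if __name__ == "__main__":
    RayVLLMInference()
```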
The model can be changed using the `--model_id` parameter (e.g. `--model_id "mistralai/Mistral-7B-Instruct-v0.1"`).
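
On the launcher side, `main.py` might use the Runner API along these lines; the parameter values and message payload are illustrative assumptions:

```python
import json

from metaflow import Runner

# Hypothetical payload; any JSON-serializable chat transcript works.
messages = [{"role": "user", "content": "Summarize Metaflow in one sentence."}]

with Runner(
    "flow.py",
    environment="fast-bakery",  # mirrors --environment=fast-bakery
    pylint=False,               # mirrors --no-pylint
).run(
    model_id="unsloth/Llama-3.2-3B-Instruct",
    messages=json.dumps(messages),
) as running:
    print(f"Run {running.run} finished with status: {running.status}")
```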
Please make sure you have the right credentials for pulling models from HuggingFace.
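Typically this means making a valid `HF_TOKEN` available (for example as an environment variable, or via `huggingface-cli login`) wherever the download step runs.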