Add DeepHermes to list of supported models #229
base: main
Conversation
fwiw the model's native tool-calling ability is poor, since it's not based on the other Hermes 3 / Llama 3 models but on Unsloth AI's Llama model, so function calling is not baked into the tokenizer.
…that postgres is not wiped; update docs
FYI, also added:
When stopping the currently-running nodes at
@K-Mistele @richardblythman - all looks good. Just want to check on one thing: we are now setting docker-compose to pull the node image from Docker Hub instead of building it locally.

Quick note about the deploy sequence: since we have separate GitHub Actions for the DockerHub push and for deployment, we need to make sure the deploy action only runs after the DockerHub push is complete. Otherwise we might try to deploy before the new image is available on DockerHub.

@richardblythman - Also curious whether there were specific reasons you were building Docker images locally in the compose file instead of pulling from DockerHub? Want to make sure we're not missing any important use cases.
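One way to enforce that ordering would be a `workflow_run` trigger; this is only a sketch, not the repo's actual workflow, and the workflow name "Push to DockerHub" is hypothetical:

```yaml
# Hypothetical deploy workflow: runs only after the DockerHub push
# workflow (assumed to be named "Push to DockerHub") has completed.
name: Deploy
on:
  workflow_run:
    workflows: ["Push to DockerHub"]
    types: [completed]

jobs:
  deploy:
    # Skip the deploy if the image push failed.
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy steps go here"
```

Alternatively, the push and deploy could be sequential jobs in a single workflow chained with `needs:`.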
DeepHermes (`NousResearch/DeepHermes-3-Llama-3-8B-Preview`) is now a supported model and can be specified with `VLLM_MODELS` in `.env`. I tested it using the second cluster, where it is now being served.
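For reference, the `.env` entry would look roughly like this (the exact format of `VLLM_MODELS`, e.g. whether it takes a comma-separated list, is an assumption here, not confirmed by this PR):

```
# Assumed format: model ID(s) served by vLLM
VLLM_MODELS=NousResearch/DeepHermes-3-Llama-3-8B-Preview
```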
It is important to note that vLLM does not automatically extract reasoning traces into `choices[0].message.reasoning_content`. This can be enabled with a CLI flag; however, it currently only works on DeepSeek, due to the model's tokenizer and the reasoning parser's assumption that each of `<think>` and `</think>` will be a single token in the model's tokenizer.

However, while I was testing, I found that `choices[0].message.reasoning_content` was being set automatically, and the reasoning trace generated by DeepHermes was still extracted into it despite the aforementioned constraints. After a little more debugging, it appears that LiteLLM is automatically extracting the reasoning trace into the `reasoning_content` field based on the presence of `<think>` and `</think>`; it is unclear whether this behavior can be disabled. As a result, it is not necessary to use the vLLM reasoning parser.
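To illustrate the behavior described above, here is a minimal sketch of the string-level extraction; this is not LiteLLM's actual implementation, just the tag-based splitting it appears to perform:

```python
import re

# Matches a <think>...</think> block anywhere in the completion text.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str | None, str]:
    """Return (reasoning_content, content), mimicking the tag-based
    extraction observed from LiteLLM. Purely illustrative."""
    m = THINK_RE.search(text)
    if m is None:
        return None, text
    reasoning = m.group(1).strip()
    content = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, content

# e.g. split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
# -> ("2 + 2 = 4", "The answer is 4.")
```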
Example request:
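A minimal sketch of such a request via LiteLLM; the `hosted_vllm/` provider prefix, `api_base`, and prompt are illustrative assumptions, not taken from this PR:

```python
import litellm

# Illustrative only: endpoint URL and prompt are assumptions.
response = litellm.completion(
    model="hosted_vllm/NousResearch/DeepHermes-3-Llama-3-8B-Preview",
    api_base="http://localhost:8000/v1",  # hypothetical vLLM endpoint
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

# LiteLLM surfaces the extracted trace here, per the behavior noted above.
print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.content)
```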
Example response:
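The response has roughly the following shape, with the trace extracted out of `content`; field values below are placeholders, not captured output:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "…extracted <think> trace…",
        "content": "…final answer, with the <think> block stripped…"
      },
      "finish_reason": "stop"
    }
  ]
}
```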