This demonstrates tool use (function calling), which is now supported in my PR to Swift Jinja. Make sure that you have the latest `tokenizer_config.json` file for each model, since in some cases function calling was added in a recent update.

The following are some examples of responses to the prompt "What's the weather in Paris today?" in LLMEval. A `get_current_weather` function is provided to the model in the prompt constructed with the chat template.

- Llama 3.1 8B
- Llama 3.2 3B
- Qwen 2.5 7B
- Mistral 7B
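For reference, the function offered to the model is described by a tool definition along these lines. The exact schema varies by model, and the field names here follow the common OpenAI-style convention, so treat this as illustrative; the chat template renders it into prompt text for the model:

```json
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city to look up, e.g. Paris"
        }
      },
      "required": ["location"]
    }
  }
}
```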
### Performance
Llama 3.1 8B and 3.2 3B with the current chat templates from `tokenizer_config.json` tend to always respond with a function call, even when one is not appropriate. My proposed change to the chat template helps, but the models still sometimes respond with calls to non-existent functions. In general, the prompts provided in the Llama chat templates are far from optimal, and I think the models' performance could be further improved simply by using better prompts (for example, "Knowledge cutoff date" instead of "Cutting Knowledge Date").

Qwen 2.5 7B and Mistral 7B do a better job of calling functions only when appropriate.
### Handling the function call
If I understand correctly, the app would need to parse the JSON function call, stop generating after the call, invoke the corresponding function with the parsed arguments, append a message containing the function's output, and then generate the user-facing response from the updated message list.
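The parse-and-dispatch step could be sketched roughly like this. The `ToolCall` shape and the `runToolCall` helper are hypothetical, since the actual JSON layout of the call depends on each model's chat template:

```swift
import Foundation

// Assumed shape of the model's tool-call output, e.g.
// {"name": "get_current_weather", "parameters": {"location": "Paris"}}
struct ToolCall: Decodable {
    let name: String
    let parameters: [String: String]
}

// Returns the tool's output if `text` is a well-formed call to a known
// function, or nil so the app can treat the text as a plain response.
func runToolCall(_ text: String) -> String? {
    guard
        let data = text.data(using: .utf8),
        let call = try? JSONDecoder().decode(ToolCall.self, from: data)
    else { return nil }

    switch call.name {
    case "get_current_weather":
        let location = call.parameters["location"] ?? "unknown"
        // Stand-in for a real weather lookup.
        return "Sunny, 22°C in \(location)"
    default:
        // The model called a function that was never offered.
        return nil
    }
}
```

The app would then append the returned string as a new message (with whatever role the model's chat template expects for tool results) and run generation again to produce the user-facing answer.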