[Feature] Add Ollama support #1036

Open · wants to merge 1 commit into main

Conversation

@microdev1

Adds a thin wrapper for models pulled with Ollama: it resolves the local model path from the provided model name, then instantiates a LlamaCppEngine with that path and the other arguments.
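
For readers who want the shape of that idea, here is a minimal sketch (not the PR's exact code). It assumes Ollama's default on-disk layout under ~/.ollama/models, and ollama_model_path is a hypothetical helper name:

import json
from pathlib import Path

from guidance import models

def ollama_model_path(name: str) -> Path:
    # Resolve an Ollama model name like "phi3.5:latest" to its GGUF blob,
    # assuming Ollama's default storage layout (OLLAMA_MODELS not set).
    models_dir = Path.home() / ".ollama" / "models"
    model, _, tag = name.partition(":")
    manifest_path = (models_dir / "manifests" / "registry.ollama.ai"
                     / "library" / model / (tag or "latest"))
    manifest = json.loads(manifest_path.read_text())
    # The weights are the manifest layer with the "model" media type.
    layer = next(
        l for l in manifest["layers"]
        if l["mediaType"] == "application/vnd.ollama.image.model"
    )
    return models_dir / "blobs" / layer["digest"].replace(":", "-")

lm = models.LlamaCpp(str(ollama_model_path("phi3.5")))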

@microdev1 (Author)

from guidance import models

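# Names use the same name[:tag] form as `ollama pull`; the tag defaults to "latest"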
ollama = models.Ollama('phi3.5')
ollama = models.Ollama('phi3.5:3.8b')
ollama = models.Ollama('phi3.5:latest')

...

@nking-1 (Collaborator) commented Sep 30, 2024

Hi @microdev1, this is a great start on Ollama support in Guidance. Thanks for your contribution!

I tested the Ollama class and was able to run a model using it, so I can confirm the basic functionality is working. However, there is some additional work needed regarding chat templates. Without the proper template, Guidance walks off the end of a role and continues generating text beyond the <|end|> token.

Ollama stores the chat template in the modelfile; for Phi-3 it looks like this:

TEMPLATE "{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>"
PARAMETER stop <|end|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
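
The modelfile can also be read from the CLI with `ollama show phi3.5 --modelfile`, or fetched from a running server. Here is a small sketch against the /api/show endpoint, assuming the default localhost port:

import json
from urllib.request import Request, urlopen

# Ask a local Ollama server for a model's details, which include the template.
req = Request(
    "http://localhost:11434/api/show",
    data=json.dumps({"name": "phi3.5"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    info = json.load(resp)
print(info["template"])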

In contrast to Ollama, Guidance uses a Jinja-style template like this:

phi3_mini_template = "{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>\n' }}{% else %}{{ eos_token }}{% endif %}"

Guidance also has classes wrapping the templates like this:

from guidance.chat import ChatTemplate, UnsupportedRoleException

class Phi3MiniChatTemplate(ChatTemplate):
    # available_roles = ["user", "assistant"]
    template_str = phi3_mini_template

    def get_role_start(self, role_name):
        if role_name == "user":
            return "<|user|>\n"
        elif role_name == "assistant":
            return "<|assistant|>\n"
        elif role_name == "system":
            return "<|system|>\n"
        else:
            raise UnsupportedRoleException(role_name, self)

    def get_role_end(self, role_name=None):
        return "<|end|>\n"

Ideally, when an Ollama model is loaded, the proper chat template would automatically be loaded as well. If you're curious, the chat template code is in guidance/chat.py. We're still discussing how to improve it to support Ollama and make it easier for the community to add templates for open-source models. You're welcome to make suggestions or take a shot at implementing something for chat templates with Ollama; one rough direction is sketched below.
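
As a sketch only (not a settled design): once the template string has been fetched from /api/show as above, a hypothetical template_for_ollama helper could map it onto a known Guidance ChatTemplate class by recognizing its role tokens:

def template_for_ollama(ollama_template: str) -> type:
    # Hypothetical: match the role/stop tokens that appear in the Ollama
    # modelfile template to a ChatTemplate class Guidance already ships.
    if "<|assistant|>" in ollama_template and "<|end|>" in ollama_template:
        return Phi3MiniChatTemplate
    raise ValueError("no known chat template matches this Ollama modelfile")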

All that being said, I think your implementation would technically work as long as someone provides the appropriate chat template via the chat_template parameter in the constructor. We should be able to use this as a starting point for the next steps.
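
For example, usage might look like this (a sketch; it assumes the PR's constructor forwards chat_template to the underlying LlamaCppEngine):

from guidance import models
from guidance.chat import Phi3MiniChatTemplate

# Pass the matching template explicitly until automatic detection exists.
ollama = models.Ollama("phi3.5", chat_template=Phi3MiniChatTemplate)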

@codecov-commenter commented Oct 8, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 50.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 61.43%. Comparing base (6eb08f4) to head (24bcf91).
Report is 14 commits behind head on main.

Files with missing lines     Patch %   Lines
guidance/models/_ollama.py   47.61%    11 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

❗ The number of reports uploaded differs between BASE (6eb08f4) and HEAD (24bcf91): HEAD has 56 fewer uploads than BASE.

Flag   BASE (6eb08f4)   HEAD (24bcf91)
       124              68
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1036      +/-   ##
==========================================
- Coverage   70.25%   61.43%   -8.83%     
==========================================
  Files          62       63       +1     
  Lines        4472     4494      +22     
==========================================
- Hits         3142     2761     -381     
- Misses       1330     1733     +403     


@xruifan mentioned this pull request Nov 11, 2024
Successfully merging this pull request may close these issues:

ollama support?