
Ollama Support #14

Open
0xrsydn opened this issue Jun 18, 2024 · 11 comments

Comments

@0xrsydn

0xrsydn commented Jun 18, 2024

Is it possible to use llama3 via Ollama rather than the Hugging Face one?

@jehna
Owner

jehna commented Jun 19, 2024

Not possible at the moment, but it should be straightforward to implement if you'd like to give it a shot.

You can check the LlamaCpp docs from Guidance and change (preferably parametrize) the config:
https://github.com/jehna/humanify/blob/main/local-inference/guidance_config.py
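
For illustration, a parametrized version might look roughly like this (a sketch only, assuming guidance's models.LlamaCpp wrapper; the environment variable names are made up):

```python
# Sketch: parametrizing the model settings in local-inference/guidance_config.py
# via environment variables. MODEL_PATH / N_GPU_LAYERS / N_CTX are hypothetical names.
import os

from guidance import models

llm = models.LlamaCpp(
    os.environ.get("MODEL_PATH", "models/llama-3-8b.Q4_K_M.gguf"),
    n_gpu_layers=int(os.environ.get("N_GPU_LAYERS", "-1")),  # -1 = offload all layers
    n_ctx=int(os.environ.get("N_CTX", "4096")),
)
```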

@0xdevalias

0xdevalias commented Jun 19, 2024

I wonder if it's worth implementing a wrapper/abstraction layer like LiteLLM to make things more flexible?

  • https://github.com/BerriAI/litellm
    • Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
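
For example, a LiteLLM call routed through a local Ollama server might look roughly like this (a sketch only; the model name and api_base are illustrative):

```python
# Sketch: calling a local Ollama model through LiteLLM's OpenAI-style interface.
# The model name and api_base below are examples; adjust to your local setup.
from litellm import completion

response = completion(
    model="ollama/llama3",             # provider prefix + model name
    messages=[{"role": "user", "content": "Suggest a descriptive variable name."}],
    api_base="http://localhost:11434", # default local Ollama endpoint
)
print(response.choices[0].message.content)
```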

This is what projects like aider use:

  • https://aider.chat/docs/llms/other.html
    • Aider uses the litellm package to connect to hundreds of other models. You can use aider --model <model-name> to use any supported model.

      To explore the list of supported models you can run aider --models <model-name> with a partial model name. If the supplied name is not an exact match for a known model, aider will return a list of possible matching models.

Though I'm not currently sure if/how compatible that is with the guidance module you're currently using.


@0xdevalias

@jehna Curious: what aspects of guidance does humanify currently rely on? Is it using much of the deeper 'controls' provided by it?

Skimming the prompt files, it looks like gen, stop, and stop_regex are the main features used.
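
For reference, those primitives look roughly like this in guidance's API (an illustrative sketch, not humanify's actual prompts; the model path is made up):

```python
# Sketch of guidance's gen / stop / stop_regex primitives.
from guidance import gen, models

lm = models.LlamaCpp("models/llama-3-8b.Q4_K_M.gguf")  # example path

# Constrain generation with a literal stop string...
lm += "A good name for this variable is: " + gen("name", stop="\n", max_tokens=10)

# ...or with a stop regex (stop as soon as a non-identifier character appears).
lm += "Rewritten identifier: " + gen("identifier", stop_regex=r"[^A-Za-z0-9_]", max_tokens=10)

print(lm["name"], lm["identifier"])
```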

@jehna
Owner

jehna commented Aug 12, 2024

There's now v2 that runs on top of llama.cpp, so adding llama3 support should be even more straightforward.

@0xrsydn which version of llama3 were you planning to run? I could add it to the new version

@0xrsydn
Author

0xrsydn commented Aug 13, 2024

> There's now v2 that runs on top of llama.cpp, so adding llama3 support should be even more straightforward.
>
> @0xrsydn which version of llama3 were you planning to run? I could add it to the new version

I think the recent one (llama3.1 8b) would be great. Thanks btw!

@jehna
Owner

jehna commented Aug 14, 2024

I researched Ollama a bit. If I'm correct, you could run Ollama locally and Humanify could connect to its API to use any model that Ollama serves.

There seems to be an undocumented feature that allows passing GBNF grammars as an argument to the model:
ollama/ollama#3616 (comment)

...but judging from other open issues about the topic I'm not really sure if it works or not. But I'll give it a try!
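
For reference, the patched request would presumably look something like this (a speculative sketch; the grammar option is undocumented and only exists in the patched builds discussed in that thread, not in released Ollama):

```python
# Sketch: passing a GBNF grammar to Ollama's /api/generate.
# NOTE: the "grammar" option is NOT in released Ollama; it only exists in
# patched builds from the linked PRs, so treat this as speculative.
import requests

GRAMMAR = "root ::= [a-zA-Z_] [a-zA-Z0-9_]*"  # toy GBNF: a single identifier

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Suggest a descriptive name for a loop counter variable:",
        "stream": False,
        "options": {"grammar": GRAMMAR},  # undocumented / patch-only
    },
)
print(resp.json()["response"])
```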

@0xdevalias

0xdevalias commented Aug 15, 2024

> ...but judging from other open issues about the topic I'm not really sure if it works or not

This seems like a good overarching/summarising issue; it still doesn't provide full clarity, but it links to seemingly all the related issues, and points out that now that OpenAI supports structured output, this has sort of become a higher priority:

  • Ollama Product Stance on Grammar Feature / Outstanding PRs: ollama/ollama#6237
Based on my read of these:

> There are a few open PRs for this behaviour - the most recent one being ollama/ollama#3618 - and it would be amazing to get this merged in. It's a 2-line change that exposes the llama.cpp GBNF functionality via modelfile parameters. It's not my patch, but I've compiled it and used it locally and it works really well.

Originally posted by @ravenscroftj in ollama/ollama#3616 (comment)

> Yes, you can send the grammar as an option when you submit a request with the patch I linked to above enabled. It just isn't documented!

Originally posted by @ravenscroftj in ollama/ollama#3616 (comment)

It sounds like it's not currently possible to use the GBNF functionality on the current main/released version of Ollama.

According to this:

> My simple personal example is this. As a newer Ollama user I actually would like to try out both approaches to see which one works better for me and my product. Right now in Ollama I simply cannot, and from appearances (which can be deceiving) it appears that what's stopping me from testing these both in Ollama is a simple code change to expose the feature in llama.cpp to me. (edit: It was brought to my attention that Ollama actually uses GBNF internally to enforce JSON syntax, so the only thing that's really missing is exposing this feature to the end user to customize or use a different grammar.)

Originally posted by @Kinglord in ollama/ollama#6237 (comment)

It sounds like Ollama currently supports JSON mode, which is built as a GBNF grammar (presumably on top of llama.cpp's support for it), but the ability to use a custom grammar isn't currently exposed to the end user.
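
By contrast, the JSON mode that is exposed today works out of the box with the documented format parameter:

```python
# Ollama's released JSON mode: "format": "json" is documented and works today,
# internally enforced via a fixed GBNF grammar.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Return a JSON object with a single key `name` for a variable name.",
        "format": "json",  # constrains output to valid JSON
        "stream": False,
    },
)
print(resp.json()["response"])
```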

@Kinglord

Sadly @0xdevalias is correct: what you want to do, @jehna, will not work unless you patch your version of Ollama with the PR linked above. The release version still has no support for GBNF outside of the built-in JSON mode. Ollama still refuses to even reply to this issue, for some really strange reason; I have no idea why they won't talk about it at all and simply let the PRs keep rolling in and sit there. At this point all we can do is keep pressuring them by raising issues and making noise, both here and on the Discord, until we can get someone to take 10 minutes and explain the decision to essentially block this feature from end users in Ollama.

@jehna
Owner

jehna commented Aug 15, 2024

Thank you for looking into this. I just pushed an ollama-support branch that should start working if they start supporting the grammar flag.

@jehna jehna mentioned this issue Aug 15, 2024
@jehna
Owner

jehna commented Aug 15, 2024

☝️ added llama3.1 8b model support

@dangelo352

How do we use Ollama with this? Sorry if this is a dumb question.
