
Ollama Support #14

Open
0xrsydn opened this issue Jun 18, 2024 · 11 comments

Comments

@0xrsydn

0xrsydn commented Jun 18, 2024

Is it possible to use llama3 via Ollama rather than the Hugging Face one?

@jehna
Owner

jehna commented Jun 19, 2024

Not possible at the moment, but it should be straightforward to implement if you'd like to give it a shot.

You can check the LlamaCpp docs from Guidance and change (preferably parametrize) the config:
https://github.com/jehna/humanify/blob/main/local-inference/guidance_config.py
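
For illustration, a parametrized version might look roughly like this (a sketch only, assuming guidance's models.LlamaCpp wrapper; the environment variable names are made up):

```python
# Sketch: parametrizing the model settings in local-inference/guidance_config.py
# via environment variables. MODEL_PATH / N_GPU_LAYERS / N_CTX are hypothetical names.
import os

from guidance import models

llm = models.LlamaCpp(
    os.environ.get("MODEL_PATH", "models/llama-3-8b.Q4_K_M.gguf"),
    n_gpu_layers=int(os.environ.get("N_GPU_LAYERS", "-1")),  # -1 = offload all layers
    n_ctx=int(os.environ.get("N_CTX", "4096")),
)
```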

@0xdevalias

0xdevalias commented Jun 19, 2024

I wonder if it's worth implementing a wrapper/abstraction layer like LiteLLM to make things more flexible?

  • https://github.com/BerriAI/litellm
    • Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
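
For example, a LiteLLM call routed through a local Ollama server might look roughly like this (a sketch only; the model name and api_base are illustrative):

```python
# Sketch: calling a local Ollama model through LiteLLM's OpenAI-style interface.
# The model name and api_base below are examples; adjust to your local setup.
from litellm import completion

response = completion(
    model="ollama/llama3",             # provider prefix + model name
    messages=[{"role": "user", "content": "Suggest a descriptive variable name."}],
    api_base="http://localhost:11434", # default local Ollama endpoint
)
print(response.choices[0].message.content)
```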

This is what projects like aider use:

  • https://aider.chat/docs/llms/other.html
    • Aider uses the litellm package to connect to hundreds of other models. You can use aider --model <model-name> to use any supported model.

      To explore the list of supported models you can run aider --models <model-name> with a partial model name. If the supplied name is not an exact match for a known model, aider will return a list of possible matching models.

Though I'm not currently sure if/how compatible that is with the guidance module you're currently using.


@0xdevalias

@jehna Curious: what aspects of guidance does humanify currently rely on? Is it using much of the deeper 'controls' provided by it?

Skimming the prompt files, it looks like gen, stop, and stop_regex are the main features used.
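
For reference, those primitives look roughly like this in guidance's API (an illustrative sketch, not humanify's actual prompts; the model path is made up):

```python
# Sketch of guidance's gen / stop / stop_regex primitives.
from guidance import gen, models

lm = models.LlamaCpp("models/llama-3-8b.Q4_K_M.gguf")  # example path

# Constrain generation with a literal stop string...
lm += "A good name for this variable is: " + gen("name", stop="\n", max_tokens=10)

# ...or with a stop regex (stop as soon as a non-identifier character appears).
lm += "Rewritten identifier: " + gen("identifier", stop_regex=r"[^A-Za-z0-9_]", max_tokens=10)

print(lm["name"], lm["identifier"])
```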

@jehna
Owner

jehna commented Aug 12, 2024

There's now v2 that runs on top of llama.cpp, so adding llama3 support should be even more straightforward.

@0xrsydn which version of llama3 were you planning to run? I could add it to the new version

@0xrsydn
Author

0xrsydn commented Aug 13, 2024

> There's now v2 that runs on top of llama.cpp, so adding llama3 support should be even more straightforward.
>
> @0xrsydn which version of llama3 were you planning to run? I could add it to the new version

I think the recent one (llama3.1 8b) would be great. Thanks btw!

@jehna
Owner

jehna commented Aug 14, 2024

I researched Ollama a bit. If I'm correct, you could run Ollama locally and Humanify could connect to its API to use any model that Ollama serves.

There seems to be an undocumented feature that allows passing GBNF grammars as an argument to the model:
ollama/ollama#3616 (comment)

...but judging from other open issues about the topic I'm not really sure if it works or not. But I'll give it a try!
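
For reference, the patched request would presumably look something like this (a speculative sketch; the grammar option is undocumented and only exists in the patched builds discussed in that thread, not in released Ollama):

```python
# Sketch: passing a GBNF grammar to Ollama's /api/generate.
# NOTE: the "grammar" option is NOT in released Ollama; it only exists in
# patched builds from the linked PRs, so treat this as speculative.
import requests

GRAMMAR = "root ::= [a-zA-Z_] [a-zA-Z0-9_]*"  # toy GBNF: a single identifier

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Suggest a descriptive name for a loop counter variable:",
        "stream": False,
        "options": {"grammar": GRAMMAR},  # undocumented / patch-only
    },
)
print(resp.json()["response"])
```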

@0xdevalias

0xdevalias commented Aug 15, 2024

> ...but judging from other open issues about the topic I'm not really sure if it works or not

This seems like a good overarching/summarising issue; it still doesn't provide full clarity, but it links to seemingly all the related issues, and points out that now that OpenAI supports structured output, this has sort of become a higher priority:

  • Ollama Product Stance on Grammar Feature / Outstanding PRs: ollama/ollama#6237
Based on my read of these:

> There are a few open PRs for this behaviour - the most recent one being ollama/ollama#3618 - and it would be amazing to get this merged in. It's a 2-line change that exposes the llama.cpp GBNF functionality via modelfile parameters. It's not my patch, but I've compiled it and used it locally and it works really well.

Originally posted by @ravenscroftj in ollama/ollama#3616 (comment)

> Yes, you can send the grammar as an option when you submit a request with the patch I linked to above enabled. It just isn't documented!

Originally posted by @ravenscroftj in ollama/ollama#3616 (comment)

It sounds like it's not currently possible to use the GBNF functionality on the current main/released version of Ollama.

According to this:

> My simple personal example is this. As a newer Ollama user I actually would like to try out both approaches to see which one works better for me and my product. Right now in Ollama I simply cannot, and from appearances (which can be deceiving) it appears that what's stopping me from testing these both in Ollama is a simple code change to expose the feature in llama.cpp to me. (edit: It was brought to my attention that Ollama actually uses GBNF internally to enforce JSON syntax, so the only thing that's really missing is exposing this feature to the end user to customize or use a different grammar.)

Originally posted by @Kinglord in ollama/ollama#6237 (comment)

It sounds like Ollama currently supports JSON mode, which is built as a GBNF grammar (presumably on top of llama.cpp's support for it), but the ability to use a custom grammar isn't currently exposed to the end user.
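
By contrast, the JSON mode that is exposed today works out of the box with the documented format parameter:

```python
# Ollama's released JSON mode: "format": "json" is documented and works today,
# internally enforced via a fixed GBNF grammar.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Return a JSON object with a single key `name` for a variable name.",
        "format": "json",  # constrains output to valid JSON
        "stream": False,
    },
)
print(resp.json()["response"])
```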

@Kinglord

Sadly @0xdevalias is correct: what you want to do, @jehna, will not work unless you patch your version of Ollama with the PR linked above. The release version still has no support for GBNF outside of the built-in JSON mode. Ollama still refuses to even reply to this issue, for some really strange reason; I have no idea why they won't talk about it at all and simply let the PRs keep rolling in and sit there. At this point all we can do is keep pressuring them by raising issues and making noise, both here and on the Discord, until we can get someone to take 10 minutes and explain the decision to essentially block this feature from end users in Ollama.

@jehna
Owner

jehna commented Aug 15, 2024

Thank you for looking into this. I just pushed an ollama-support branch that should start working if they start supporting the grammar flag.

@jehna jehna mentioned this issue Aug 15, 2024
@jehna
Owner

jehna commented Aug 15, 2024

☝️ added llama3.1 8b model support

@dangelo352

How do we use Ollama with this? Sorry if this is a dumb question.
