Add support for PaLM models, such as chat-bison and text-bison #370

Open · wants to merge 1 commit into base: main

Conversation

ivano-donadi-ennova

Hi,

I added PaLM models to the available LLMs in guidance. A quick overview can be found in the dedicated notebook in the llm folder.

New features:

  • "context" alias for the system role
  • "example" role containing input and output blocks to provide demonstration examples, including notebook formatting of examples
  • support for library (through VertexAI SDK) and rest calls to google's text genertion and chat models with or without streaming
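
For illustration, here is a minimal sketch of how the new roles might be used in a guidance program. The class name guidance.llms.VertexAI and the exact input/output tag syntax are assumptions based on the description above; the dedicated notebook shows the actual API.

    import guidance

    # Hypothetical entry point: the actual class name is defined in this
    # PR (see the dedicated notebook in the llm folder).
    guidance.llm = guidance.llms.VertexAI("chat-bison")

    # "context" is the new alias for the system role; the "example" role
    # wraps an input/output demonstration pair (tag syntax assumed here).
    program = guidance("""
    {{#context~}}
    You are a concise assistant.
    {{~/context}}
    {{#example~}}
    {{#input~}}What is 2 + 2?{{~/input}}
    {{#output~}}4{{~/output}}
    {{~/example}}
    {{#user~}}
    {{query}}
    {{~/user}}
    {{#assistant~}}
    {{gen 'answer' max_tokens=32}}
    {{~/assistant}}
    """)

    print(program(query="What is 3 + 3?")["answer"])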

Issues:

  • The select tool requires tokenization of the prompt. However, to the best of my knowledge, there is no equivalent of tiktoken that provides tokenizers for Google's models. Tokenization could still be done via a billable REST call to their tokenization API, but I would like community feedback on whether to implement this feature or whether there are other workarounds.

Tests:

  • I replicated all of the tests run on the OpenAI models against PaLM, with the exception of the select tool.

Please let me know what you think!

@ivano-donadi-ennova ivano-donadi-ennova changed the title Add support for PaLM models, such as chat-bison and code-bison Add support for PaLM models, such as chat-bison and text-bison Sep 7, 2023

xnohat commented Sep 18, 2023

Hi,

Google doesn't provide an open-source tokenizer library, so I think you should change self._tokenizer to a RESTful API wrapper. The original, copied from _openai.py:

        import tiktoken

        # Resolve the tiktoken encoding for the given OpenAI model name.
        if encoding_name is None:
            encoding_name = tiktoken.encoding_for_model(model).name
        self._tokenizer = tiktoken.get_encoding(encoding_name)
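
For reference, a minimal sketch of what such a wrapper might look like, assuming the Vertex AI countTokens REST endpoint. That endpoint returns token counts only, not token ids, so it cannot back a full encode/decode pair; all names below are illustrative.

    import requests

    class VertexAITokenizerWrapper:
        """Illustrative REST-backed stand-in for tiktoken. Assumes the
        Vertex AI countTokens endpoint, which returns counts only, so
        real encode()/decode() of token ids is not possible this way."""

        def __init__(self, project, location, model, access_token):
            self._url = (
                f"https://{location}-aiplatform.googleapis.com/v1/projects/"
                f"{project}/locations/{location}/publishers/google/models/"
                f"{model}:countTokens"
            )
            self._headers = {"Authorization": f"Bearer {access_token}"}

        def count_tokens(self, text):
            # One billable REST call per prompt, as noted in the PR description.
            resp = requests.post(
                self._url,
                headers=self._headers,
                json={"instances": [{"prompt": text}]},
            )
            resp.raise_for_status()
            return resp.json()["totalTokens"]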

@ivano-donadi-ennova (Author)

Hi,
Unfortunately, Google's text embedding APIs do not seem to include a decoding option, which is required by the 'select' pipeline.
