
Remove Instruct/Chat versions of models & introduce a new ChatTemplate API, fix Anthropic API #820

Merged: 34 commits into main, May 15, 2024

Conversation

@Harsha-Nori (Collaborator) commented on May 13, 2024

This PR should significantly reduce the number of user-facing classes we have in Guidance, and reduce subtle bugs introduced by using a mis-specified Chat Template (models currently silently default to the ChatML syntax, which many of the latest models don't adhere to). It should also make it easier for users to add new models to guidance, either via PR or in their own codebases.

Before:

from guidance.models.transformers import Transformers, TransformersChat

class Llama(Transformers):
    pass

# Users have to do this for most chat models in guidance
class LlamaChat(TransformersChat, Llama):
    def get_role_start(self, role_name, **kwargs):
        if role_name == "system":
            return self._system_prefex + "<<SYS>>\n"
        elif role_name == "user":
            if str(self).endswith("\n<</SYS>>\n\n"):
                return ""  
            else:
                return "[INST] "
        else:
            return " "

    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>>\n\n"
        elif role_name == "user":
            return " [/INST]"
        else:
            return " "

lm = LlamaChat(path_to_llama)

After:

from guidance.models import Transformers

lm = Transformers(path_to_llama) # automagically works

If you're using a rare model and the automatic template detection doesn't work...

After pt2:

# users can copy paste from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer_config.json#L12 
llama2_template = "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' '  + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"

lm = Transformers(path_to_llama, chat_template=llama2_template)

or, in the worst case, for maximal robustness and customizability:

from guidance._chat import ChatTemplate, UnsupportedRoleException

class Llama2ChatTemplate(ChatTemplate):
    template_str = llama2_template

    def get_role_start(self, role_name):
        if role_name == "system":
            return "[INST] <<SYS>>\n"
        elif role_name == "user":
            return "<s>[INST]"
        elif role_name == "assistant":
            return " "
        else:
            raise UnsupportedRoleException(role_name, self)
        
    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>"
        elif role_name == "user":
            return " [/INST]"
        elif role_name == "assistant":
            return "</s>"
        else:
            raise UnsupportedRoleException(role_name, self)

lm = Transformers(path_to_llama, chat_template=Llama2ChatTemplate)

The first big change is the removal of the Chat and Instruct mixins, and the introduction of a new guidance._chat.ChatTemplate class, which takes over the responsibilities those mixins used to handle.

Users can construct a subclass of ChatTemplate and pass it to models via a new chat_template argument (defaulting to None).

The way this works for local models is to leverage the chat_template property in huggingface transformers and llama.cpp's GGUF files. When a user tries to load a model, guidance now applies the following order of operations (a rough sketch follows the list):

  1. See if the user passed in a ChatTemplate -- if so, use that directly.
  2. If the user passed a string for chat_template, use it as the template_str. If the user did not pass anything, set the template_str from the huggingface.AutoTokenizer or GGUF metadata fields.
  3. Check the template_str against a local cache in guidance which maintains template converters for the most popular models on huggingface/in llama.cpp. We index this cache based on the actual chat_template string, so any model that uses one of these chat templates -- even if it isn't explicitly listed in the documentation -- will automatically load the right guidance class.
  4. If we don't have anything in the cache, try to automatically convert the jinja2 template into the new guidance._chat.ChatTemplate syntax. Warn the user if we attempt this. [NOTE: This is not yet implemented and may come in a future PR.]
  5. Default to the ChatML syntax, with a warning to the user.
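
As a rough sketch of that resolution order (illustrative only: CHAT_TEMPLATE_CACHE, ChatMLTemplate, and the function itself are assumed names, not the actual guidance internals):

import warnings

from guidance._chat import ChatTemplate

# Assumed stand-ins for the actual internals:
CHAT_TEMPLATE_CACHE = {}  # maps a known template_str to a ChatTemplate subclass

class ChatMLTemplate(ChatTemplate):  # placeholder for the ChatML default
    pass

def resolve_chat_template(chat_template, metadata_template_str):
    # 1. A ChatTemplate subclass passed by the user wins outright.
    if isinstance(chat_template, type) and issubclass(chat_template, ChatTemplate):
        return chat_template

    # 2. Otherwise take the user's template string, falling back to the
    #    huggingface.AutoTokenizer / GGUF metadata.
    template_str = chat_template or metadata_template_str

    # 3. Look the raw template string up in the local cache of known templates.
    if template_str in CHAT_TEMPLATE_CACHE:
        return CHAT_TEMPLATE_CACHE[template_str]

    # 4. (Not yet implemented) attempt an automatic jinja2 -> ChatTemplate conversion.

    # 5. Fall back to ChatML, warning the user.
    warnings.warn("Chat template not recognized; defaulting to ChatML syntax.")
    return ChatMLTemplate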

Currently this PR updates the following user-facing guidance.models classes:

  • Transformers (removing TransformersChat and the Llama/Mistral subclasses)
  • LlamaCpp (removing LlamaCppChat and Mistral subclasses)
  • Anthropic
  • OpenAI

For now, Anthropic should be representative of how grammarless classes will work. I wanted to start with OpenAI, but many other guidance.models classes inherit from OpenAI, so I'm saving that for later. While I was at it, I also upgraded the Anthropic class to use their latest SDK, so guidance.models.Anthropic should now work with the latest Claude 3 models.
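
As a quick usage sketch (hedged: the constructor shape shown here, a Claude 3 model name plus an api_key keyword, is an assumption rather than a documented signature):

from guidance import models, gen, system, user, assistant

# Assumed constructor shape: a Claude 3 model name plus an api_key keyword.
lm = models.Anthropic("claude-3-haiku-20240307", api_key="YOUR_API_KEY")

with system():
    lm += "You are a terse assistant."
with user():
    lm += "Name one planet."
with assistant():
    lm += gen("answer", max_tokens=10)

print(lm["answer"])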

TODO

A decent amount left to do here. In no particular order...

  1. Write out more templates for popular models in the ChatTemplateCache, and also add an alias system so that we can look up models in the cache by common names (e.g. "llama3").
  2. Add a deprecation warning for people trying to use very old models on Anthropic.
  3. Much more testing and documentation. We should, for example, add documentation on how to import/initialize a new ChatTemplate and use it for your own models.
  4. Write the auto-converter from huggingface jinja2 to guidance ChatTemplate. A battery of unit tests here that compare against the original transformers.apply_chat_template method would make this more robust. Can be in a future PR as this is complex logic. A start on this was attempted by @ilmarinen in #791 (Add TransformersChat code to figure out correct role start and end tokens), and we could eventually pull this in and expand its coverage.
  5. Probably get rid of the folders in guidance.models.llama_cpp and guidance.models.transformers because we don't need to maintain a bunch of subclasses for them anymore.

Would appreciate any and all feedback, particularly on the logical flow and new user facing (simpler) API. @marcotcr @paulbkoch @slundberg @riedgar-ms @hudson-ai

Comment on lines -385 to +387
for (
byte
) in (
node.keys()
): # we update all the children since the parser knows the full mask

# we update all the children since the parser knows the full mask
for byte in node.keys():
@Harsha-Nori (author):
some of these changes are just undoing particularly egregious dissections that black did...e.g. it's wild that this loop iterator was split into 5 lines!

@Harsha-Nori (author):
realizing now that it makes this particular PR much harder to parse though, apologies for that!

Reviewer (Collaborator):
I'm guessing that was caused by the long trailing comment. That seems to be something that black is not overly keen on.

Reviewer (Collaborator):
Did you just shift the comment and re-run black?

@Harsha-Nori (author):
Yeah I ended up manually moving trailing comments up a line, but it'd be good to see if there's an automatic way to make black handle it.

Comment on lines +111 to +133
class Llama2ChatTemplate(ChatTemplate):
    # available_roles = ["system", "user", "assistant"]
    template_str = llama2_template

    def get_role_start(self, role_name):
        if role_name == "system":
            return "[INST] <<SYS>>\n"
        elif role_name == "user":
            return "<s>[INST]"
        elif role_name == "assistant":
            return " "
        else:
            raise UnsupportedRoleException(role_name, self)

    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>"
        elif role_name == "user":
            return " [/INST]"
        elif role_name == "assistant":
            return "</s>"
        else:
            raise UnsupportedRoleException(role_name, self)
@Harsha-Nori (author):
This template is likely an oversimplification of the HF template string; I'll need to debug more and extend this.

Reviewer (Collaborator):
Don't forget to add the tests you develop during your debugging.

@Harsha-Nori (author) commented on May 13, 2024

Hmm, test failures are related to needing to authenticate on huggingface to load some model tokenizers. I added these new tests to check the ChatTemplateCache. We could disable these tests, or (better) figure out a way to pass a huggingface auth token here...

E Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
E Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

@riedgar-ms thoughts here?

        self._cache = {}

    def __getitem__(self, key):
        key_compact = key.replace(" ", "")
Reviewer (Collaborator):
Minor point: will this collapse multiple consecutive spaces?

@Harsha-Nori (author):
Nah, it's a good catch; I'm actually not sure we need to be doing this. I didn't want minor differences in jinja formats (which I believe are whitespace-agnostic for parts of them) to cause different mappings in the cache, but maybe there are places where an extra space is actually a meaningful difference?

Reviewer (Collaborator):
That was a question I asked before I caught sight of the actual keys you were using....

@riedgar-ms (Collaborator) left a review comment:

Given the changes to Anthropic, does that mean we now have access to API keys which we can use in the builds?

(Resolved review threads: guidance/_chat.py, guidance/models/_anthropic.py ×2)

(Resolved, outdated thread: tests/models/test_chat_templates.py)
@Harsha-Nori (author):

> Given the changes to Anthropic, does that mean we now have access to API keys which we can use in the builds?

I just made a personal account and paid for $20 of credits out of pocket to test it...not sure it's wise to depend on that for our CI/CD (especially given how frequently they run :| ).

@riedgar-ms (Collaborator):

> Given the changes to Anthropic, does that mean we now have access to API keys which we can use in the builds?
>
> I just made a personal account and paid for $20 of credits out of pocket to test it...not sure it's wise to depend on that for our CI/CD (especially given how frequently they run :| ).

I'm quite happy to spend your money.....

@riedgar-ms (Collaborator):

> Hmm, test failures are related to needing to authenticate on huggingface to load some model tokenizers. I added these new tests to check the ChatTemplateCache. We could disable these tests, or (better) figure out a way to pass a huggingface auth token here...
>
> E Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
> E Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.
>
> @riedgar-ms thoughts here?

You can add the HF auth token to the repo as a secret (like we did for the AzureAI studio secrets), and then access it as an environment variable within the test. I believe I have an env_or_fail() method to help with that.
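
Something like this, perhaps (a sketch: the HF_TOKEN secret name and the env_or_fail helper are written out here as assumptions rather than the actual test utilities):

import os

import pytest
import transformers

def env_or_fail(var_name: str) -> str:
    # Assumed helper: read a CI secret from the environment or fail the test.
    value = os.environ.get(var_name)
    if value is None:
        pytest.fail(f"Environment variable {var_name} is not set")
    return value

def test_gated_chat_template_loads():
    hf_token = env_or_fail("HF_TOKEN")  # assumed secret name, set in the workflow
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "meta-llama/Meta-Llama-3-8B-Instruct", token=hf_token
    )
    assert tokenizer.chat_template is not None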

@Harsha-Nori (author):

> Hmm, test failures are related to needing to authenticate on huggingface to load some model tokenizers. I added these new tests to check the ChatTemplateCache. We could disable these tests, or (better) figure out a way to pass a huggingface auth token here...
>
> E Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
> E Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.
>
> @riedgar-ms thoughts here?
>
> You can add the HF auth token to the repo as a secret (like we did for the AzureAI studio secrets), and then access it as an environment variable within the test. I believe I have an env_or_fail() method to help with that.

Nice, thanks. I changed it to use the env variable and your helper function.

@riedgar-ms (Collaborator) left a review comment:
I've suggested a way of getting some of those Phi-3 tests running in the gates.

(Resolved thread: guidance/_chat.py)

lm += lm.get_role_start(role_name, **kwargs)

# TODO [HN]: Temporary change while I instrument chat_template in transformers only.
Reviewer (Collaborator):
Is this still the case (and below)?


tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, token=hf_token)
model_chat_template = tokenizer.chat_template
if should_pass:
Reviewer (Collaborator):
Would separating things out and using pytest's XFAIL be better here?
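
For reference, one shape that could take (a sketch with placeholder model ids; the real parametrization would presumably live in tests/models/test_chat_templates.py):

import pytest

@pytest.mark.parametrize(
    "model_id",
    [
        "microsoft/Phi-3-mini-4k-instruct",
        pytest.param(
            "some-org/unsupported-model",  # placeholder id expected to fail
            marks=pytest.mark.xfail(reason="chat template not yet in the cache"),
        ),
    ],
)
def test_chat_template_lookup(model_id):
    ...  # load the tokenizer and check the template resolves to a guidance ChatTemplate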

api_key="blah",
)
assert isinstance(initialized_model, model_class)
# This is all redundant with the class unification
Reviewer (Collaborator):
Better deleted than commented out

# directly passing in newlines next to special tokens for a tokenizer that does rstrip on those tokens
# (like phi-3) will cause a tokenization mismatch issue.
# We're leaving this test in so that we can reliably reproduce and debug this in the future.
with pytest.raises(Exception):
Reviewer (Collaborator):
Please can you assert something about the exception?
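
For example (the expected message fragment below is an assumption about what the tokenization mismatch actually raises):

with pytest.raises(Exception) as excinfo:
    ...  # the call that triggers the tokenization mismatch
assert "token" in str(excinfo.value)  # assumed message fragment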

assert "5" in lm["five"]


@pytest.mark.skip("Don't overload the build machines")
Reviewer (Collaborator):
I'm getting nervous about the number of tests added which are also skipped. Can you do something like:

@pytest.fixture(scope="module")
def llamacpp_model(selected_model, selected_model_name):
    if selected_model_name in ["hfllama7b", "hfllama_7b_gpu"]:
        return selected_model
    else:
        pytest.skip("Requires Llama-Cpp model")

So that these tests will run whenever there's a Phi3 model active in selected_model?

@codecov-commenter commented on May 14, 2024

Codecov Report

Attention: Patch coverage is 59.49367%, with 96 lines in your changes missing coverage. Please review.

Project coverage is 57.34%. Comparing base (13270bf) to head (57d1c2b).

❗ Current head 57d1c2b differs from the pull request's most recent head d6c0b38. Consider uploading reports for the commit d6c0b38 to get more accurate results.

Files Patch % Lines
guidance/_chat.py 63.06% 41 Missing ⚠️
guidance/models/_anthropic.py 0.00% 30 Missing ⚠️
guidance/models/transformers/_transformers.py 21.42% 11 Missing ⚠️
guidance/models/_openai.py 70.00% 6 Missing ⚠️
guidance/models/_model.py 82.14% 5 Missing ⚠️
guidance/library/_role.py 88.88% 1 Missing ⚠️
guidance/models/_grammarless.py 87.50% 1 Missing ⚠️
guidance/models/_remote.py 0.00% 1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #820      +/-   ##
==========================================
+ Coverage   57.02%   57.34%   +0.32%     
==========================================
  Files          57       56       -1     
  Lines        4193     4194       +1     
==========================================
+ Hits         2391     2405      +14     
+ Misses       1802     1789      -13     


@riedgar-ms (Collaborator):

@paulbkoch I think that everything is going to pass eventually on this. Per the discussion thread, I'll have to see about converting some more models over to the new paradigm.

@paulbkoch merged commit a75896a into main on May 15, 2024. 126 checks passed.
@Harsha-Nori changed the title from "[WIP] Remove Instruct/Chat versions of models & introduce a new ChatTemplate API, fix Anthropic API" to "Remove Instruct/Chat versions of models & introduce a new ChatTemplate API, fix Anthropic API" on May 18, 2024.