
Chat template not working for DeepSeek Qwen Distill model #181

Closed
awni opened this issue Jan 21, 2025 · 13 comments

awni (Member) commented Jan 21, 2025

Trying to run mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-8bit but it's generating bad results because the template is not getting added.

Catching the error:

parser("Parser Error: Expected closing statement token. openSquareBracket != closeStatement.")

Maybe a Jinja issue, not sure?

davidkoski (Collaborator) commented

What version are you using? I wonder if you don't have this:

It isn't a fix for that, but it does provide a fallback. From HEAD of main I get:

Model loaded -> id("mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-8bit")
Starting generation ...
how do I compute the square footage of a house?  How do I compute the square footage of a rectangle?
To compute the square footage of a house, I need to do the following steps:
1. Measure the length and width of each room in the house.
2. Calculate the area of each room by multiplying the length and width.
3. Sum all the areas of the rooms to get the total square footage of the house.

For the square footage of a rectangle, the steps are:
1. Measure the length and width of the rectangle.
2------
Prompt:     12 tokens, 147.637808 tokens/s
Generation: 100 tokens, 117.780507 tokens/s, 0.849037s
Program ended with exit code: 0

awni (Member, Author) commented Jan 21, 2025

It runs, but it generates bad results because of the missing chat template. I don't think it's a great idea to silently run with the plain text when the chat template is not there. The model will generate responses that look superficially OK but are actually not very good.

awni (Member, Author) commented Jan 21, 2025

I would at the very least log a warning, but we already have so many warnings it would probably be hard to see. Probably I would fail in those cases where the model has a chat template (so it expects one) but there is an error in the parsing or something like that.
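The policy described above can be sketched in Swift. This is a hypothetical helper, not the repo's actual API: `renderPrompt`, `PromptError`, and the closure-based shape are all assumptions made for illustration. The idea is simply that a parse failure is fatal when a template was declared, while plain text is only acceptable when no template exists at all.

```swift
// Sketch of the proposed policy: if the tokenizer declares a chat
// template but applying it throws, fail loudly instead of silently
// falling back to plain-text prompting.
enum PromptError: Error {
    case templateParsingFailed(underlying: Error)
}

func renderPrompt(
    applyTemplate: () throws -> String,
    hasTemplate: Bool,
    plainText: String
) throws -> String {
    guard hasTemplate else {
        // No template declared at all: plain text is the best we can do.
        return plainText
    }
    do {
        return try applyTemplate()
    } catch {
        // The model expects a template, so a parse error should surface
        // to the caller rather than silently degrade output quality.
        throw PromptError.templateParsingFailed(underlying: error)
    }
}
```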

davidkoski (Collaborator) commented

The problem is that there is no workaround until #150 is solved -- the API doesn't allow us to probe for a template or offer a replacement. We could certainly log a warning, but the other option is to just exit.

This was added for Qwen2VL as it has an unusable template but we handle it in the calling code. See also #173

awni (Member, Author) commented Jan 21, 2025

Maybe we can check for this TokenizerError.chatTemplate error? I don't know if that's easy, but it does look like that error only comes up if the tokenizer doesn't have a chat template.

Another option is to load the tokenizer config and check for the template ourselves. We already have the model path.. so maybe it's easy to add it as a field?
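The second suggestion can be sketched directly: since the model path is already known, the calling code could read tokenizer_config.json itself and check for a chat_template entry before deciding how to prompt. This is a hypothetical helper under assumed names (`hasChatTemplate`, the config file name follows the Hugging Face convention); it is not the library's actual API.

```swift
import Foundation

// Sketch: probe tokenizer_config.json in the already-downloaded model
// directory and report whether a "chat_template" key is present.
func hasChatTemplate(modelDirectory: URL) -> Bool {
    let configURL = modelDirectory.appendingPathComponent("tokenizer_config.json")
    guard let data = try? Data(contentsOf: configURL),
          let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any]
    else {
        // Missing or unreadable config: treat as "no template declared".
        return false
    }
    // "chat_template" may be a string or (in newer configs) a list of
    // named templates; either way, its presence means the model expects one.
    return json["chat_template"] != nil
}
```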

davidkoski (Collaborator) commented

Yes, those are reasonable suggestions -- I will take a look.

@davidkoski davidkoski self-assigned this Jan 21, 2025
DePasqualeOrg (Contributor) commented Jan 21, 2025

I've verified that the error persists in my PR to Swift Jinja, so I'll investigate further to find the cause.

johnmai-dev/Jinja#8

DePasqualeOrg (Contributor) commented

I have now solved this in johnmai-dev/Jinja#8.

@pcuenca, can we please merge my PR to Swift Jinja as well as the ones I have open in swift-transformers, considering that I've added a large number of tests from the Python and TypeScript Jinja implementations and they're passing? I've put a huge amount of effort into making Swift Jinja work with chat templates for the latest models, and I've done this all for free.

My PRs in mlx-swift-examples, which showcase the new capabilities, also depend on my Swift Jinja and swift-transformers PRs getting merged. Unfortunately I haven't gotten a response from @pcuenca on this yet.

DePasqualeOrg (Contributor) commented

I missed this at first when I was testing DeepSeek R1 because of the silent error. It would be great if we could throw an error when the template doesn't work, which I believe is what happened in a previous version.

awni (Member, Author) commented Jan 22, 2025

I think we can close this; it's working now after updating Jinja. Thanks for the fix!

@awni awni closed this as completed Jan 22, 2025
awni (Member, Author) commented Jan 22, 2025

@davidkoski should I make a separate issue about failing in cases where it doesn't make sense to silently continue without a chat template?

davidkoski (Collaborator) commented

The problem with detecting what went wrong is the error type is not public:

enum TokenizerError: Error {
    case missingConfig
    case missingTokenizerClassInConfig
    case unsupportedTokenizer(String)
    case missingVocab
    case malformedVocab
    case chatTemplate(String)
    case tooLong(String)
}

so we can't actually examine that. Perhaps we need an issue on swift-transformers for this and then we could handle these cases differently.
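Until the error type is made public, one brittle stopgap (a sketch only, not an endorsed approach) is to inspect the error's textual description from the calling code, since Swift's default error description includes the case name. The helper name `isChatTemplateError` is an assumption; matching on a string is fragile and would break if swift-transformers renames the case, which is why an upstream API change is the better fix.

```swift
// Brittle workaround sketch: TokenizerError is internal, so calling code
// cannot pattern-match on .chatTemplate. We can only look at the error's
// description, which for an enum case includes the case name.
func isChatTemplateError(_ error: Error) -> Bool {
    String(describing: error).contains("chatTemplate")
}
```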

DePasqualeOrg (Contributor) commented

If you propose changes in swift-transformers, please encourage @pcuenca to merge huggingface/swift-transformers#158 (or some other formatting solution of his choice) and huggingface/swift-transformers#151 first. We already agreed on formatting rules to minimize merge conflicts, but I'm still waiting on this.
