Chat template not working for DeepSeek Qwen Distill model #181
What version are you using? I wonder if you don't have this: it isn't a fix for that, but it does provide a fallback.
It runs, but generates bad results because of the missing chat template. I don't think it's a great idea to silently run with the plain text if the chat template is not there. The model will generate responses that look sort of OK but are actually not very good.
I would at the very least log a warning, but we already have so many warnings it would probably be hard to see. Probably I would fail in those cases where the model has a chat template (so it expects one) but there is an error in parsing it, or something like that.
The problem is that there is no workaround until #150 is solved -- the API doesn't allow us to probe for a template or offer a replacement. We could certainly log a warning, but the other option is to just exit. This was added for Qwen2VL, as it has an unusable template, but we handle that in the calling code. See also #173
Maybe we can check for this. Another option is to load the tokenizer config and check for the template ourselves. We already have the model path, so maybe it's easy to add it as a field?
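A minimal sketch of that check, assuming the standard Hugging Face model layout (a `tokenizer_config.json` in the model directory with an optional `chat_template` key). The helper name is hypothetical, not part of the current API:

```swift
import Foundation

// Hypothetical helper: check whether the model's tokenizer_config.json
// declares a chat template, so callers can fail or warn before generating
// with an unformatted prompt.
func hasChatTemplate(modelPath: URL) -> Bool {
    let configURL = modelPath.appendingPathComponent("tokenizer_config.json")
    guard let data = try? Data(contentsOf: configURL),
          let json = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
    else {
        return false
    }
    // chat_template may be a single string or, in some configs, a list of
    // named templates; either way, a non-nil value means one is declared.
    return json["chat_template"] != nil
}
```

Since the loader already has the model path, this could be surfaced as a boolean field on the model configuration.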
Yes, those are reasonable suggestions -- I will take a look.
I've verified that the error persists in my PR to Swift Jinja, so I'll investigate further to find the cause.
I have now solved this in johnmai-dev/Jinja#8. @pcuenca, can we please merge my PR to Swift Jinja, as well as the ones I have open in swift-transformers? I've added a large number of tests from the Python and TypeScript Jinja implementations, and they're passing. I've put a huge amount of effort into making Swift Jinja work with chat templates for the latest models, and I've done this all for free. My PRs in mlx-swift-examples, which showcase the new capabilities, also depend on my Swift Jinja and swift-transformers PRs getting merged. Unfortunately, I haven't gotten a response from @pcuenca on this yet.
I missed this at first when I was testing DeepSeek R1 because of the silent error, so it would be great if we could throw an error when the template doesn't work, which I believe is what happened in a previous version.
I think we can close this; it's working now after updating Jinja. Thanks for the fix!
@davidkoski should I make a separate issue about failing in cases where it doesn't make sense to silently continue without a chat template?
The problem with detecting what went wrong is that the error type is not public, so we can't actually examine it. Perhaps we need an issue on swift-transformers for this, and then we could handle these cases differently.
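To illustrate the limitation: when a dependency's error enum is internal, the caller can only bind `any Error` and fall back to string inspection. This is a self-contained sketch, not the actual swift-transformers code; `HiddenTemplateError` and `applyChatTemplate` are stand-ins:

```swift
import Foundation

// Stand-in for a dependency whose error type is not public. Pretend this
// enum is `internal` to another package, so the caller cannot name it.
enum HiddenTemplateError: Error { case missingChatTemplate }

func applyChatTemplate() throws -> String {
    throw HiddenTemplateError.missingChatTemplate
}

func renderPrompt() -> String {
    do {
        return try applyChatTemplate()
    } catch {
        // Without access to the concrete type, we cannot write
        // `catch HiddenTemplateError.missingChatTemplate`; the best we can
        // do is inspect the description, which is fragile across versions.
        if String(describing: error).lowercased().contains("template") {
            return "ERROR: chat template problem"
        }
        return "ERROR: \(error)"
    }
}
```

Making the error type public in swift-transformers would let callers pattern-match on the specific failure instead.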
If you propose changes in swift-transformers, please encourage @pcuenca to merge huggingface/swift-transformers#158 (or some other formatting solution of his choice) and huggingface/swift-transformers#151 first. We already agreed on formatting rules to minimize merge conflicts, but I'm still waiting on this. |
Trying to run mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-8bit, but it's generating bad results because the template is not getting added.
Catching the error:
Maybe a Jinja issue, not sure?
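One way to avoid the silent fallback described above is to propagate the template failure instead of continuing with plain text. A hedged sketch, assuming a tokenizer whose `applyChatTemplate(messages:)` throws (the `ChatTokenizer` protocol and `PromptError` here are hypothetical stand-ins, not the real API):

```swift
import Foundation

// Hypothetical stand-in for the tokenizer's template-applying interface.
protocol ChatTokenizer {
    func applyChatTemplate(messages: [[String: String]]) throws -> [Int]
}

enum PromptError: Error {
    case chatTemplateFailed(underlying: Error)
}

// Fail loudly when the template cannot be applied, rather than silently
// encoding the raw prompt text: distilled models like
// DeepSeek-R1-Distill-Qwen generate noticeably worse output without their
// template, and the degradation is easy to miss.
func encodePrompt(_ tokenizer: ChatTokenizer,
                  messages: [[String: String]]) throws -> [Int] {
    do {
        return try tokenizer.applyChatTemplate(messages: messages)
    } catch {
        throw PromptError.chatTemplateFailed(underlying: error)
    }
}
```

The calling code can then decide whether to surface the error to the user or substitute a known-good template, as was done for Qwen2VL.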