Chat template not working for DeepSeek Qwen Distill model #181
What version are you using? I wonder if you don't have this: it isn't a fix for that, but it does provide a fallback.
It runs, but generates bad results because of the missing chat template. I don't think it's a great idea to silently run with the plain text if the chat template is not there. The model will generate responses that look sort of OK but are actually not very good.
I would at the very least log a warning, but we already have so many warnings it would probably be hard to see. Probably I would fail in those cases where the model has a chat template (so it expects one) but there is an error in parsing it, or something like that.
The problem is that there is no workaround until #150 is solved -- the API doesn't allow us to probe for a template or offer a replacement. We could certainly log a warning, but the other option is to just exit. This was added for Qwen2VL, as it has an unusable template, but we handle that in the calling code. See also #173
Maybe we can check for this. Another option is to load the tokenizer config and check for the template ourselves. We already have the model path, so maybe it's easy to add it as a field?
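A minimal sketch of that check, assuming the standard Hugging Face model layout (a `tokenizer_config.json` in the model directory with an optional `chat_template` key). The helper name is hypothetical, not part of the current API:

```swift
import Foundation

// Hypothetical helper: check whether the model's tokenizer_config.json
// declares a chat template, so callers can fail or warn before generating
// with an unformatted prompt.
func hasChatTemplate(modelPath: URL) -> Bool {
    let configURL = modelPath.appendingPathComponent("tokenizer_config.json")
    guard let data = try? Data(contentsOf: configURL),
          let json = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
    else {
        return false
    }
    // chat_template may be a single string or, in some configs, a list of
    // named templates; either way, a non-nil value means one is declared.
    return json["chat_template"] != nil
}
```

Since the loader already has the model path, this could be surfaced as a boolean field on the model configuration.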
Yes, those are reasonable suggestions -- I will take a look.
I've verified that the error persists in my PR to Swift Jinja, so I'll investigate further to find the cause.
I have now solved this in johnmai-dev/Jinja#8. @pcuenca, can we please merge my PR to Swift Jinja, as well as the ones I have open in swift-transformers? I've added a large number of tests from the Python and TypeScript Jinja implementations, and they're passing. I've put a huge amount of effort into making Swift Jinja work with chat templates for the latest models, and I've done this all for free. My PRs in mlx-swift-examples, which showcase the new capabilities, also depend on my Swift Jinja and swift-transformers PRs getting merged. Unfortunately, I haven't gotten a response from @pcuenca on this yet.
I missed this at first when I was testing DeepSeek R1 because of the silent error, so it would be great if we could throw an error when the template doesn't work, which I believe is what happened in a previous version.
I think we can close this; it's working now after updating Jinja. Thanks for the fix!
@davidkoski should I make a separate issue about failing in cases where it doesn't make sense to silently continue without a chat template?
The problem with detecting what went wrong is that the error type is not public, so we can't actually examine it. Perhaps we need an issue on swift-transformers for this, and then we could handle these cases differently.
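To illustrate the limitation: when a dependency's error enum is internal, the caller can only bind `any Error` and fall back to string inspection. This is a self-contained sketch, not the actual swift-transformers code; `HiddenTemplateError` and `applyChatTemplate` are stand-ins:

```swift
import Foundation

// Stand-in for a dependency whose error type is not public. Pretend this
// enum is `internal` to another package, so the caller cannot name it.
enum HiddenTemplateError: Error { case missingChatTemplate }

func applyChatTemplate() throws -> String {
    throw HiddenTemplateError.missingChatTemplate
}

func renderPrompt() -> String {
    do {
        return try applyChatTemplate()
    } catch {
        // Without access to the concrete type, we cannot write
        // `catch HiddenTemplateError.missingChatTemplate`; the best we can
        // do is inspect the description, which is fragile across versions.
        if String(describing: error).lowercased().contains("template") {
            return "ERROR: chat template problem"
        }
        return "ERROR: \(error)"
    }
}
```

Making the error type public in swift-transformers would let callers pattern-match on the specific failure instead.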
If you propose changes in swift-transformers, please encourage @pcuenca to merge huggingface/swift-transformers#158 (or some other formatting solution of his choice) and huggingface/swift-transformers#151 first. We already agreed on formatting rules to minimize merge conflicts, but I'm still waiting on this. |
Trying to run mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-8bit, but it's generating bad results because the template is not getting added.
Catching the error:
Maybe a Jinja issue, not sure?
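One way to avoid the silent fallback described above is to propagate the template failure instead of continuing with plain text. A hedged sketch, assuming a tokenizer whose `applyChatTemplate(messages:)` throws (the `ChatTokenizer` protocol and `PromptError` here are hypothetical stand-ins, not the real API):

```swift
import Foundation

// Hypothetical stand-in for the tokenizer's template-applying interface.
protocol ChatTokenizer {
    func applyChatTemplate(messages: [[String: String]]) throws -> [Int]
}

enum PromptError: Error {
    case chatTemplateFailed(underlying: Error)
}

// Fail loudly when the template cannot be applied, rather than silently
// encoding the raw prompt text: distilled models like
// DeepSeek-R1-Distill-Qwen generate noticeably worse output without their
// template, and the degradation is easy to miss.
func encodePrompt(_ tokenizer: ChatTokenizer,
                  messages: [[String: String]]) throws -> [Int] {
    do {
        return try tokenizer.applyChatTemplate(messages: messages)
    } catch {
        throw PromptError.chatTemplateFailed(underlying: error)
    }
}
```

The calling code can then decide whether to surface the error to the user or substitute a known-good template, as was done for Qwen2VL.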