-
Notifications
You must be signed in to change notification settings - Fork 11.5k
Append mult-eos,half-rope,bos to GLM4-0414 and Z #13021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for reverting the breaking change with GLM-4. |
Let's get this show on the road, please. 😀👍 |
Using Metal, it doesn't work with Chat template is also wrong, with
Adding
|
I don't know why GLM4 always add that |
@ngxson I wonder if this is because of [gMASK] also being a BOS token? That's what it currently is with and without jinja:
I guess maybe it's not possible to have 2 tokens there, not sure 🤔 Here's the template from tokenizer_config.json, both tokens do seem to be defined in there:
edit 2:
|
Btw it would be better if THUDM removes the Edit: they don't even include EOT token which I suggested in THUDM/GLM-Edge#2 , you we should expect performance degrade even with a correct tokenizer |
With |
Maybe I found the bug. The server was detecting the wrong template, LLM_CHAT_TEMPLATE_GLMEDGE instead of LLM_CHAT_TEMPLATE_CHATGML_4. you can review my pull request here: #13099 This should fix the [gMASK] issue |
The error fixes base on #12867. Compared to the #12957 method, it is more merge-friendly.