
Append mult-eos,half-rope,bos to GLM4-0414 and Z #13021


Merged: 2 commits into ggml-org:master on Apr 23, 2025

Conversation

@piDack (Contributor) commented Apr 19, 2025

These error fixes are based on #12867. Compared to the approach in #12957, this one is more merge-friendly.

@github-actions bot added the python (python script changes) label on Apr 19, 2025
@Tianyue-Zhao

Thanks for reverting the breaking change with GLM-4.
GLM-4 may be outdated now, but GLM-4V-9B and CogAgent-9B, which are based on it, have not yet been refreshed in the new batch of models released by THUDM. (Though given how these models get developed, one might suspect a refresh isn't far away?)
Anyway, I wanted to mention this because I've been working on bringing GLM-4V-9B support to llama.cpp.
It's relatively easy, since GLM-4V-9B is basically just GLM-4-9B plus a vision encoder that's almost the same as GLM-Edge's, both of which are already in llama.cpp.

@ggerganov requested a review from ngxson on April 22, 2025 06:06
@mrdevolver

Let's get this show on the road, please. 😀👍

@ngxson merged commit eb1776b into ggml-org:master on Apr 23, 2025 (5 checks passed)
@jxy (Contributor) commented Apr 24, 2025

Using Metal, it doesn't work with -fa; it only generates an infinite sequence of @.

The chat template is also wrong:

example_format: '<|system|>
You are a helpful assistant<|user|>
Hello<|assistant|>
Hi there<|user|>
How are you?<|assistant|>'

Adding --jinja fixed the chat template:

example_format: '<sop><|system|>
You are a helpful assistant<|user|>
Hello<|assistant|>
Hi there<|user|>
How are you?<|assistant|>'

@ngxson (Collaborator) commented Apr 24, 2025

I don't know why GLM4 always adds that <sop> token, but I think it should be marked as BOS with self.gguf_writer.add_add_bos_token(True).
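
A minimal sketch of how that could look in convert_hf_to_gguf.py (the class name, base class, and placement are assumptions based on the converter's usual structure, not the actual patch):

```python
# Hedged sketch, not upstream code: mark the converted GGUF as needing a
# BOS token, per the suggestion above.
class Glm4Model(Model):  # base class name may differ across versions
    def set_vocab(self):
        self._set_vocab_gpt2()  # GLM4 ships a BPE tokenizer
        # Tell llama.cpp to prepend the BOS token when tokenizing:
        self.gguf_writer.add_add_bos_token(True)
```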

@arch-btw (Contributor) commented Apr 24, 2025

@ngxson I wonder if this is because of [gMASK] also being a BOS token? That's what it currently is, with and without --jinja:

print_info: BOS token = 151331 '[gMASK]'

I guess it may not be possible to have two tokens there, not sure 🤔

Here's the template from tokenizer_config.json; both tokens do seem to be defined in there:

Hmm, strange: when I paste the template, GitHub removes the <sop> tag.

edit 2:

"chat_template": "[gMASK]<sop>{%- if tools -%}<|system|>\n# 可用工具\n{% for tool in tools %}{%- set function = tool.function if tool.get(\"function\") else tool %}\n\n## {{ function.name }}\n\n{{ function | tojson(indent=4, ensure_ascii=False) }}\n在调用上述函数时,请使用 Json 格式表示调用的参数。{%- endfor %}{%- endif -%}{%- for msg in messages %}{%- if msg.role == 'system' %}<|system|>\n{{ msg.content }}{%- endif %}{%- endfor %}{%- for message in messages if message.role != 'system' %}{%- set role = message['role'] %}{%- set content = message['content'] %}{%- set meta = message.get(\"metadata\", \"\") %}{%- if role == 'user' %}<|user|>\n{{ content }}{%- elif role == 'assistant' and not meta %}<|assistant|>\n{{ content }}{%- elif role == 'assistant' and meta %}<|assistant|>{{ meta }} \n{{ content }}{%- elif role == 'observation' %}<|observation|>\n{{ content }}{%- endif %}{%- endfor %}{% if add_generation_prompt %}<|assistant|>{% endif %}",

@ngxson (Collaborator) commented Apr 24, 2025

Btw, it would be better if THUDM removed the [gMASK] and <sop> tokens in their next model. Newer models don't even need a BOS token; the special tokens <|system|>, <|user|>, etc. already do the job.

Edit: they don't even include the EOT token I suggested in THUDM/GLM-Edge#2, so we should expect performance to degrade even with a correct tokenizer.

@matteoserva (Contributor)

With --jinja --override-kv tokenizer.ggml.add_bos_token=bool:true I get the correct template.
In my limited tests, the model performed worse without the BOS token.
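
For reference, a full workaround invocation along those lines (the binary name and model path below are placeholders; the flags are the ones quoted above):

```sh
llama-cli -m glm4-0414.gguf --jinja \
  --override-kv tokenizer.ggml.add_bos_token=bool:true
```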

@matteoserva (Contributor)

I may have found the bug. The server was detecting the wrong template: LLM_CHAT_TEMPLATE_GLMEDGE instead of LLM_CHAT_TEMPLATE_CHATGML_4.

You can review my pull request here: #13099

It should fix the [gMASK] issue.
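
For context, the detection itself lives in C++ in src/llama-chat.cpp; here is a rough Python rendering of the order-sensitive substring matching involved (names simplified, and the exact ordering is my assumption, not the actual patch in #13099):

```python
# Illustrative sketch: template detection by substring, where check order
# matters. GLM4's template also contains <|user|>/<|assistant|>, so testing
# the more specific "[gMASK]<sop>" marker first avoids mis-detecting it
# as GLM-Edge.
def detect_template(tmpl: str) -> str:
    if "[gMASK]<sop>" in tmpl:
        return "LLM_CHAT_TEMPLATE_CHATGML_4"
    if "<|user|>" in tmpl and "<|assistant|>" in tmpl:
        return "LLM_CHAT_TEMPLATE_GLMEDGE"
    return "LLM_CHAT_TEMPLATE_UNKNOWN"
```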
