
Append mult-eos,half-rope,bos to GLM4-0414 and Z #13021


Merged: 2 commits into ggml-org:master on Apr 23, 2025

Conversation

@piDack (Contributor) commented Apr 19, 2025

These error fixes are based on #12867. Compared to the approach in #12957, this one is more merge-friendly.

@github-actions bot added the python (python script changes) label on Apr 19, 2025
@Tianyue-Zhao

Thanks for reverting the breaking change with GLM-4.
GLM-4 may be outdated now, but GLM-4V-9B and CogAgent-9B, which are based on it, have not yet been refreshed in the new batch of models released by THUDM. (Though given how these models get developed, one might suspect a refresh isn't far away?)
Anyway, I wanted to mention this because I've been working on bringing GLM-4V-9B support to llama.cpp.
It's relatively easy, since GLM-4V-9B is basically just GLM-4-9B plus a vision encoder that's almost the same as GLM-Edge's, both of which are already in llama.cpp.

@ggerganov requested a review from ngxson on April 22, 2025 06:06
@mrdevolver

Let's get this show on the road, please. 😀👍

@ngxson merged commit eb1776b into ggml-org:master on Apr 23, 2025 (5 checks passed)
@jxy (Contributor) commented Apr 24, 2025

Using Metal, it doesn't work with -fa; it only generates an infinite sequence of @.

The chat template is also wrong:

example_format: '<|system|>
You are a helpful assistant<|user|>
Hello<|assistant|>
Hi there<|user|>
How are you?<|assistant|>'

Adding --jinja fixed the chat template:

example_format: '<sop><|system|>
You are a helpful assistant<|user|>
Hello<|assistant|>
Hi there<|user|>
How are you?<|assistant|>'

@ngxson (Collaborator) commented Apr 24, 2025

I don't know why GLM4 always adds that <sop> token, but I think it should be marked as BOS with self.gguf_writer.add_add_bos_token(True).
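
A minimal sketch of how that could look in convert_hf_to_gguf.py (the class name, base class, and placement are assumptions based on the converter's usual structure, not the actual patch):

```python
# Hedged sketch, not upstream code: mark the converted GGUF as needing a
# BOS token, per the suggestion above.
class Glm4Model(Model):  # base class name may differ across versions
    def set_vocab(self):
        self._set_vocab_gpt2()  # GLM4 ships a BPE tokenizer
        # Tell llama.cpp to prepend the BOS token when tokenizing:
        self.gguf_writer.add_add_bos_token(True)
```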

@arch-btw (Contributor) commented Apr 24, 2025

@ngxson I wonder if this is because of [gMASK] also being a BOS token? That's what it currently is, with and without --jinja:

print_info: BOS token = 151331 '[gMASK]'

I guess it may not be possible to have two tokens there, not sure 🤔

Here's the template from tokenizer_config.json; both tokens do seem to be defined in there:

Hmm, strange: when I paste the template, GitHub removes the <sop> tag.

edit 2:

"chat_template": "[gMASK]<sop>{%- if tools -%}<|system|>\n# 可用工具\n{% for tool in tools %}{%- set function = tool.function if tool.get(\"function\") else tool %}\n\n## {{ function.name }}\n\n{{ function | tojson(indent=4, ensure_ascii=False) }}\n在调用上述函数时,请使用 Json 格式表示调用的参数。{%- endfor %}{%- endif -%}{%- for msg in messages %}{%- if msg.role == 'system' %}<|system|>\n{{ msg.content }}{%- endif %}{%- endfor %}{%- for message in messages if message.role != 'system' %}{%- set role = message['role'] %}{%- set content = message['content'] %}{%- set meta = message.get(\"metadata\", \"\") %}{%- if role == 'user' %}<|user|>\n{{ content }}{%- elif role == 'assistant' and not meta %}<|assistant|>\n{{ content }}{%- elif role == 'assistant' and meta %}<|assistant|>{{ meta }} \n{{ content }}{%- elif role == 'observation' %}<|observation|>\n{{ content }}{%- endif %}{%- endfor %}{% if add_generation_prompt %}<|assistant|>{% endif %}",

@ngxson (Collaborator) commented Apr 24, 2025

Btw, it would be better if THUDM removed the [gMASK] and <sop> tokens in their next model. Newer models don't even need a BOS token; the special tokens <|system|>, <|user|>, etc. already do the job.

Edit: they don't even include the EOT token I suggested in THUDM/GLM-Edge#2, so we should expect performance to degrade even with a correct tokenizer.

@matteoserva (Contributor)

With --jinja --override-kv tokenizer.ggml.add_bos_token=bool:true I get the correct template.
In my limited tests, the model performed worse without the BOS token.
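
For reference, a full workaround invocation along those lines (the binary name and model path below are placeholders; the flags are the ones quoted above):

```sh
llama-cli -m glm4-0414.gguf --jinja \
  --override-kv tokenizer.ggml.add_bos_token=bool:true
```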

@matteoserva (Contributor)

I may have found the bug. The server was detecting the wrong template: LLM_CHAT_TEMPLATE_GLMEDGE instead of LLM_CHAT_TEMPLATE_CHATGML_4.

You can review my pull request here: #13099

It should fix the [gMASK] issue.
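
For context, the detection itself lives in C++ in src/llama-chat.cpp; here is a rough Python rendering of the order-sensitive substring matching involved (names simplified, and the exact ordering is my assumption, not the actual patch in #13099):

```python
# Illustrative sketch: template detection by substring, where check order
# matters. GLM4's template also contains <|user|>/<|assistant|>, so testing
# the more specific "[gMASK]<sop>" marker first avoids mis-detecting it
# as GLM-Edge.
def detect_template(tmpl: str) -> str:
    if "[gMASK]<sop>" in tmpl:
        return "LLM_CHAT_TEMPLATE_CHATGML_4"
    if "<|user|>" in tmpl and "<|assistant|>" in tmpl:
        return "LLM_CHAT_TEMPLATE_GLMEDGE"
    return "LLM_CHAT_TEMPLATE_UNKNOWN"
```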
