This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

How to enable int8 KV cache quantization while keeping a (activations) and w (weights) in fp16, to measure the benefit of KV cache quantization #22

Open
seeyourcell opened this issue Oct 24, 2023 · 4 comments

Comments

@seeyourcell

No description provided.

@seeyourcell changed the title from "How to enable int8 KV cache quantization, but" to "How to enable int8 KV cache quantization while keeping a and w in fp16, to measure the benefit of KV cache quantization" on Oct 24, 2023
@Alcanderian
Collaborator

KV cache quantization requires exporting a KV-cache-quantized model with ppl.pmx; at the moment, ppl.nn.llm does not support models without KV cache quantization.

Reference:
https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama/facebook#export

@seeyourcell
Author

seeyourcell commented Oct 28, 2023

KV cache quantization requires exporting a KV-cache-quantized model with ppl.pmx; at the moment, ppl.nn.llm does not support models without KV cache quantization.

Reference: https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama/facebook#export

quant-method: online_i8i8

Does this quantize only the weights and activations, or the weights and activations plus the KV cache?

@Alcanderian
Collaborator

KV cache quantization requires exporting a KV-cache-quantized model with ppl.pmx; at the moment, ppl.nn.llm does not support models without KV cache quantization.
Reference: https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama/facebook#export

quant-method: online_i8i8

Does this quantize only the weights and activations, or the weights and activations plus the KV cache?

This switch only controls weight + activation quantization of the linear layers; KV cache quantization can only be controlled when the model is exported.
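
For context on what the original question is asking for (independent of ppl.nn.llm's actual kernels): int8 KV cache quantization stores the cached K/V tensors as int8 plus a scale and restores them to fp16 right before the attention computation, while the weights and activations stay fp16. Below is a minimal numpy sketch, assuming per-token symmetric scales, which is just one common choice and not necessarily what ppl.pmx exports:

```python
import numpy as np

def quantize_kv(x_fp16):
    """Per-token symmetric int8 quantization of one [seq, head_dim] K or V slice."""
    x = x_fp16.astype(np.float32)
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0   # one scale per cached token
    scale = np.maximum(scale, 1e-8)                          # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_kv(q_int8, scale_fp16):
    """Restore fp16 values right before the (still fp16) attention math."""
    return (q_int8.astype(np.float32) * scale_fp16.astype(np.float32)).astype(np.float16)

rng = np.random.default_rng(0)
seq_len, head_dim = 1024, 128
k_fp16 = rng.standard_normal((seq_len, head_dim)).astype(np.float16)

k_int8, k_scale = quantize_kv(k_fp16)
k_back = dequantize_kv(k_int8, k_scale)

# The cache holds int8 values plus fp16 scales instead of fp16 values: roughly 2x smaller.
print("fp16 cache bytes:", k_fp16.nbytes)
print("int8 cache bytes:", k_int8.nbytes + k_scale.nbytes)
print("max abs reconstruction error:",
      np.abs(k_fp16.astype(np.float32) - k_back.astype(np.float32)).max())
```

The measurable benefit is mainly the roughly halved cache footprint (and the memory bandwidth saved reading it at decode time), traded against a small reconstruction error in K and V.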

@Ageliss

Ageliss commented Dec 1, 2023

Hey, I can't get online_i8i8 to run.
(error screenshot attached)
