This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

How to enable int8 KV cache quantization while keeping a (activations) and w (weights) in fp16, to measure the benefit of KV cache quantization #22

Open
seeyourcell opened this issue Oct 24, 2023 · 4 comments

Comments

@seeyourcell

No description provided.

@seeyourcell changed the title from "How to enable int8 KV cache quantization, but" to "How to enable int8 KV cache quantization while keeping a and w in fp16, to measure the benefit of KV cache quantization" on Oct 24, 2023
@Alcanderian
Collaborator

KV cache quantization requires exporting a KV-cache-quantized model with ppl.pmx; at the moment, ppl.nn.llm does not support models without KV cache quantization.

Reference:
https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama/facebook#export

@seeyourcell
Author

seeyourcell commented Oct 28, 2023

KV cache quantization requires exporting a KV-cache-quantized model with ppl.pmx; at the moment, ppl.nn.llm does not support models without KV cache quantization.

Reference: https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama/facebook#export

quant-method: online_i8i8

Does this quantize only the weights and activations, or the weights and activations plus the KV cache?

@Alcanderian
Collaborator

KV cache quantization requires exporting a KV-cache-quantized model with ppl.pmx; at the moment, ppl.nn.llm does not support models without KV cache quantization.
Reference: https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama/facebook#export

quant-method: online_i8i8

Does this quantize only the weights and activations, or the weights and activations plus the KV cache?

This switch only controls weight + activation quantization of the linear layers; KV cache quantization can only be controlled when the model is exported.
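
For context on what the original question is asking for (independent of ppl.nn.llm's actual kernels): int8 KV cache quantization stores the cached K/V tensors as int8 plus a scale and restores them to fp16 right before the attention computation, while the weights and activations stay fp16. Below is a minimal numpy sketch, assuming per-token symmetric scales, which is just one common choice and not necessarily what ppl.pmx exports:

```python
import numpy as np

def quantize_kv(x_fp16):
    """Per-token symmetric int8 quantization of one [seq, head_dim] K or V slice."""
    x = x_fp16.astype(np.float32)
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0   # one scale per cached token
    scale = np.maximum(scale, 1e-8)                          # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_kv(q_int8, scale_fp16):
    """Restore fp16 values right before the (still fp16) attention math."""
    return (q_int8.astype(np.float32) * scale_fp16.astype(np.float32)).astype(np.float16)

rng = np.random.default_rng(0)
seq_len, head_dim = 1024, 128
k_fp16 = rng.standard_normal((seq_len, head_dim)).astype(np.float16)

k_int8, k_scale = quantize_kv(k_fp16)
k_back = dequantize_kv(k_int8, k_scale)

# The cache holds int8 values plus fp16 scales instead of fp16 values: roughly 2x smaller.
print("fp16 cache bytes:", k_fp16.nbytes)
print("int8 cache bytes:", k_int8.nbytes + k_scale.nbytes)
print("max abs reconstruction error:",
      np.abs(k_fp16.astype(np.float32) - k_back.astype(np.float32)).max())
```

The measurable benefit is mainly the roughly halved cache footprint (and the memory bandwidth saved reading it at decode time), traded against a small reconstruction error in K and V.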

@Ageliss

Ageliss commented Dec 1, 2023

Hey, I can't get online_i8i8 to run.
(error screenshot attached)
