I'm attempting to run llama2-7b-32k with kvquant. When config.dynamicrope = False, running
python llama.py modelname wikitext2 --abits 3 --seqlen 4096 --maxseqlen 4096 --quantizer-path ./quantizers-llama2-7b-32k.pickle --include_sparse --sparsity-threshold 0.99 --first_few_fp16 5
raised:
self.outliers[: self.klen] = outlier_vals
RuntimeError: The expanded size of the tensor (4096) must match the existing size (30545) at non-singleton dimension 0. Target sizes: [4096, 42]. Tensor sizes: [30545, 42]
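For context, here is a minimal sketch of what this assignment failure looks like. The names self.outliers and self.klen and the sizes come from the traceback; sizing the outlier buffer from --maxseqlen is only an assumption about how the cache is allocated, not a statement about the kvquant internals.

```python
import torch

# Assumption: the outlier cache is allocated with --maxseqlen rows (4096),
# while the loaded outlier values span 30545 positions.
maxseqlen = 4096
outliers = torch.zeros(maxseqlen, 42)   # hypothetical buffer sized from --maxseqlen
outlier_vals = torch.zeros(30545, 42)   # 30545 rows, as reported in the error
klen = outlier_vals.shape[0]

# Reproduces the same error:
# RuntimeError: The expanded size of the tensor (4096) must match the existing
# size (30545) at non-singleton dimension 0.
outliers[:klen] = outlier_vals
```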
When config.dynamicrope = True, it raised:
TypeError: LlamaRotaryEmbedding.forward() takes from 2 to 3 positional arguments but 4 were given
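For what it's worth, this kind of TypeError usually means the LlamaRotaryEmbedding.forward() signature of the installed transformers release does not match what the patched model code passes to it (older releases take forward(self, x, seq_len=None), i.e. 2 to 3 positional arguments, while later ones changed the signature). A quick, hedged way to check which variant is installed; the exact version where the signature changed is an assumption, so only the printed signature should be trusted:

```python
import inspect

import transformers
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

# Print the installed transformers version and the rotary-embedding forward()
# signature, to see how many positional arguments it actually accepts.
print(transformers.__version__)
print(inspect.signature(LlamaRotaryEmbedding.forward))
```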
How could I quantize llama2-7b-32k using kvquant?
Thanks.
What's the link for LLaMA-2-7B-32K? https://huggingface.co/togethercomputer/LLaMA-2-7B-32K or https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct?