Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Reproduction
from transformers import AutoTokenizer, AutoModel
import torch
import os

model_path = "/ossfs/workspace/checkpoint-555"
save_path = "/ossfs/workspace/checkpoint-555_fused"

# Load the model with device_map="auto" to handle device placement
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto"  # Add this parameter
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

# Save the model and tokenizer
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Convert to AWQ models
print('begin convert to awq models')
os.system(f'lmdeploy lite auto_awq {save_path} --work-dir {save_path}_awq')
Environment
OS: CentOS Linux
torch 2.1.2, CUDA 12.1, torchvision 0.16.2
transformers 4.40.0
lmdeploy 0.5.3
Error traceback
Using the latest cached version of the dataset since ptb_text_only couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'penn_treebank' at /root/.cache/huggingface/datasets/ptb_text_only/penn_treebank/1.1.0/fa7dfc4a32462b6a91341205a11ef3ddff7ffc0325ce3cb662e73eddb4ae1182 (last modified on Fri Dec 13 12:57:48 2024).
Token indices sequence length is longer than the specified maximum sequence length for this model (1085165 > 4096). Running this sequence through the model will result in indexing errors
model.layers.0, samples: 128, max gpu memory: 7.14 GB
model.layers.1, samples: 128, max gpu memory: 9.14 GB
model.layers.2, samples: 128, max gpu memory: 9.14 GB
model.layers.3, samples: 128, max gpu memory: 9.14 GB
model.layers.4, samples: 128, max gpu memory: 9.14 GB
model.layers.5, samples: 128, max gpu memory: 9.14 GB
model.layers.6, samples: 128, max gpu memory: 9.14 GB
model.layers.7, samples: 128, max gpu memory: 9.14 GB
model.layers.8, samples: 128, max gpu memory: 9.14 GB
model.layers.9, samples: 128, max gpu memory: 9.14 GB
model.layers.10, samples: 128, max gpu memory: 9.14 GB
model.layers.11, samples: 128, max gpu memory: 9.14 GB
model.layers.12, samples: 128, max gpu memory: 9.14 GB
model.layers.13, samples: 128, max gpu memory: 9.14 GB
model.layers.14, samples: 128, max gpu memory: 9.14 GB
model.layers.15, samples: 128, max gpu memory: 9.14 GB
model.layers.16, samples: 128, max gpu memory: 9.14 GB
model.layers.17, samples: 128, max gpu memory: 9.14 GB
model.layers.18, samples: 128, max gpu memory: 9.14 GB
model.layers.19, samples: 128, max gpu memory: 9.14 GB
model.layers.20, samples: 128, max gpu memory: 9.14 GB
model.layers.21, samples: 128, max gpu memory: 9.14 GB
model.layers.22, samples: 128, max gpu memory: 9.14 GB
model.layers.23, samples: 128, max gpu memory: 9.14 GB
model.layers.24, samples: 128, max gpu memory: 9.14 GB
model.layers.25, samples: 128, max gpu memory: 9.14 GB
model.layers.26, samples: 128, max gpu memory: 9.14 GB
model.layers.27, samples: 128, max gpu memory: 9.14 GB
model.layers.28, samples: 128, max gpu memory: 9.14 GB
model.layers.29, samples: 128, max gpu memory: 9.14 GB
model.layers.30, samples: 128, max gpu memory: 9.14 GB
model.layers.31, samples: 128, max gpu memory: 9.14 GB
Traceback (most recent call last):
File "/opt/conda/bin/lmdeploy", line 8, in<module>sys.exit(run())
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/entrypoint.py", line 36, in run
args.run(args)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/lite.py", line 139, in auto_awq
auto_awq(**kwargs)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/apis/auto_awq.py", line 108, in auto_awq
smooth_layers(layers, fc2fcs, norm2fcs, act_scales, w_group_size,
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/quantization/awq.py", line 277, in smooth_layers
smooth_ln_fcs(ln, fcs, a_scales[a_name], group_size)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/quantization/awq.py", line 134, in smooth_ln_fcs
assert torch.isnan(p).sum() == 0
AssertionError
In this case, you need to add the value_clamp option. Please refer to this PR for more details. If you're not in a hurry, you can wait until that PR is merged into the main branch before converting to AWQ.
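For illustration only, a minimal sketch of what value clamping amounts to here, assuming the PR simply bounds the collected activation scales before the smoothing step that trips assert torch.isnan(p).sum() == 0. clamp_act_scales, min_value, and max_value are hypothetical names, not part of the lmdeploy API or that PR:

import torch

# Hypothetical helper, not lmdeploy code: bound the activation scales so the
# rescaled LayerNorm/linear weights cannot overflow to inf/NaN in bfloat16.
def clamp_act_scales(act_scales: torch.Tensor,
                     min_value: float = 1e-5,
                     max_value: float = 1e4) -> torch.Tensor:
    # The clamp thresholds are illustrative defaults, not values taken from the PR.
    return torch.clamp(act_scales, min=min_value, max=max_value)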