[Bug] facing nan problem assert torch.isnan(p).sum() == 0 #757

Open
3 tasks done
lzk9508 opened this issue Dec 13, 2024 · 1 comment

Comments

@lzk9508

lzk9508 commented Dec 13, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

(screenshot attached; the error is also shown in the traceback below)

Reproduction

from transformers import AutoTokenizer, AutoModel
import torch
import os

model_path = "/ossfs/workspace/checkpoint-555"
save_path = "/ossfs/workspace/checkpoint-555_fused"

# Load the model with device_map="auto" to handle device placement
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",  # Add this parameter
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

# Save the model and tokenizer
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Convert to an AWQ model
print('begin converting to AWQ model')
os.system(f'lmdeploy lite auto_awq {save_path} --work-dir {save_path}_awq')
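As a diagnostic step (not part of the original report), a quick scan of the saved checkpoint for NaN/Inf weights can help narrow down whether the problem originates in the checkpoint itself or in the AWQ smoothing step. A minimal sketch, reusing the model loaded above:

# Hypothetical sanity check: scan parameters for NaN/Inf values
# before attempting the AWQ conversion.
bad = []
for name, p in model.named_parameters():
    if torch.isnan(p).any() or torch.isinf(p).any():
        bad.append(name)
if bad:
    print('parameters containing NaN/Inf:', bad)
else:
    print('no NaN/Inf found in checkpoint weights')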

Environment

Linux (CentOS)
torch 2.1.2, CUDA 12.1, torchvision 0.16.2
transformers 4.40.0
lmdeploy 0.5.3

Error traceback

Using the latest cached version of the dataset since ptb_text_only couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'penn_treebank' at /root/.cache/huggingface/datasets/ptb_text_only/penn_treebank/1.1.0/fa7dfc4a32462b6a91341205a11ef3ddff7ffc0325ce3cb662e73eddb4ae1182 (last modified on Fri Dec 13 12:57:48 2024).
Using the latest cached version of the dataset since ptb_text_only couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'penn_treebank' at /root/.cache/huggingface/datasets/ptb_text_only/penn_treebank/1.1.0/fa7dfc4a32462b6a91341205a11ef3ddff7ffc0325ce3cb662e73eddb4ae1182 (last modified on Fri Dec 13 12:57:48 2024).
Token indices sequence length is longer than the specified maximum sequence length for this model (1085165 > 4096). Running this sequence through the model will result in indexing errors
model.layers.0, samples: 128, max gpu memory: 7.14 GB
model.layers.1, samples: 128, max gpu memory: 9.14 GB
model.layers.2, samples: 128, max gpu memory: 9.14 GB
model.layers.3, samples: 128, max gpu memory: 9.14 GB
model.layers.4, samples: 128, max gpu memory: 9.14 GB
model.layers.5, samples: 128, max gpu memory: 9.14 GB
model.layers.6, samples: 128, max gpu memory: 9.14 GB
model.layers.7, samples: 128, max gpu memory: 9.14 GB
model.layers.8, samples: 128, max gpu memory: 9.14 GB
model.layers.9, samples: 128, max gpu memory: 9.14 GB
model.layers.10, samples: 128, max gpu memory: 9.14 GB
model.layers.11, samples: 128, max gpu memory: 9.14 GB
model.layers.12, samples: 128, max gpu memory: 9.14 GB
model.layers.13, samples: 128, max gpu memory: 9.14 GB
model.layers.14, samples: 128, max gpu memory: 9.14 GB
model.layers.15, samples: 128, max gpu memory: 9.14 GB
model.layers.16, samples: 128, max gpu memory: 9.14 GB
model.layers.17, samples: 128, max gpu memory: 9.14 GB
model.layers.18, samples: 128, max gpu memory: 9.14 GB
model.layers.19, samples: 128, max gpu memory: 9.14 GB
model.layers.20, samples: 128, max gpu memory: 9.14 GB
model.layers.21, samples: 128, max gpu memory: 9.14 GB
model.layers.22, samples: 128, max gpu memory: 9.14 GB
model.layers.23, samples: 128, max gpu memory: 9.14 GB
model.layers.24, samples: 128, max gpu memory: 9.14 GB
model.layers.25, samples: 128, max gpu memory: 9.14 GB
model.layers.26, samples: 128, max gpu memory: 9.14 GB
model.layers.27, samples: 128, max gpu memory: 9.14 GB
model.layers.28, samples: 128, max gpu memory: 9.14 GB
model.layers.29, samples: 128, max gpu memory: 9.14 GB
model.layers.30, samples: 128, max gpu memory: 9.14 GB
model.layers.31, samples: 128, max gpu memory: 9.14 GB
Traceback (most recent call last):
  File "/opt/conda/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/lite.py", line 139, in auto_awq
    auto_awq(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/apis/auto_awq.py", line 108, in auto_awq
    smooth_layers(layers, fc2fcs, norm2fcs, act_scales, w_group_size,
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/quantization/awq.py", line 277, in smooth_layers
    smooth_ln_fcs(ln, fcs, a_scales[a_name], group_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/quantization/awq.py", line 134, in smooth_ln_fcs
    assert torch.isnan(p).sum() == 0
AssertionError
@Weiyun1025
Collaborator

In this case, you need to add the value_clamp option. Please refer to this PR for more details. If you're not in a hurry, you can wait until that PR is merged into the main branch before converting to AWQ.
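For context, the general idea behind a value-clamp step is to bound extreme (or already non-finite) values before the per-channel scales are computed, so that smooth_ln_fcs never sees NaN. The actual value_clamp option in the referenced PR may work differently; the snippet below is only an illustrative sketch with an arbitrarily chosen threshold:

import torch

def clamp_weights(model, max_abs=1e4):
    # Illustrative only: replace non-finite values and clamp extreme
    # magnitudes in-place so downstream scale computations stay finite.
    # The threshold max_abs is an arbitrary example, not lmdeploy's default.
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(torch.nan_to_num(p, nan=0.0, posinf=max_abs, neginf=-max_abs))
            p.clamp_(-max_abs, max_abs)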
