[Bug] facing nan problem assert torch.isnan(p).sum() == 0 #757

Open
3 tasks done
lzk9508 opened this issue Dec 13, 2024 · 1 comment

Comments

@lzk9508

lzk9508 commented Dec 13, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

(screenshot attached; the error is also shown in the traceback below)

Reproduction

from transformers import AutoTokenizer, AutoModel
import torch
import os

model_path = "/ossfs/workspace/checkpoint-555"
save_path = "/ossfs/workspace/checkpoint-555_fused"

# Load the model with device_map="auto" to handle device placement
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",  # Add this parameter
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

# Save the model and tokenizer
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Convert to an AWQ model
print('begin converting to AWQ model')
os.system(f'lmdeploy lite auto_awq {save_path} --work-dir {save_path}_awq')
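As a diagnostic step (not part of the original report), a quick scan of the saved checkpoint for NaN/Inf weights can help narrow down whether the problem originates in the checkpoint itself or in the AWQ smoothing step. A minimal sketch, reusing the model loaded above:

# Hypothetical sanity check: scan parameters for NaN/Inf values
# before attempting the AWQ conversion.
bad = []
for name, p in model.named_parameters():
    if torch.isnan(p).any() or torch.isinf(p).any():
        bad.append(name)
if bad:
    print('parameters containing NaN/Inf:', bad)
else:
    print('no NaN/Inf found in checkpoint weights')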

Environment

Linux (CentOS)
torch 2.1.2, CUDA 12.1, torchvision 0.16.2
transformers 4.40.0
lmdeploy 0.5.3

Error traceback

Using the latest cached version of the dataset since ptb_text_only couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'penn_treebank' at /root/.cache/huggingface/datasets/ptb_text_only/penn_treebank/1.1.0/fa7dfc4a32462b6a91341205a11ef3ddff7ffc0325ce3cb662e73eddb4ae1182 (last modified on Fri Dec 13 12:57:48 2024).
Using the latest cached version of the dataset since ptb_text_only couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'penn_treebank' at /root/.cache/huggingface/datasets/ptb_text_only/penn_treebank/1.1.0/fa7dfc4a32462b6a91341205a11ef3ddff7ffc0325ce3cb662e73eddb4ae1182 (last modified on Fri Dec 13 12:57:48 2024).
Token indices sequence length is longer than the specified maximum sequence length for this model (1085165 > 4096). Running this sequence through the model will result in indexing errors
model.layers.0, samples: 128, max gpu memory: 7.14 GB
model.layers.1, samples: 128, max gpu memory: 9.14 GB
model.layers.2, samples: 128, max gpu memory: 9.14 GB
model.layers.3, samples: 128, max gpu memory: 9.14 GB
model.layers.4, samples: 128, max gpu memory: 9.14 GB
model.layers.5, samples: 128, max gpu memory: 9.14 GB
model.layers.6, samples: 128, max gpu memory: 9.14 GB
model.layers.7, samples: 128, max gpu memory: 9.14 GB
model.layers.8, samples: 128, max gpu memory: 9.14 GB
model.layers.9, samples: 128, max gpu memory: 9.14 GB
model.layers.10, samples: 128, max gpu memory: 9.14 GB
model.layers.11, samples: 128, max gpu memory: 9.14 GB
model.layers.12, samples: 128, max gpu memory: 9.14 GB
model.layers.13, samples: 128, max gpu memory: 9.14 GB
model.layers.14, samples: 128, max gpu memory: 9.14 GB
model.layers.15, samples: 128, max gpu memory: 9.14 GB
model.layers.16, samples: 128, max gpu memory: 9.14 GB
model.layers.17, samples: 128, max gpu memory: 9.14 GB
model.layers.18, samples: 128, max gpu memory: 9.14 GB
model.layers.19, samples: 128, max gpu memory: 9.14 GB
model.layers.20, samples: 128, max gpu memory: 9.14 GB
model.layers.21, samples: 128, max gpu memory: 9.14 GB
model.layers.22, samples: 128, max gpu memory: 9.14 GB
model.layers.23, samples: 128, max gpu memory: 9.14 GB
model.layers.24, samples: 128, max gpu memory: 9.14 GB
model.layers.25, samples: 128, max gpu memory: 9.14 GB
model.layers.26, samples: 128, max gpu memory: 9.14 GB
model.layers.27, samples: 128, max gpu memory: 9.14 GB
model.layers.28, samples: 128, max gpu memory: 9.14 GB
model.layers.29, samples: 128, max gpu memory: 9.14 GB
model.layers.30, samples: 128, max gpu memory: 9.14 GB
model.layers.31, samples: 128, max gpu memory: 9.14 GB
Traceback (most recent call last):
  File "/opt/conda/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/lite.py", line 139, in auto_awq
    auto_awq(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/apis/auto_awq.py", line 108, in auto_awq
    smooth_layers(layers, fc2fcs, norm2fcs, act_scales, w_group_size,
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/quantization/awq.py", line 277, in smooth_layers
    smooth_ln_fcs(ln, fcs, a_scales[a_name], group_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/lite/quantization/awq.py", line 134, in smooth_ln_fcs
    assert torch.isnan(p).sum() == 0
AssertionError
@Weiyun1025
Collaborator

In this case, you need to add the value_clamp option. Please refer to this PR for more details. If you're not in a hurry, you can wait until that PR is merged into the main branch before converting to AWQ.
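For context, the general idea behind a value-clamp step is to bound extreme (or already non-finite) values before the per-channel scales are computed, so that smooth_ln_fcs never sees NaN. The actual value_clamp option in the referenced PR may work differently; the snippet below is only an illustrative sketch with an arbitrarily chosen threshold:

import torch

def clamp_weights(model, max_abs=1e4):
    # Illustrative only: replace non-finite values and clamp extreme
    # magnitudes in-place so downstream scale computations stay finite.
    # The threshold max_abs is an arbitrary example, not lmdeploy's default.
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(torch.nan_to_num(p, nan=0.0, posinf=max_abs, neginf=-max_abs))
            p.clamp_(-max_abs, max_abs)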
