--use_fp8 doesn't work with llama 3.1 8b #2602

Open
ShuaiShao93 opened this issue Dec 20, 2024 · 0 comments

Labels
bug Something isn't working
System Info

x86_64, Debian 11, L40S GPU

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Clone Llama 3.1 8B
  2. Install TensorRT-LLM 0.15.0
  3. Convert the checkpoint with:
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_8b_checkpoint_1gpu_fp8  --dtype float16 --use_fp8
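
If it helps triage: my read (an assumption on my side, not something the 0.15 docs confirm) is that --use_fp8 in convert_checkpoint.py expects a checkpoint that already carries FP8 scaling factors, while quantizing a plain HF checkpoint is supposed to go through the quantization example instead. A sketch of that alternative path, with flags recalled from the examples README (please double-check them against your local copy):

python3 TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3.1-8B-Instruct --dtype float16 --qformat fp8 --kv_cache_dtype fp8 --output_dir ./tllm_8b_checkpoint_1gpu_fp8 --calib_size 512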

Expected behavior

The checkpoint should be converted successfully.

actual behavior

Got the following error:

[TensorRT-LLM] TensorRT-LLM version: 0.15.0
0.15.0
[12/20/2024-20:09:51] [TRT-LLM] [W] Implicitly setting LLaMAConfig.tie_word_embeddings = False
5it [00:00, 25.38it/s]
Traceback (most recent call last):
  File "/home/ubuntu/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 555, in <module>
    main()
  File "/home/ubuntu/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 547, in main
    convert_and_save_hf(args)
  File "/home/ubuntu/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 488, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/ubuntu/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 495, in execute
    f(args, rank)
  File "/home/ubuntu/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 472, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 405, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 408, in generate_tllm_weights
    self.load(tllm_key,
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/model_weights_loader.py", line 296, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/quantization/layers.py", line 1478, in postprocess
    new_amax = max(weight_scaling_factors).reshape(1, ).to(
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
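
For what it's worth, the TypeError at the bottom of the trace reproduces with plain Python once the scaling factors come back as None, which is presumably what happens when the source HF checkpoint carries no FP8 scales (my reading of the trace, not a confirmed diagnosis):

# Minimal sketch of the failure mode, not TensorRT-LLM code: max() over a list
# of None values raises exactly the error reported above.
weight_scaling_factors = [None, None]  # what postprocess() appears to receive here
new_amax = max(weight_scaling_factors)
# TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'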

additional notes

N/A
