
Error while performing Inference with llama-3.2 3B checkpoint #1749

Vattikondadheeraj opened this issue Oct 3, 2024 · 2 comments

@Vattikondadheeraj

Hey,

I am trying to perform inference with a llama-3.2 3B Instruct checkpoint, but I am running into errors. I haven't gone deep into the code or tried to debug yet; I'm posting here in the hope of a quick resolution. I've attached the error below.

DEBUG:torchtune.utils._logging:Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
Traceback (most recent call last):
  File "/home/toolkit/.conda/envs/torch/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/run.py", line 187, in _run_cmd
    self._run_single_device(args)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/run.py", line 96, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/recipes/generate.py", line 211, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/recipes/generate.py", line 206, in main
    recipe.setup(cfg=cfg)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/recipes/generate.py", line 55, in setup
    self._model = self._setup_model(
                  ^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/recipes/generate.py", line 73, in _setup_model
    model.load_state_dict(model_state_dict)
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TransformerDecoder:
        Unexpected key(s) in state_dict: "output.weight".

Here's my config file:

# Config for running the InferenceRecipe in generate.py to generate output from an LLM
#
# To launch, run the following command from root torchtune directory:
#    tune run generate --config generation

# Model arguments
model:
  _component_: torchtune.models.llama3_2.llama3_2_3b

checkpointer:
  _component_: torchtune.training.FullModelMetaCheckpointer
  checkpoint_dir: /home/toolkit/scratch/LLMcode/Checkpoints/Fine_tuning_models-3B/output-0.5
  checkpoint_files: [
    meta_model_0.pt,
  ]
  output_dir: /tmp/Llama-2-7b-hf/
  model_type: LLAMA2

device: cuda
dtype: bf16

seed: 1234

# Tokenizer arguments
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /home/toolkit/scratch/LLMcode/Train/llama-3.2-3B/original/tokenizer.model
  max_seq_len: null

# Generation arguments; defaults taken from gpt-fast
prompt: "Hey, tell me a joke"
instruct_template: null
chat_format: null
max_new_tokens: 300
temperature: 0.0 # 0.8 and 0.6 are popular values to try
top_k: 300

enable_kv_cache: False

quantizer: null
@SalmanMohammadi
Collaborator

Hey @Vattikondadheeraj. One quick thing to try: could you change to model_type: LLAMA3?

I'd also recommend giving our dev/generate_v2 recipe a try, which we'll be switching to as our default generation recipe soon. There's an example config in recipes/configs/llama2/generation_v2.yaml.
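
For reference, your checkpointer block with just that one change (paths and filenames kept exactly as in your config):

checkpointer:
  _component_: torchtune.training.FullModelMetaCheckpointer
  checkpoint_dir: /home/toolkit/scratch/LLMcode/Checkpoints/Fine_tuning_models-3B/output-0.5
  checkpoint_files: [
    meta_model_0.pt,
  ]
  output_dir: /tmp/Llama-2-7b-hf/
  model_type: LLAMA3

The 3B model ties its output projection to the token embeddings, so my guess is that converting the checkpoint under model_type: LLAMA2 leaves an output.weight entry behind that the model doesn't expect.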

@Vattikondadheeraj
Author

Hey @SalmanMohammadi, I tried the generate_v2 recipe, but I'm getting another error, which I guess is related to something missing from the config file. I've attached the error below:

Traceback (most recent call last):
  File "/home/toolkit/.conda/envs/torch/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/run.py", line 187, in _run_cmd
    self._run_single_device(args)
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/_cli/run.py", line 96, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "generate_v2.py", line 247, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/toolkit/scratch/LLMcode/Train/torchtune-2/torchtune/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "generate_v2.py", line 235, in main
    recipe = InferenceRecipe(cfg=cfg)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "generate_v2.py", line 78, in __init__
    self._logger = utils.get_logger(cfg.log_level)
                                    ^^^^^^^^^^^^^
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/dictconfig.py", line 355, in __getattr__
    self._format_and_raise(
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/base.py", line 231, in _format_and_raise
    format_and_raise(
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(
           ^^^^^^^^^^^^^^^
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/dictconfig.py", line 442, in _get_impl
    node = self._get_child(
           ^^^^^^^^^^^^^^^^
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/basecontainer.py", line 73, in _get_child
    child = self._get_node(
            ^^^^^^^^^^^^^^^
  File "/home/toolkit/.conda/envs/torch/lib/python3.11/site-packages/omegaconf/dictconfig.py", line 480, in _get_node
    raise ConfigKeyError(f"Missing key {key!s}")
omegaconf.errors.ConfigAttributeError: Missing key log_level
    full_key: log_level
    object_type=dict
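
From the traceback, the recipe reads cfg.log_level in its __init__ (utils.get_logger(cfg.log_level)), so I'm guessing my config just needs a top-level entry along these lines (the value is my assumption; any standard logging level name should do):

log_level: INFO  # assumed value, e.g. DEBUG / INFO / WARNING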
