generate_v2-51255444.err
Running command git clone --quiet https://github.com/huggingface/transformers.git /tmp/pip-install-8ymk3egx/transformers_48cfc922af2040f8b4547ebe1ba2c81f
Running command git clone --quiet https://github.com/huggingface/peft.git /tmp/pip-install-8ymk3egx/peft_782f692bd3894683a816e9286738ee65
Running command git clone --quiet https://github.com/huggingface/accelerate.git /tmp/pip-install-8ymk3egx/accelerate_c78f2c58cb1949f9952d4f6bb3f4c973
/home/brc4cb/.conda/envs/falcon_40B/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/brc4cb/.conda/envs/falcon_40B/lib/libcudart.so'), PosixPath('/home/brc4cb/.conda/envs/falcon_40B/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
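
The warning above means bitsandbytes found more than one CUDA runtime in the environment's lib directory and will pick one arbitrarily. A minimal check, reusing the conda env path from the warning itself (adjust for your own environment), just lists the candidates so the stale copy can be removed or the search path narrowed:

    # Sketch: list the libcudart copies bitsandbytes is choosing between.
    from pathlib import Path

    libdir = Path("/home/brc4cb/.conda/envs/falcon_40B/lib")
    for lib in sorted(libdir.glob("libcudart.so*")):
        print(lib)  # e.g. libcudart.so, libcudart.so.11.0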
Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]
Loading checkpoint shards:  11%|█         | 1/9 [00:08<01:08,  8.55s/it]
Loading checkpoint shards:  22%|██▏       | 2/9 [00:14<00:47,  6.76s/it]
Loading checkpoint shards:  33%|███▎      | 3/9 [00:19<00:36,  6.11s/it]
Loading checkpoint shards:  44%|████▍     | 4/9 [00:24<00:29,  5.85s/it]
Loading checkpoint shards:  56%|█████▌    | 5/9 [00:30<00:22,  5.66s/it]
Loading checkpoint shards:  67%|██████▋   | 6/9 [00:36<00:18,  6.04s/it]
Loading checkpoint shards:  78%|███████▊  | 7/9 [00:42<00:11,  5.77s/it]
Loading checkpoint shards:  89%|████████▉ | 8/9 [00:47<00:05,  5.63s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [00:51<00:00,  5.27s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [00:51<00:00,  5.77s/it]
/home/brc4cb/.conda/envs/falcon_40B/lib/python3.9/site-packages/transformers/generation/utils.py:1261: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
warnings.warn(
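
This deprecation fires because generation parameters were set by mutating the pretrained model's config in place; the supported route is a GenerationConfig object, as the warning's link describes. A minimal sketch of the recommended pattern (the parameter values here are illustrative, not taken from the log):

    # Sketch: pass generation settings via GenerationConfig instead of
    # editing model.config directly.
    from transformers import GenerationConfig

    gen_config = GenerationConfig(
        max_new_tokens=128,   # illustrative budget, not from the log
        do_sample=True,
        temperature=0.7,
    )
    outputs = model.generate(input_ids, generation_config=gen_config)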
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
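
These two lines mean generate() received bare input_ids with no attention_mask and no pad_token_id, so transformers fell back to the eos token (id 11, matching the log). A hedged sketch of the usual fix, where prompt, tokenizer, and model are assumed names not shown in the log:

    # Sketch: tokenize with return_tensors so the attention_mask comes
    # along, and set pad_token_id explicitly (Falcon has no pad token).
    enc = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
    )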
/home/brc4cb/.conda/envs/falcon_40B/lib/python3.9/site-packages/transformers/generation/utils.py:1355: UserWarning: Using `max_length`'s default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
Input length of input_ids is 43, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
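
The prompt is already 43 tokens, so the default max_length=20 is exhausted before generation begins and no new tokens can be produced. The fix the warning recommends is max_new_tokens, which bounds only the generated continuation. Continuing the sketch above (enc and the token budget are illustrative):

    # Sketch: bound the new tokens, not the total sequence length.
    outputs = model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        max_new_tokens=256,  # counts generated tokens only
        pad_token_id=tokenizer.eos_token_id,
    )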
/home/brc4cb/.conda/envs/falcon_40B/lib/python3.9/site-packages/transformers/generation/utils.py:1454: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
warnings.warn(
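
This last warning means the model was placed on CUDA while the tokenized inputs stayed on CPU, which at best slows generation. A minimal sketch of the fix the warning itself suggests, again with assumed names; note that with a model sharded by device_map, model.device reports the first shard's device:

    # Sketch: move the encoded inputs to the device the model lives on.
    enc = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**enc, max_new_tokens=256,
                             pad_token_id=tokenizer.eos_token_id)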