Fine-tuning error in conda environment without docker image #1538
Comments
This is resolved. In the YAML file, change ${global_seed} to ${variables.global_seed}.
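For illustration, a minimal before/after sketch of that change, assuming a config shaped like the finetune_example YAMLs (the seed value and surrounding keys here are placeholders):

    # Before -- top-level key referenced as ${global_seed}; OmegaConf cannot
    # resolve it, or the trainer flags it as unused:
    global_seed: 17
    seed: ${global_seed}

    # After -- the value lives under the variables key and is referenced by
    # its full path:
    variables:
      global_seed: 17
    seed: ${variables.global_seed}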
Hi, thanks for the issue! Happy to accept a PR fixing this if you like; otherwise we will update it!
Hi,
A follow-up on the task. I can see that the input ids take the form id(Question: ...) ... id(Options) ... id(\n\n) ... id(Answer:), and that the labels contain -100 for all entries up to the ids of the tokens in the true answer, with every remaining entry filled with the id of the padding token. My question is: how do we compare the performance of the model? Do we create a similar input for each of the other options, where Answer: is followed by the ids of that option, and then compare the log-likelihood of each of these against the log-likelihood of the input with the true answer?
Hi, yes, the multiple-choice ICL tasks in LLM Foundry do evaluation the way you described.
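For concreteness, a rough, self-contained sketch of that comparison using plain Hugging Face APIs (this is an illustration, not LLM Foundry's actual eval code; the model, prompt, and options are made-up stand-ins):

    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model.eval()

    def continuation_logprob(context: str, continuation: str) -> float:
        # Summed log-probability the model assigns to `continuation`
        # given `context`.
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        full_ids = torch.cat(
            [ctx_ids, tokenizer(continuation, return_tensors="pt").input_ids],
            dim=1,
        )
        with torch.no_grad():
            logits = model(full_ids).logits
        # The logits at position t predict the token at position t + 1.
        logprobs = F.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, ctx_ids.shape[1]:]  # the continuation tokens
        rows = logprobs[ctx_ids.shape[1] - 1:]    # positions predicting them
        return rows.gather(1, targets.unsqueeze(1)).sum().item()

    prompt = "Question: Which gas do plants absorb from the air?\nAnswer:"
    options = [" carbon dioxide", " oxygen", " nitrogen", " helium"]
    scores = [continuation_logprob(prompt, opt) for opt in options]
    print(options[scores.index(max(scores))])  # highest log-likelihood wins

The predicted answer is the option whose tokens receive the highest total log-likelihood; length-normalized variants divide each score by the number of continuation tokens.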
Thanks for confirming that. Just a suggestion regarding the README in the scripts/eval section: it would be good to have a section describing how data processing happens under the hood when executing composer eval.py. The description in scripts/train/finetune_example/README.md about composer train.py is quite helpful.
Wondering, does this README help? Or is it still missing the information you are looking for?
Environment
python 3.11.9
cuda 11.8
torch 2.4.0+cu118
PyTorch information
PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.3
Libc version: glibc-2.31
Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-192-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.4
/usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] onnx==1.16.2
[pip3] onnxruntime==1.19.0
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.4.0+cu118
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.4.0+cu118
[pip3] torchmetrics==1.4.0.post0
[pip3] torchvision==0.19.0+cu118
[pip3] triton==3.0.0
[conda] numpy 1.26.3 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch 2.4.0+cu118 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 2.4.0+cu118 pypi_0 pypi
[conda] torchmetrics 1.4.0.post0 pypi_0 pypi
[conda] torchvision 0.19.0+cu118 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
Composer information
Composer Version: 0.24.1
Composer Commit Hash: None
CPU Model: AMD EPYC 7542 32-Core Processor
CPU Count: 32
Number of Nodes: 1
GPU Model: NVIDIA A100 80GB PCIe
GPUs per Node: 1
GPU Count: 1
CUDA Device Count: 1
To reproduce
Steps to reproduce the behavior:
1. pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118 --force-reinstall
2. pip install -e .
3. cd scripts/train
4. composer train.py finetune_example/gpt2-arc-easy--cpu.yaml
   When run on CPU, this gives: omegaconf.errors.InterpolationKeyError: Interpolation key 'global_seed' not found
5. composer train.py finetune_example/mpt-7b-arc-easy--gpu.yaml
   When run on GPU, this gives: ValueError: Unused parameters ['global_seed'] found in cfg. Please check your yaml to ensure these parameters are necessary. Please place any variables under the variables key.
Expected behavior
Fine-tuning should run to completion without errors.