configuration error #8

WalkerRusher · 2024-12-25T13:21:08Z

I tried to run the following command:
accelerate launch train_tokenizer.py \
    --exp_name bair_tokenizer_ft --output_dir log_vqgan --seed 0 --mixed_precision bf16 \
    --model_type ctx_vqgan \
    --train_batch_size 16 --gradient_accumulation_steps 1 --disc_start 1000005 \
    --oxe_data_mixes_type bair --resolution 64 --dataloader_num_workers 16 \
    --rand_select --video_stepsize 1 --segment_horizon 16 --segment_length 8 --context_length 1 \
    --pretrained_model_name_or_path pretrained_models/ivideogpt-oxe-64-act-free/tokenizer

However, an error occurred:

  File "/.local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1159, in launch_command
    multi_gpu_launcher(args)
  File "/local/lib/python3.9/site-packages/accelerate/commands/launch.py", line 769, in multi_gpu_launcher
    import torch.distributed.run as distrib_run
  File "/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 383, in <module>
    from torch.distributed.elastic.multiprocessing import Std
  File "/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/__init__.py", line 68, in <module>
    from torch.distributed.elastic.multiprocessing.api import ( # noqa: F401
  File "/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 26, in <module>
    from torch.distributed.elastic.multiprocessing.redirects import (
  File "/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/redirects.py", line 35, in <module>
    libc = get_libc()
  File "/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/redirects.py", line 32, in get_libc
    return ctypes.CDLL("libc.so.6")
  File "/usr/local/conda/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/conda/lib/python3.9/site-packages/amp_C.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

Before running the script, I installed all the requirements as the repo instructs (pip install -r requirements.txt).

I don't know why this happened. Could you please tell me your exact Python version (3.9.x?)? Any other suggestions would also be deeply appreciated.

WalkerRusher · 2024-12-25T13:30:39Z

My environment:
python - 3.9.13
torch - 2.2.1+cu121
nvidia driver version: 470.103.01
cuda version: 12.2

Manchery · 2024-12-25T14:15:17Z

My environment: python - 3.9.13 torch - 2.2.1+cu121 nvidia driver version: 470.103.01 cuda version: 12.2

All the same as yours, except that my driver version is 535.104.05.
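
Since the version numbers match, it may also help to compare where each package is loaded from. In the traceback above, torch and accelerate come from /.local/lib/python3.9/site-packages, while amp_C.cpython-39-x86_64-linux-gnu.so sits under /usr/local/conda/lib/python3.9/site-packages, so the extension may have been built against a different torch install than the one being imported. The following is only a minimal sketch for checking that, not a confirmed fix. The helper names module_origin and environment_report are hypothetical, and the module list is just taken from the traceback.

```python
import importlib.util
import platform
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialised modules.
        return None
    return spec.origin if spec is not None else None


def environment_report(modules=("torch", "accelerate", "amp_C")):
    """Collect the interpreter version, its path, and the origin of each listed module."""
    # The default module names are assumptions taken from the traceback in this issue.
    report = {"python": platform.python_version(), "executable": sys.executable}
    for name in modules:
        report[name] = module_origin(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The lookup uses importlib.util.find_spec, which locates a top-level module without importing it, so the script should still run in an environment where importing the extension itself fails. If torch and amp_C resolve to different site-packages directories, that would be worth mentioning when comparing setups.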
