Training with ImageNet 64x64 #86

Open
gulperii opened this issue May 21, 2021 · 1 comment

Comments

@gulperii

Hello,

I am training on ImageNet 64x64 and run the code with the following command:
python train.py --dataset I64_hdf5 --shuffle --batch_size 128 --num_G_accumulations 1 --num_D_accumulations 1 --num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 --G_attn 32 --D_attn 32 --G_nl relu --D_nl relu --SN_eps 1e-8 --BN_eps 1e-5 --adam_eps 1e-8 --G_ortho 0.0 --G_init xavier --D_init xavier --G_eval_mode --G_ch 32 --D_ch 32 --ema --use_ema --ema_start 2000 --test_every 5000 --save_every 1000 --num_best_copies 5 --num_save_copies 2 --seed 0 --which_best FID --num_epochs 1000 --num_workers 8 --parallel

and getting this error:

  File "train.py", line 229, in <module>
    main()
  File "train.py", line 226, in main
    run(config)
  File "train.py", line 184, in run
    metrics = train(x, y)
  File "/BigGAN-PyTorch/train_fns.py", line 42, in train
    split_D=config['split_D'])
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 140, in forward
    return self.module(*inputs, **kwargs)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/BigGAN-PyTorch/BigGAN.py", line 443, in forward
    D_out = self.D(D_input, D_class)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/BigGAN-PyTorch/BigGAN.py", line 403, in forward
    out = out + torch.sum(self.embed(y) * h, 1, keepdim=True)
RuntimeError: CUDA error: device-side assert triggered

I have used the prepare_data script in the repository as follows:

python make_hdf5.py --dataset I64 --batch_size 256 --data_root data
python calculate_inception_moments.py --dataset I64_hdf5 --data_root data

The interesting thing is that when I create a "mini dataset" by randomly selecting 500 images per label from the original ImageNet dataset, the code runs fine. What could be the problem, and how can I solve it?

@a28293971

For an error like "CUDA error: device-side assert triggered", it is best to move the model to the CPU and rerun, so that you can see the detailed error message.
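
To illustrate why the CPU run helps, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the assert comes from a class label outside the range of the embedding, since the traceback stops at the `self.embed(y)` line. The helper names, the embedding size, and the 1000-class count are hypothetical and only for illustration; they are not taken from the BigGAN-PyTorch code.

```python
import torch
import torch.nn as nn


def try_embedding_on_cpu(labels, n_classes, dim=4):
    """Run an embedding lookup on the CPU; return the error text, or None on success."""
    # Hypothetical stand-in for the discriminator's class embedding.
    embed = nn.Embedding(n_classes, dim)
    try:
        embed(labels.cpu())
    except (IndexError, RuntimeError) as err:
        # On the CPU the failure is a regular Python exception with a readable
        # message, instead of an opaque device-side assert.
        return str(err)
    return None


def labels_in_range(labels, n_classes):
    """Return True if every label lies in [0, n_classes)."""
    return bool(((labels >= 0) & (labels < n_classes)).all())


good = torch.tensor([0, 5, 999])
bad = torch.tensor([0, 5, 1000])  # 1000 is out of range for 1000 classes

print(try_embedding_on_cpu(good, 1000))  # lookup succeeds, so this is None
print(try_embedding_on_cpu(bad, 1000))   # readable error text from the CPU run
print(labels_in_range(bad, 1000))
```

If the real cause is a label problem, running a check like `labels_in_range` over the labels stored in the HDF5 file, against the number of classes the model was built with, should locate it without launching a training run.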
