
Training a model on the synface dataset gives results worse than the paper #15

Closed
diamond0910 opened this issue Aug 11, 2020 · 6 comments

Comments

@diamond0910

Hi! Thank you very much for your excellent work!

I used the provided script python run.py --config experiments/train_celeba.yml --gpu 0 --num_workers 4 to train a model on the synface dataset.

Then I used python run.py --config experiments/test_celeba.yml --gpu 0 --num_workers 4 to test the model.
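To confirm that two runs used the same settings (batch size, number of epochs, and so on), one option is to load both YAML configs, for example experiments/train_celeba.yml and experiments/train_synface.yml, with PyYAML's yaml.safe_load and compare the resulting dicts. The helper below is a hypothetical sketch of that comparison for flat configs; it does not recurse into nested sections, and the keys in the example are illustrative, not taken from the repository.

```python
def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every top-level key whose value differs.

    A key present in only one config shows up with None on the other side.
    """
    missing = object()  # sentinel so an explicit None still counts as a value
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k), b.get(k))
        for k in keys
        if a.get(k, missing) != b.get(k, missing)
    }


if __name__ == "__main__":
    # Hypothetical configs; in practice these would come from yaml.safe_load(...).
    celeba_cfg = {"batch_size": 64, "num_epochs": 30, "lr": 1e-4}
    synface_cfg = {"batch_size": 64, "num_epochs": 30, "lr": 2e-4}
    for key, (left, right) in diff_configs(celeba_cfg, synface_cfg).items():
        print(f"{key}: {left} != {right}")
```

An empty result means the two files agree on every top-level key.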

Finally, I got 0.0092±0.002 SIDE and 17.77±1.92 MAD, which is worse than the paper (0.793±0.140 SIDE (×10⁻²) and 16.51±1.56 MAD in Table 2).
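When comparing these numbers, it helps to recall how the two metrics are usually defined. The sketch below assumes the common definitions: SIDE as the scale-invariant depth error, i.e. the standard deviation of per-pixel log-depth differences, and MAD as the mean angular deviation in degrees between predicted and ground-truth normals. It also assumes the paper's table reports SIDE scaled by ×10⁻². The function names are hypothetical, and this is not the repository's evaluation code.

```python
import math


def side(pred_depth, gt_depth, mask=None):
    """Scale-invariant depth error: std. dev. of log(pred) - log(gt) over valid pixels."""
    if mask is None:
        mask = [True] * len(pred_depth)
    # Per-pixel log-depth difference, restricted to masked-in pixels.
    deltas = [math.log(p) - math.log(g) for p, g, m in zip(pred_depth, gt_depth, mask) if m]
    n = len(deltas)
    mean = sum(deltas) / n
    mean_sq = sum(d * d for d in deltas) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def mad(pred_normals, gt_normals):
    """Mean angle deviation in degrees between paired 3D normal vectors."""
    angles = []
    for a, b in zip(pred_normals, gt_normals):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        cos = max(-1.0, min(1.0, dot / norm))  # guard acos against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)
```

Because SIDE subtracts the mean log difference, a prediction that is off by a constant scale factor scores zero. Under the ×10⁻² assumption, a raw SIDE of 0.0092 reads as 0.92 on the paper's scale, against 0.793 reported there.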

Could there be a problem with how I am running it?

Thank you!

@elliottwu
Owner

Hi,
I assume you were using experiments/train_synface.yml and experiments/test_synface.yml, but could you confirm? Also, can you confirm that you are using exactly the same settings, e.g., the same batch size and the same number of epochs?
If so, it might be related to the CUDA version (or PyTorch version), as was reported in this thread. I have not tested this yet. It would also help if you could share some visualizations of your results and details of the environment you are using. Thanks!
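For sharing the environment, PyTorch itself ships a fuller report via python -m torch.utils.collect_env. As a lighter alternative, the hypothetical helper below gathers the versions most relevant to this thread (Python, PyTorch, the CUDA version PyTorch was built against, cuDNN, and the GPU name). It is a sketch that treats PyTorch as optional so it still runs where PyTorch is not installed.

```python
import platform
import sys


def collect_env() -> dict:
    """Gather version info worth attaching to a reproducibility report."""
    info = {"python": sys.version.split()[0], "os": platform.platform()}
    try:
        import torch  # optional: only present if PyTorch is installed
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA toolkit version this PyTorch build was compiled against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Note that torch.version.cuda reports the toolkit PyTorch was built with, which can differ from the system CUDA install shown by nvcc.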

@diamond0910
Author

Thank you for your reply! Here is what I observed in training, for your reference.

Using the same code, I got 0.0079±0.0014 SIDE and 16.24±1.52 MAD with CUDA 9.0, and 0.0092±0.0020 SIDE and 17.77±1.92 MAD with CUDA 10.2.

This result surprised me: the same code performs quite differently under different CUDA versions. Is there any explanation for this?
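Before attributing the gap to the CUDA version, it may help to measure how much the metrics vary between runs on a single setup. A minimal sketch, assuming you can call a helper like the hypothetical seed_everything below at the top of the training script: the seeding calls and cuDNN flags are real PyTorch settings, but some CUDA kernels remain nondeterministic even with them, and results can still differ across CUDA/cuDNN versions.

```python
import random


def seed_everything(seed: int = 0) -> None:
    """Seed Python, NumPy and PyTorch RNGs (when installed) and pin cuDNN behaviour."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even when no GPU is present
    # Prefer deterministic cuDNN kernels and turn off autotuning,
    # which may select different convolution algorithms from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Training the synface config two or three times with different seeds on one CUDA version gives a rough spread to compare against the 9.0 vs 10.2 difference.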

Thank you!

@Heng14

Heng14 commented Dec 30, 2021

Hi, I think I have run into a similar problem. I use CUDA 11.4 and torch 1.9, and I did not change anything in experiments/test_synface.yml, but training does not seem to converge: MAD stays at around 50 and SIDE at around 0.2, and sometimes they even become NaN. I have tried many times and every run failed to converge. Do you have any hints on this problem? Thank you!
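When losses occasionally turn into NaN, stopping at the first non-finite value makes it easier to find the step (and batch) that triggered it. The helper below is a hypothetical sketch; in a PyTorch loop it would be called as check_finite(loss.item(), step) before loss.backward(). PyTorch's real torch.autograd.set_detect_anomaly(True) can additionally report which backward operation produced the NaN, at a noticeable speed cost.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when a training loss becomes NaN or infinite."""


def check_finite(loss_value: float, step: int) -> float:
    """Return loss_value unchanged, or raise as soon as it stops being finite."""
    if not math.isfinite(loss_value):
        raise NonFiniteLossError(f"non-finite loss {loss_value!r} at step {step}")
    return loss_value


if __name__ == "__main__":
    # Simulated per-step losses; the third one has diverged.
    for step, loss in enumerate([0.9, 0.7, float("nan")]):
        try:
            check_finite(loss, step)
        except NonFiniteLossError as err:
            print(err)  # non-finite loss nan at step 2
            break
```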

@diamond0910
Author

I solved this problem by downgrading to CUDA 10.2. I suspect there is a numerical precision problem in some function.
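One precision-related setting worth ruling out, offered only as an untested hypothesis and not confirmed anywhere in this thread: with PyTorch 1.7 or later on CUDA 11 builds, Ampere-or-newer GPUs may run float32 convolutions (and, before PyTorch 1.12, matrix multiplications) in reduced-precision TF32 by default. This cannot explain the CUDA 9.0 vs 10.2 gap, and it does nothing on older GPUs. The two flags are real PyTorch settings; the helper name is hypothetical.

```python
def disable_tf32() -> bool:
    """Force full-FP32 matmuls and convolutions; return True if PyTorch was found."""
    try:
        import torch
    except ImportError:
        return False
    # Only has an effect on Ampere-or-newer GPUs running a CUDA 11+ build of PyTorch 1.7 or later.
    if hasattr(torch.backends.cuda, "matmul"):
        torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    return True
```

If a CUDA 11 run converges after calling this at startup, that would point toward TF32; if not, the cause lies elsewhere.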

Best.

@Heng14

Heng14 commented Dec 31, 2021

Thank you! I also switched to CUDA 10.2, and training now converges. I hope the problem can be solved for CUDA 11 in the future.

@diamond0910
Author

Oh, I remember it did not work with CUDA 11, so I used CUDA 10.2.
