I have been trying to train the model on a male dataset. I have tried both training from scratch and fine-tuning the provided checkpoint, first with the default parameters (batch size 3 on 8 GPUs), then with the batch size increased to 32 on 8 GPUs and several different learning rates. In all cases, the training loss saturates at about -5 somewhere between 5k and 20k steps and then either increases or blows up. Do you have any suggestions for what to do in this case? Have you trained the model on any dataset other than LJ?
Examples of training loss curves: