Training instructions from README.md not working for me #104
Comments
@davidmartinrius The problem is try to train the model on 3090/4090 or A100.
@lifeiteng Thanks for your response, but it is not clear to me. Maybe the batch size is lower because of the limited VRAM, which means more iterations are needed, but that should not affect the quality of the result. I'm sorry, but your response is not useful to me. It should be possible to train the model on almost any NVIDIA RTX 3000-series or newer GPU. If you really think that the max duration is the problem, can you explain how to adapt it to a 10 GB GPU?
@davidmartinrius
I agree with you on this point. I understand that when the batch size is too small, the gradients computed from a batch may not be representative of the overall structure of the dataset, leading to unstable and slow convergence during training. That said, do you think it is possible to make it work by adjusting the gradient accumulation, the learning rate, batch normalization, or even by adding more layers? I don't know the whole project, so maybe you could evaluate it. If there is a way to optimize it, I would like to try it. I know it means more training hours and more development. Thank you! David Martin Rius
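To make the gradient accumulation idea concrete, here is a minimal sketch in plain Python. It uses a toy one-parameter least-squares model, not this project's code, and all function names are illustrative. It shows that averaging the gradients of equally sized micro-batches gives the same gradient as one large batch, so a small GPU can in principle imitate a larger batch at the cost of more forward/backward passes.

```python
# Sketch: gradient accumulation for a 1-D least-squares model y ~ w * x.
# All names here are illustrative and are not taken from the project.

def grad_mse(w, batch):
    """Gradient of the mean squared error over `batch` with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Average the gradients of equally sized micro-batches.

    Assumes len(batch) is a multiple of micro_batch_size, so every
    micro-batch carries the same weight in the average.
    """
    micro_batches = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    total = 0.0
    for mb in micro_batches:
        # In a real training loop this would be one forward/backward pass
        # that only needs memory for `micro_batch_size` examples.
        total += grad_mse(w, mb)
    return total / len(micro_batches)


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5
full = grad_mse(w, data)                               # one large batch
accum = accumulated_grad(w, data, micro_batch_size=2)  # two small batches
print(full, accum)
```

The equivalence is exact only under the stated assumptions. With batches built by total audio duration, micro-batches contain different numbers of utterances, and layers that depend on batch statistics (such as batch normalization) see different statistics, so in practice the match is approximate.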
Hello,
I am working with Ubuntu 22, an NVIDIA RTX 3080, and 64 GB of RAM.
I followed the steps of the demo in README.md to train a model on LibriTTS.
The result of the inference is wrong: it sounds like weird noise. I attached the WAV file inside a ZIP because GitHub does not allow uploading WAV files.
0.zip
I ran the inference as described in the instructions:
Please, can you help me understand what I am doing wrong?
Ask me for any information you need to analyze this and I will provide it.
When training, I had to change the --max-duration parameter to prevent out-of-memory errors.
For the AR model I changed --max-duration to 20.
For the NAR model I changed --max-duration to 15.
In both cases I had to remove "--valid-interval 20000" because this parameter is not recognized by bin/trainer.py.
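As a rough illustration of what the reduced values imply, the sketch below estimates how many small batches would have to be accumulated per optimizer step to cover a larger reference duration. The values 20 and 15 come from the lines above. `TARGET` is a hypothetical reference value, not taken from the README, and whether bin/trainer.py exposes an accumulation option is not confirmed here.

```python
import math


def accumulation_steps(target_max_duration, feasible_max_duration):
    """Micro-batches needed to cover at least the target duration per step."""
    if target_max_duration <= 0 or feasible_max_duration <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(target_max_duration / feasible_max_duration)


TARGET = 40  # hypothetical reference --max-duration, NOT taken from the README

ar_steps = accumulation_steps(TARGET, 20)   # AR model, --max-duration 20
nar_steps = accumulation_steps(TARGET, 15)  # NAR model, --max-duration 15
print(ar_steps, nar_steps)
```

Rounding up means the effective duration per optimizer step can exceed the target (3 x 15 = 45 for the NAR case), which may call for a small learning-rate adjustment.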
Thank you,
David Martin Rius