Training instructions from README.md not working for me #104

Open · davidmartinrius opened this issue Apr 21, 2023 · 4 comments

davidmartinrius commented Apr 21, 2023

Hello,

I am working on Ubuntu 22 with an NVIDIA RTX 3080 and 64 GB of RAM.

I followed the steps of the DEMO in the README.md to train a model on LibriTTS.

[screenshot attached]

The result after inference is wrong: it sounds like weird noise. I attached the wav inside a zip because GitHub does not allow uploading a wav.

0.zip

I ran the inference as in the instructions:

python3 bin/infer.py --output-dir infer/demos \
    --model-name valle --norm-first true --add-prenet false \
    --share-embedding true \
    --text-prompts "KNOT one point one five miles per hour." \
    --audio-prompts ./prompts/8463_294825_000043_000000.wav \
    --text "To get up and running quickly just follow the steps below." \
    --checkpoint=${exp_dir}/best-valid-loss.pt

Please, can you help me understand what I am doing wrong?

Ask me for any information you need for the analysis and I will provide it.

When training, I had to lower the --max-duration parameter to prevent an out-of-memory error:

In the AR model I changed --max-duration to 20
In the NAR model I changed --max-duration to 15

In both cases I had to remove "--valid-interval 20000" because that parameter is not recognized by bin/trainer.py.

Thank you,

David Martin Rius

lifeiteng (Owner) commented Apr 22, 2023

@davidmartinrius The problem is setting --max-duration to 20, which means the batch size ends up in the range [1, 6].

Try to train the model on a 3090, 4090, or A100.
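
As a rough illustration of where [1, 6] comes from (a sketch only, not the repo's actual sampler code; I am assuming --max-duration caps the total seconds of audio packed into one batch, as lhotse-style dynamic samplers do):

    # Illustrative only: if --max-duration caps the total seconds of audio
    # per batch, the batch size is simply how many utterances fit under it.
    def approx_batch_size(max_duration_s: float, utt_duration_s: float) -> int:
        return max(1, int(max_duration_s // utt_duration_s))

    # LibriTTS utterances run very roughly 3-20 s, so with the cap at 20 s:
    for utt_len in (3.0, 5.0, 10.0, 20.0):
        print(utt_len, approx_batch_size(20.0, utt_len))
    # -> 6, 4, 2, 1: hence a batch size somewhere in [1, 6]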

davidmartinrius (Author) commented Apr 22, 2023

@lifeiteng thanks for your response, but it is not clear to me. Maybe the batch size is lower because of the VRAM, and that means training needs more iterations, but it should not affect the quality of the result. I'm sorry, but your response is not useful to me. It should be possible to train this on almost any NVIDIA RTX 3000-series or newer GPU...

Please, if you really think that the max duration is the problem, can you explain how to adapt it to a 10 GB GPU?

lifeiteng (Owner) commented

@davidmartinrius
A small batch_size will not converge to a good local optimum. It's common knowledge in deep learning.

davidmartinrius (Author) commented Apr 23, 2023

I agree with you on this point. I understand that when the batch size is too small, the gradients computed from a batch may not be representative of the overall structure of the dataset, leading to unstable and slow convergence during training.

That said, do you think it is possible to make it work by adjusting gradient accumulation (see the sketch below), the learning rate, or batch normalization, or even by adding more layers? Actually, I don't know the whole project; maybe you could evaluate it.
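
To make concrete what I mean by gradient accumulation, here is a minimal self-contained sketch (the model, data, and loop here are dummies I made up for illustration, not anything from bin/trainer.py):

    import torch
    from torch import nn

    model = nn.Linear(16, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Tiny dummy micro-batches standing in for what fits in 10 GB of VRAM.
    data = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]

    ACCUM_STEPS = 8  # 8 micro-batches of 4 ~= one effective batch of 32

    optimizer.zero_grad()
    for step, (x, y) in enumerate(data):
        loss = nn.functional.mse_loss(model(x), y)
        (loss / ACCUM_STEPS).backward()  # scale so accumulated grads average
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()             # one update per ACCUM_STEPS batches
            optimizer.zero_grad()

The accumulated gradients then approximate those of a larger batch, at the cost of proportionally more steps per epoch.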

If there is a way to optimize it, I would like to try it. I know it means more training hours and more development.

Thank you!

David Martin Rius
