
Transfer learning for another sampling rate? #164

Open
cduguet opened this issue Oct 25, 2019 · 11 comments
cduguet commented Oct 25, 2019

Hi!
I'd like to use the pretrained WaveGlow weights to train on a dataset with a different sampling rate. When I train only Tacotron and run its mel outputs through the pretrained WaveGlow model, the audio output sounds low-pitched.
If the sampling rate is fundamentally different, does using the pretrained network bring any benefit, or would it be no more useful than training from scratch?
Any experience with this?
My dataset's sampling rate is 16000 Hz, in contrast to 22050 Hz for the original LJSpeech dataset.

@rafaelvalle (Contributor)

Did you change the sampling rate in the API that saves/plays back the audio?

cduguet (Author) commented Nov 7, 2019

Yes, I did change the playback API. In the meantime I also trained my own WaveGlow network, as well as Tacotron, both on my 16 kHz German dataset.

When I run inference with these, it sounds like this:
https://vocaroo.com/i/s0W76vXWCsTh

When I run inference with the pretrained waveglow_256channels.pt at a sampling rate of 22.05 kHz:
https://vocaroo.com/i/s0G4XUSjcgSi

When I run inference with the pretrained waveglow_256channels.pt at a sampling rate of 16 kHz:
https://vocaroo.com/i/s1lIjBBGPbRM

I can match either the pitch or the speed, but not both.
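(Why no single playback rate works: a WAV file's samples are fixed, and the header's sampling rate only tells the player how fast to read them, so duration and pitch scale together by the same factor. A minimal sketch, with hypothetical file names:)

```python
from scipy.io import wavfile

# "generated.wav" is a hypothetical vocoder output produced at 22050 Hz.
rate, audio = wavfile.read("generated.wav")

# Writing the same samples with a 16000 Hz header slows playback by
# 22050/16000 (~1.38x) and lowers the pitch by the same factor, so no
# playback rate can match pitch and speed at once.
wavfile.write("playback_16k.wav", 16000, audio)
```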


rafaelvalle commented Nov 7, 2019

When training WaveGlow, did you change the sampling rate in your config file?

"sampling_rate": 22050,

cduguet (Author) commented Nov 8, 2019

When I trained from scratch, yes, I did. But I wanted to know what happens if I do not train from scratch.

I want to know whether a WaveGlow model pretrained at 22050 Hz helps when training further on a dataset with a sampling rate of 16000 Hz, or whether they would be incompatible. Would just setting the sampling rate do the trick?

Islanna commented Nov 26, 2019

Hi, @cduguet!
Your idea looks really promising. Proper training from scratch takes too much time; transfer learning can probably solve this problem.

Have you already run any experiments training from the pretrained model on a dataset with sr = 16 kHz?

@patrick-g-zhang

Have you changed the hop length as well? In the WaveGlow source code, the mel-spectrogram is upsampled to the same length as the audio by a transposed convolution with a fixed stride of 256.
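(A sketch of that constraint, assuming the upsampling layer is a ConvTranspose1d with kernel_size=1024 and stride=256 as in the reference glow.py; verify against your checkout:)

```python
import torch

# Sketch of the mel upsampling, assuming a ConvTranspose1d with
# kernel_size=1024 and stride=256 as in the reference implementation.
n_mel_channels = 80
upsample = torch.nn.ConvTranspose1d(
    n_mel_channels, n_mel_channels, kernel_size=1024, stride=256)

mel = torch.randn(1, n_mel_channels, 63)  # batch of 1, 63 mel frames
out = upsample(mel)
print(out.shape)  # torch.Size([1, 80, 16896]): ~63 * 256, excess trimmed later

# If the mels are computed with hop_length=200 (12.5 ms at 16 kHz) while the
# stride stays at 256, each frame covers 256 output samples instead of 200,
# and the upsampled conditioning no longer aligns with the target waveform.
```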

@Shikherneo2

Hey @patrick-g-zhang
Do you mean changing the hop length to something like 200 for a sampling rate of 16 kHz?

@patrick-g-zhang

@Shikherneo2 Yes. In Google's original paper, the STFT hop time is 12.5 ms, which corresponds to 200 sample points at a sampling rate of 16 kHz.
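(The arithmetic, as a quick check:)

```python
# hop length in samples = hop time (seconds) * sampling rate (Hz)
hop_length_16k = int(0.0125 * 16000)  # 12.5 ms at 16 kHz -> 200 samples
hop_time_22k = 256 / 22050            # repo default: 256 samples -> ~11.6 ms
```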

@patrick-g-zhang

I have done it with 16 kHz training data, warm-started from the provided 22 kHz pretrained model. The loss dropped quickly and the model generates audible audio.
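(A minimal warm-start sketch along those lines, assuming the published waveglow_256channels.pt stores the full model under a 'model' key, as the repo's inference script expects, and that config.json has a "waveglow_config" section as in the repo's training setup:)

```python
import json
import torch
from glow import WaveGlow  # model class from this repo

# Hedged sketch: warm-start a 16 kHz run from the published 22 kHz weights.
with open("config.json") as f:
    waveglow_config = json.load(f)["waveglow_config"]

ckpt = torch.load("waveglow_256channels.pt", map_location="cpu")
pretrained = ckpt["model"]

model = WaveGlow(**waveglow_config)             # same architecture as checkpoint
model.load_state_dict(pretrained.state_dict())  # reuse weights as initialization
# ...then continue training with data_config pointed at the 16 kHz dataset.
```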

@EuphoriaCelestial

> When I trained from scratch, yes, I did. But I wanted to know what happens if I do not train from scratch.
>
> I want to know whether a WaveGlow model pretrained at 22050 Hz helps when training further on a dataset with a sampling rate of 16000 Hz, or whether they would be incompatible. Would just setting the sampling rate do the trick?

@cduguet I am trying to train a new WaveGlow model from scratch too, with sample_rate=8000 (maybe I should increase it to 16 kHz, since 8 kHz sounds very bad). What do I need to change in config.json for the new sample rate?
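(For reference, a hedged checklist of the sampling-rate-dependent config.json fields; the names follow the repo's defaults, but these 8 kHz values are scaled assumptions, not tested settings:)

```python
# Hedged, untested values for 8 kHz, scaled from the 22.05 kHz defaults;
# field names follow the repo's config.json data_config.
data_config_8k = {
    "sampling_rate": 8000,
    "segment_length": 8000,  # ~1 s of audio per training example
    "filter_length": 512,    # FFT size, scaled down from 1024
    "hop_length": 100,       # 12.5 ms hop; must match the upsampling stride (see above)
    "win_length": 400,       # 50 ms analysis window
    "mel_fmax": 4000.0,      # cannot exceed Nyquist (sampling_rate / 2)
}
```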

@ashish-roopan

Try this: #88
