Waveglow as Inverse STFT function #244

Open
ajinkyakulkarni14 opened this issue Jan 5, 2021 · 2 comments

Comments

@ajinkyakulkarni14

ajinkyakulkarni14 commented Jan 5, 2021

Hello

I am training a WaveGlow model from scratch to act as an inverse STFT function, using 20K noise+speech samples. The configuration I am training with is attached below. After 420K iterations, I synthesized audio waveforms from STFT input, and the results contain a whistling sound. Can anyone suggest roughly how many iterations I should train the model for, whether 20K samples are sufficient to train the system, and any other guidelines for improving the model?

Thanks

{
    "train_config": {
        "fp16_run": false,
        "output_directory": "checkpoints",
        "epochs": 100000,
        "learning_rate": 1e-4,
        "sigma": 1.0,
        "iters_per_checkpoint": 20000,
        "batch_size": 1,
        "seed": 1234,
        "checkpoint_path": "",
        "with_tensorboard": false
    },
    "data_config": {
        "training_files": "train_list.txt",
        "segment_length": 16000,
        "sampling_rate": 16000,
        "filter_length": 511,
        "hop_length": 256,
        "win_length": 511,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0
    },
    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321"
    },
    "waveglow_config": {
        "n_mel_channels": 256,
        "n_flows": 12,
        "n_group": 8,
        "n_early_every": 4,
        "n_early_size": 2,
        "WN_config": {
            "n_layers": 8,
            "n_channels": 256,
            "kernel_size": 3
        }
    }
}
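
To put the numbers in the question in context, here is a rough sanity check of the configuration above. It is only a sketch: the helper names are made up for illustration, and the estimate of passes over the data assumes the dataloader draws one segment per file per epoch, which may not match the attached dataloader.

```python
# Values copied from the configuration in this post.
config = {
    "batch_size": 1,
    "segment_length": 16000,
    "sampling_rate": 16000,
    "filter_length": 511,
    "hop_length": 256,
    "n_mel_channels": 256,
}


def stft_bins(filter_length):
    """A one-sided STFT of an n-point frame has n // 2 + 1 frequency bins."""
    return filter_length // 2 + 1


def passes_over_data(iterations, batch_size, n_samples):
    """Approximate number of epochs seen so far.

    Assumption: one training segment is drawn per file per epoch.
    """
    return iterations * batch_size / n_samples


bins = stft_bins(config["filter_length"])
frames_per_segment = config["segment_length"] / config["hop_length"]
segment_seconds = config["segment_length"] / config["sampling_rate"]
epochs_seen = passes_over_data(420_000, config["batch_size"], 20_000)

print("STFT bins:", bins, "| n_mel_channels:", config["n_mel_channels"])
print("frames per segment:", frames_per_segment)
print("segment length in seconds:", segment_seconds)
print("approximate passes over the data:", epochs_seen)
```

Under these assumptions the STFT has 256 bins, which matches n_mel_channels, each training segment is one second long, and 420K iterations at batch size 1 correspond to about 21 passes over the 20K samples.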

@rafaelvalle
Contributor

rafaelvalle commented Jan 5, 2021

Can you share a couple of audio samples and your loss curves for training and validation?

@ajinkyakulkarni14
Author

ajinkyakulkarni14 commented Jan 6, 2021

Please see the materials attached below.

I observed that at iteration 345484 the loss suddenly jumped to 2059746816.000000000, which explains the spike in the loss plot. I also plotted the loss over a shorter range of iterations and found that there are further spikes in between.
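
For anyone who wants to locate such spikes without plotting, here is a minimal sketch that scans a loss log and flags outliers. It assumes each log line looks like `345484:<tab>2059746816.000000000` (iteration, colon, loss), which is a guess at the format of the attached log; the function names, window size, and threshold are hypothetical choices. It compares each loss against the median of recent losses plus a multiple of their median absolute deviation, so it also works when the loss is negative.

```python
from statistics import median


def parse_loss_log(lines):
    """Parse lines shaped like '345484:\t2059746816.0' into (iteration, loss) pairs."""
    points = []
    for line in lines:
        head, sep, tail = line.strip().partition(":")
        if not sep:
            continue  # no colon: not an 'iteration: loss' line
        try:
            points.append((int(head), float(tail)))
        except ValueError:
            continue  # skip headers or other non-numeric lines
    return points


def find_spikes(points, window=50, threshold=10.0, min_history=5):
    """Return (iteration, loss) pairs that sit far above the recent baseline.

    A point is flagged when it exceeds the median of the last `window`
    non-flagged losses by more than `threshold` times their median
    absolute deviation.
    """
    spikes = []
    history = []
    for iteration, loss in points:
        if len(history) >= min_history:
            recent = history[-window:]
            center = median(recent)
            spread = median(abs(x - center) for x in recent) or 1e-8
            if loss - center > threshold * spread:
                spikes.append((iteration, loss))
                continue  # keep outliers out of the baseline
        history.append(loss)
    return spikes
```

A possible use is `find_spikes(parse_loss_log(open(path)))`, where `path` points at the training log; the returned iteration numbers can then be checked against the training files seen at those steps.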

Furthermore, I am attaching the training code and the dataloader for the STFT-based model for your reference.

The main purpose of building an inverse-STFT WaveGlow is to use it as a pretrained model and then train it further for speech enhancement.

Can you suggest what I should do to get the model to optimize well?

shorter_range_loss_plot
spec2samp.txt

samples_orignal_and_waveglow.zip
loss_log_waveglow_istft.txt
waveglow_istft_lossplot
