Waveglow as Inverse STFT function #244

ajinkyakulkarni14 · 2021-01-05T14:18:48Z

Hello

Currently, I am trying to train a WaveGlow model from scratch to act as an inverse STFT function. I am training it on 20K samples of noisy speech (noise + speech), and the configuration of the model I am training is attached below. After 420K iterations, I synthesized the audio waveform for a given STFT input, and the result contains a whistling sound. Can anyone suggest approximately how many iterations I should train the model for, whether 20K samples are sufficient to train the system, and any other guidelines to improve the model?

Thanks

{
    "train_config": {
        "fp16_run": false,
        "output_directory": "checkpoints",
        "epochs": 100000,
        "learning_rate": 1e-4,
        "sigma": 1.0,
        "iters_per_checkpoint": 20000,
        "batch_size": 1,
        "seed": 1234,
        "checkpoint_path": "",
        "with_tensorboard": false
    },
    "data_config": {
        "training_files": "train_list.txt",
        "segment_length": 16000,
        "sampling_rate": 16000,
        "filter_length": 511,
        "hop_length": 256,
        "win_length": 511,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0
    },
    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321"
    },
    "waveglow_config": {
        "n_mel_channels": 256,
        "n_flows": 12,
        "n_group": 8,
        "n_early_every": 4,
        "n_early_size": 2,
        "WN_config": {
            "n_layers": 8,
            "n_channels": 256,
            "kernel_size": 3
        }
    }
}
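A quick sanity check on the shapes this configuration implies may help. The sketch below is plain Python and assumes a centred STFT (reflect padding of filter_length // 2 samples on each side, as torch.stft does with center=True); the helper names stft_feature_shape and channels_per_flow are hypothetical. It shows that filter_length 511 yields 511 // 2 + 1 = 256 one-sided frequency bins, matching n_mel_channels 256 (a filter_length of 512 would give 257), and it mirrors my reading of the early-output schedule in the reference WaveGlow glow.py to show how many channels each flow operates on.

```python
def stft_feature_shape(segment_length, filter_length, hop_length):
    """Return (n_freq_bins, n_frames) for a centred, one-sided STFT.

    Assumption: the signal is reflect-padded by filter_length // 2 samples
    on each side before framing (torch.stft with center=True does this).
    """
    n_freq_bins = filter_length // 2 + 1
    padded_length = segment_length + 2 * (filter_length // 2)
    n_frames = (padded_length - filter_length) // hop_length + 1
    return n_freq_bins, n_frames


def channels_per_flow(n_flows, n_group, n_early_every, n_early_size):
    """Number of grouped audio channels each flow sees.

    Assumption: every n_early_every flows (skipping flow 0), n_early_size
    channels are output early and no longer transformed.
    """
    remaining = n_group
    schedule = []
    for k in range(n_flows):
        if k % n_early_every == 0 and k > 0:
            remaining -= n_early_size
        schedule.append(remaining)
    return schedule


if __name__ == "__main__":
    # Values taken from the configuration above.
    print(stft_feature_shape(16000, 511, 256))  # (256, 63)
    print(channels_per_flow(12, 8, 4, 2))       # [8, 8, 8, 8, 6, 6, 6, 6, 4, 4, 4, 4]
```

Under these assumptions, each 16000-sample training segment is conditioned on a 256 x 63 STFT magnitude array, and the last four flows transform only 4 of the 8 grouped channels.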


rafaelvalle · 2021-01-05T18:34:52Z

Can you share a couple audio samples and your loss curves for training and validation?

ajinkyakulkarni14 · 2021-01-06T09:15:36Z

Please see the materials below.

At iteration 345484, the loss suddenly jumped to 2059746816.000000000, which explains the spike in the curve. I also plotted the loss over shorter ranges of iterations and observed further spikes in between.
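To locate spikes like the one at iteration 345484 without plotting many sub-ranges, one option is to scan the logged per-iteration losses with a robust rolling statistic. The sketch below is a hypothetical helper, not part of the WaveGlow code: it flags any non-finite loss, and any loss whose distance from the median of the preceding window exceeds factor times the median absolute deviation of that window. A deviation-based test is used instead of a ratio because the WaveGlow loss is a negative log-likelihood per dimension and can go negative.

```python
import math
from statistics import median


def find_loss_spikes(losses, window=100, factor=10.0, eps=1e-8):
    """Return (iteration, loss) pairs that look like spikes.

    A loss is flagged if it is NaN/inf, or if it deviates from the median of
    the previous `window` finite losses by more than `factor` times their
    median absolute deviation (MAD). The first `window` iterations are only
    checked for non-finite values.
    """
    spikes = []
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            spikes.append((i, loss))
            continue
        if i < window:
            continue
        # Robust statistics of the preceding window, ignoring NaN/inf entries.
        recent = [x for x in losses[i - window:i] if math.isfinite(x)]
        if not recent:
            continue
        med = median(recent)
        mad = median([abs(x - med) for x in recent])
        if abs(loss - med) > factor * max(mad, eps):
            spikes.append((i, loss))
    return spikes
```

The window and factor defaults are arbitrary choices; reading the per-iteration losses out of the training log is left out because its format is not shown here.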

I am also attaching the training code and the dataloader for the STFT-based model for your reference.

The main purpose of building an inverse-STFT WaveGlow is to use it as a pretrained model and then train it further for speech enhancement.

Can you suggest what I should do to optimize the model better?

spec2samp.txt