
why I just generate a piece of noisy audio #218

Closed
wxy656 opened this issue Feb 8, 2017 · 4 comments

Comments

@wxy656

wxy656 commented Feb 8, 2017

Is there any way to speed up training? Each step takes about 40 s, so I have only trained for 1000 steps, and the loss was 3.1. Then I ran generate.py, and it just produced a piece of noisy audio.
So how many steps should it train for, and what should the loss value be? Could anyone share their experience? It would be great if someone could upload a trained model.

@ianni67

ianni67 commented Feb 8, 2017

Looks like WaveNet is not meant for CPU-only computers. On a well-equipped box with a Titan X GPU, a step takes about 0.8 seconds, and a full training run took me two days on an audio file a few seconds long.

@abhilashi

I'm using a p2.16xlarge on EC2, and it's taking about twelve seconds per step. The instance type claims the following spec:

"High Frequency Intel Xeon E5-2686v4 (Broadwell) Processors
High-performance NVIDIA K80 GPUs, each with 2,496 parallel processing cores and 12GiB of GPU memory"

p2.16xlarge has 16 GPUs, 64 vCPUs, 732 GB memory and 192 GB GPU memory.

I'm using a ten-second audio file. Are there any parameters I should be tweaking to speed this up?

I'm currently using the default configuration:

{
    "filter_width": 2,
    "sample_rate": 16000,
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
    "residual_channels": 32,
    "dilation_channels": 32,
    "quantization_channels": 256,
    "skip_channels": 512,
    "use_biases": true,
    "scalar_input": false,
    "initial_filter_width": 32
}

@ianni67 any suggestions how to speed it up?
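One thing worth noting about that default config: the five repeated 1…512 dilation stacks imply a fairly large receptive field, which is part of what each training step has to pay for. As a rough sketch (the exact formula used by this repository may differ by a sample or two), the receptive field can be estimated like this:

```python
# Receptive field implied by the default config above.
# Sketch only -- the repository's own calculation may differ slightly.
filter_width = 2
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] * 5
sample_rate = 16000

# Each dilated layer widens the receptive field by (filter_width - 1) * dilation.
receptive_field = (filter_width - 1) * sum(dilations) + filter_width

print(receptive_field)                # 5117 samples
print(receptive_field / sample_rate)  # ~0.32 s of audio context per prediction
```

Trimming the number of dilation stacks shrinks this window (and the per-step cost), at the price of less long-range context.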

@akademi4eg
Collaborator

@abhilashi On a p2.xlarge you'd get around 1.5-2 s per step. There is a pull request #169 that implements multi-GPU support. It is marked "work in progress", but you can try it. Currently it supports only the unconditioned version; if it works for you, we can try updating and merging it.

@akademi4eg
Collaborator

@wxy656 To get reasonable sounds on VCTK you need at least 40-50k iterations. I managed to do it on CPU, but it takes a lot of time and the output was still rather noisy, so a GPU is the way to go.
Also, I think training on 2-3 files might produce speech-like sounds after fewer iterations. Your target loss should be no higher than 2.
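For calibration (my own back-of-the-envelope figure, not from the repo): with the default quantization_channels = 256, a model that guesses uniformly over the mu-law bins has a cross-entropy of ln(256) ≈ 5.55 nats, so a loss of 3.1 means the model has learned something but is still far from the ≤ 2 target:

```python
import math

# Cross-entropy (in nats) of a uniform guess over 256 mu-law bins.
# This is the "no learning at all" baseline for the default config.
baseline = math.log(256)
print(round(baseline, 2))  # 5.55
```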
