
Status log #9

Open

Kyubyong opened this issue Nov 22, 2017 · 6 comments

Comments

@Kyubyong (Owner)

22 Nov. 2017. I have completed the first draft. I have tested the current hyperparameters only on the Nick dataset, which is 8 hours long, not on LJ Speech, which is 24 hours long. The results were not good, but not terrible either. Since I had no success with the same hyperparameters as the original paper, I changed some of them; among the changes are the use of dilation and a positional embedding instead of positional encoding. The attention plot of the last layer looks somewhat monotonic, but not clearly so. I think the key signal that the network works is, of course, the attention plots.
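For readers unfamiliar with the distinction, here is a minimal TensorFlow 1.x sketch of the two variants mentioned above, a fixed sinusoidal positional encoding versus a learned positional embedding table. The function names and initializer are illustrative, not taken from this repository's code:

```python
import numpy as np
import tensorflow as tf  # written against TF 1.x

def positional_encoding(max_len, num_units):
    # Fixed sinusoidal table (Transformer-style "encoding"); nothing is learned.
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(num_units)[None, :]          # (1, num_units)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / float(num_units))
    table = np.zeros((max_len, num_units), dtype=np.float32)
    table[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions
    table[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions
    return tf.convert_to_tensor(table)         # (max_len, num_units), constant

def positional_embedding(max_len, num_units, scope="positional_embedding"):
    # Learned table: trained jointly with the rest of the network.
    with tf.variable_scope(scope):
        return tf.get_variable("lookup_table", shape=(max_len, num_units),
                               initializer=tf.truncated_normal_initializer(stddev=0.01))

# Either table is added to the inputs by timestep index, e.g.:
#   positions = tf.tile(tf.expand_dims(tf.range(T), 0), [batch_size, 1])
#   outputs = inputs + tf.nn.embedding_lookup(table, positions)
```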

@lsq357 commented Nov 28, 2017

Thank you for your great work!
Could you tell me the environment you used: Python 2 or Python 3, the TensorFlow version, and so on?

@Kyubyong (Owner, Author)

Sure: Python 2, TensorFlow 1.3, Linux.

@lsq357 commented Nov 28, 2017

Thanks!
I am training on the LJSpeech-1.0 dataset. Could you show the alignment curve after convergence, and roughly how many steps are needed when training only on the Nick dataset?
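For context, the "alignment curve" here is the attention matrix between encoder (text) timesteps and decoder (mel) timesteps; with a converged model it becomes roughly diagonal. A generic matplotlib sketch of the kind of plot being asked about (not this repository's actual plotting code):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for training servers
import matplotlib.pyplot as plt

def plot_alignment(alignment, step, out_path="alignment.png"):
    """alignment: (decoder_timesteps, encoder_timesteps) attention weights."""
    fig, ax = plt.subplots()
    im = ax.imshow(alignment, aspect="auto", origin="lower", interpolation="none")
    fig.colorbar(im, ax=ax)
    ax.set_xlabel("Encoder timestep (text)")
    ax.set_ylabel("Decoder timestep (mel)")
    ax.set_title("Attention alignment at step {}".format(step))
    fig.savefig(out_path)
    plt.close(fig)
```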

@lsq357 commented Dec 7, 2017

Any plan to add `JOINT REPRESENTATION OF CHARACTERS AND PHONEMES`, as described in DeepVoice 3, Section 3.2? The keithito tacotron experiments also showed faster convergence with it.

Also, in my experiments, the joint representation of characters and phonemes does converge faster.
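For illustration, one common way to implement the mixed representation is to randomly replace each word with its ARPAbet pronunciation during training, as DeepVoice 3 (Sec. 3.2) describes and as keithito's tacotron does with curly-brace markup. A rough sketch assuming NLTK's CMUdict is available; the brace convention and the replacement probability are illustrative:

```python
import random
from nltk.corpus import cmudict  # requires nltk.download('cmudict')

PRON = cmudict.dict()  # word -> list of pronunciations, e.g. 'cat' -> [['K', 'AE1', 'T']]

def mixed_representation(text, phoneme_prob=0.5):
    """Randomly swap words for their phoneme sequence (DeepVoice 3, Sec. 3.2 style)."""
    out = []
    for word in text.lower().split():
        prons = PRON.get(word)
        if prons and random.random() < phoneme_prob:
            # Wrap phonemes in braces so the text frontend can tell them from characters.
            out.append("{" + " ".join(prons[0]) + "}")
        else:
            out.append(word)
    return " ".join(out)

# Output varies per call, e.g. "the {K AE1 T} sat on the mat" for "the cat sat on the mat".
```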

@arijit17 commented Feb 2, 2018

Do you have the synthesized speech files somewhere?

@chenxf0619
Hello Kyubyong, we have pulled your code and tested it with the LJ Speech data. We found that the synthesized wav files have nothing to do with the content of "test_sents.txt". Do you have any guidance for us?
