
Commit

update readme.md
soobin.suh committed Mar 20, 2019
1 parent f9784d2 commit 7666647
Showing 46 changed files with 24 additions and 5 deletions.
23 changes: 21 additions & 2 deletions README.md
@@ -1,5 +1,7 @@
# Transformer-TTS
A PyTorch Implementation of [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895)
This model trains about 3 to 4 times faster than well-known seq2seq models such as Tacotron, with almost the same synthesized speech quality. In my experiments, each training step took about 0.5 seconds.
Instead of a WaveNet vocoder, I trained a post-network based on Tacotron's CBHG module and converted the spectrogram into a raw waveform with the Griffin-Lim algorithm.
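Griffin-Lim estimates a waveform from a magnitude-only spectrogram by alternating between the STFT and its inverse, keeping the target magnitude fixed and re-estimating only the phase. A standard formulation is sketched below, where $|S|$ is the predicted linear magnitude spectrogram and $N$ is the number of iterations; the iteration count, the initial phase (random or zero), and any magnitude scaling applied before inversion are implementation details not stated here.

```latex
\begin{aligned}
x_0 &= \mathrm{ISTFT}\left(|S|\, e^{j\phi_0}\right), \quad \phi_0 \text{ random}, \\
X_i &= \mathrm{STFT}(x_i), \\
x_{i+1} &= \mathrm{ISTFT}\left(|S|\, e^{j \angle X_i}\right), \quad i = 0, 1, \dots, N-1.
\end{aligned}
```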

<img src="png/model.png">

@@ -14,12 +16,29 @@ A Pytorch Implementation of [Neural Speech Synthesis with Transformer Network](h
## Data
I used the LJSpeech dataset, which consists of pairs of text scripts and wav files. The complete dataset (13,100 pairs) can be downloaded [here](https://keithito.com/LJ-Speech-Dataset/). I referred to https://github.com/keithito/tacotron and https://github.com/Kyubyong/dc_tts for the preprocessing code.
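LJSpeech ships a `metadata.csv` with one clip per line in the form `ID|transcription|normalized transcription`, and the audio lives in `wavs/<ID>.wav`. A minimal sketch of turning that file into (wav path, text) pairs follows; the function name and the choice of the normalized column are assumptions for illustration, not this repository's preprocessing code.

```python
import os


def parse_ljspeech_metadata(lines, wav_dir="wavs"):
    """Turn LJSpeech metadata.csv lines into (wav_path, normalized_text) pairs.

    Each non-empty line has three '|'-separated fields: the clip ID, the raw
    transcription, and the normalized transcription (numbers and similar
    tokens spelled out as words).
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        clip_id, _raw_text, normalized_text = line.split("|", 2)
        pairs.append((os.path.join(wav_dir, clip_id + ".wav"), normalized_text))
    return pairs


# Usage, assuming the archive was extracted to ./LJSpeech-1.1 (hypothetical path):
# with open("LJSpeech-1.1/metadata.csv", encoding="utf-8") as f:
#     pairs = parse_ljspeech_metadata(f, wav_dir="LJSpeech-1.1/wavs")
```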
## Attention images
## Attention plots
A diagonal alignment appeared after about 15k steps. The attention plots below are at 160k steps.
## Learning curves
### Self Attention encoder
<img src="png/attention_encoder.gif">
### Self Attention decoder
<img src="png/attention_decoder.gif">
### Attention encoder-decoder
<img src="png/attention.gif">
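In the encoder-decoder attention, a "diagonal alignment" means each decoder frame attends to a text position that advances steadily as synthesis proceeds. The helpers below are a hypothetical heuristic for quantifying that from an attention matrix (not part of this repository): the share of decoder steps whose attention peak never moves backwards, and the average peak weight as a rough sharpness measure.

```python
def peak_positions(attn):
    """Index of the most-attended encoder position for each decoder step."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


def monotonic_fraction(attn):
    """Fraction of consecutive decoder steps whose attention peak does not move backwards.

    attn is a list of rows (one per decoder step), each row a list of attention
    weights over encoder (text) positions. Staying on the same position counts
    as forward, since several frames usually map to one character.
    A clean diagonal alignment scores 1.0.
    """
    peaks = peak_positions(attn)
    if len(peaks) < 2:
        return 1.0
    forward_steps = sum(1 for prev, cur in zip(peaks, peaks[1:]) if cur >= prev)
    return forward_steps / (len(peaks) - 1)


def mean_peak_weight(attn):
    """Average of each row's largest weight; values near 1.0 mean sharply focused attention."""
    return sum(max(row) for row in attn) / len(attn)
```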
## Learning curves & Alphas
<img src="png/training_loss.png">
I used Noam-style learning-rate warmup and decay, as in [Tacotron](https://github.com/Kyubyong/tacotron).
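A minimal sketch of the usual Noam formula, `lr = base_lr * warmup^0.5 * min(step * warmup^-1.5, step^-0.5)`, which rises linearly to `base_lr` at `step == warmup` and then decays as `1/sqrt(step)`. The `base_lr` and `warmup_steps` defaults are placeholders, not values taken from `hyperparams.py`; the helper only assumes an optimizer exposing `param_groups`, as `torch.optim` optimizers do.

```python
def noam_lr(step, base_lr=1e-3, warmup_steps=4000):
    """Noam-style schedule.

    Rises linearly to base_lr at step == warmup_steps, then decays as 1/sqrt(step).
    """
    step = max(step, 1)  # step ** -0.5 is undefined at step 0
    return base_lr * warmup_steps ** 0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)


def adjust_learning_rate(optimizer, step, **schedule_kwargs):
    """Write the scheduled rate into every parameter group of the optimizer."""
    lr = noam_lr(step, **schedule_kwargs)
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr
```

Calling `adjust_learning_rate(optimizer, global_step)` once per training step keeps the schedule in one place rather than spread across the training loop.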
<img src="png/alphas.png">
The alpha values of the scaled positional encoding differ from those in the paper. In the paper, the encoder alpha increases to 4, whereas in this experiment it increased slightly at the beginning and then decreased continuously. The decoder alpha decreased steadily from the beginning.
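In the paper, the scaled positional encoding adds a learned scalar times a sinusoidal table to the prenet output, `x + alpha * PE`, with separate alphas for the encoder and decoder. The sketch below uses plain nested lists and a float `alpha` so it runs without dependencies; in a PyTorch model `alpha` would typically be an `nn.Parameter` so it is trained, which is what the alpha curves above track. The function names are illustrative, not this repository's.

```python
import math


def sinusoid_table(n_positions, d_model):
    """Standard sinusoidal positional encoding: sin on even dimensions, cos on odd ones."""
    table = []
    for pos in range(n_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / 10000 ** (2 * (i // 2) / d_model)
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_scaled_positional_encoding(x, alpha):
    """Return x + alpha * PE for x given as a (seq_len, d_model) nested list.

    alpha is a plain float here; in the model it is a trainable scalar.
    """
    pe = sinusoid_table(len(x), len(x[0]))
    return [[v + alpha * p for v, p in zip(row, pe_row)] for row, pe_row in zip(x, pe)]
```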
## Experimental notes
## Generated Samples
## File description
* `hyperparams.py` contains all the required hyperparameters.
* `prepare_data.py` preprocesses the wav files into mel and linear spectrograms and saves them to speed up training. The text preprocessing code is in the `text/` directory.
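Precomputing features means the expensive STFT and mel conversion run once per clip instead of once per epoch. A hypothetical sketch of that caching pattern: `compute_features` stands in for the wav-to-spectrogram step, and the `.pkl` naming is an assumption, since this commit does not show how `prepare_data.py` names or serializes its output. Pickle is used only to stay dependency-free; NumPy arrays are more commonly saved with `np.save`.

```python
import os
import pickle


def load_or_compute(wav_path, cache_dir, compute_features):
    """Return cached features for wav_path, computing and saving them on first use.

    compute_features is any callable mapping a wav path to its features
    (e.g. mel and linear spectrograms).
    """
    os.makedirs(cache_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    cache_path = os.path.join(cache_dir, stem + ".pkl")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = compute_features(wav_path)
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features
```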
Binary file added png/alphas.png
Binary file added png/attention.gif
Binary file added png/attention/attention_0_0.png
Binary file added png/attention/attention_0_1.png
Binary file added png/attention/attention_0_2.png
Binary file added png/attention/attention_0_3.png
Binary file added png/attention/attention_1_0.png
Binary file added png/attention/attention_1_1.png
Binary file added png/attention/attention_1_2.png
Binary file added png/attention/attention_1_3.png
Binary file added png/attention/attention_2_0.png
Binary file added png/attention/attention_2_1.png
Binary file added png/attention/attention_2_2.png
Binary file added png/attention/attention_2_3.png
Binary file added png/attention_decoder.gif
Binary file added png/attention_decoder/attention_dec_0_0.png
Binary file added png/attention_decoder/attention_dec_0_1.png
Binary file added png/attention_decoder/attention_dec_0_2.png
Binary file added png/attention_decoder/attention_dec_0_3.png
Binary file added png/attention_decoder/attention_dec_1_0.png
Binary file added png/attention_decoder/attention_dec_1_1.png
Binary file added png/attention_decoder/attention_dec_1_2.png
Binary file added png/attention_decoder/attention_dec_1_3.png
Binary file added png/attention_decoder/attention_dec_2_0.png
Binary file added png/attention_decoder/attention_dec_2_1.png
Binary file added png/attention_decoder/attention_dec_2_2.png
Binary file added png/attention_decoder/attention_dec_2_3.png
Binary file added png/attention_encoder.gif
Binary file added png/attention_encoder/attention_enc_0_0.png
Binary file added png/attention_encoder/attention_enc_0_1.png
Binary file added png/attention_encoder/attention_enc_0_2.png
Binary file added png/attention_encoder/attention_enc_0_3.png
Binary file added png/attention_encoder/attention_enc_1_0.png
Binary file added png/attention_encoder/attention_enc_1_1.png
Binary file added png/attention_encoder/attention_enc_1_2.png
Binary file added png/attention_encoder/attention_enc_1_3.png
Binary file added png/attention_encoder/attention_enc_2_0.png
Binary file added png/attention_encoder/attention_enc_2_1.png
Binary file added png/attention_encoder/attention_enc_2_2.png
Binary file added png/attention_encoder/attention_enc_2_3.png
Binary file added png/mel_original.png
Binary file added png/mel_pred.png
Binary file added png/training_loss.png
Binary file modified samples/test.wav
6 changes: 3 additions & 3 deletions synthesis.py
@@ -21,8 +21,8 @@ def synthesis(text):
     m = Model()
     m_post = ModelPostNet()
 
-    m.load_state_dict(load_checkpoint(120000, "transformer"))
-    m_post.load_state_dict(load_checkpoint(80000, "postnet"))
+    m.load_state_dict(load_checkpoint(160000, "transformer"))
+    m_post.load_state_dict(load_checkpoint(100000, "postnet"))
 
     max_len = 400
 
@@ -51,4 +51,4 @@ def synthesis(text):
     write(hp.sample_path + "/test.wav", hp.sr, wav)
 
 if __name__ == '__main__':
-    synthesis("My name is Soobin Suh.")
+    synthesis("This experiment was so difficult.")
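The synthesis.py change only moves the restored checkpoints to step 160000 for the transformer and step 100000 for the post-network; `load_checkpoint` itself is not part of this diff. A plausible sketch follows, assuming checkpoints are stored as `checkpoint_<model_name>_<step>.pth.tar` under `./checkpoint`, with weights under a `"model"` key, saved from a model wrapped in `nn.DataParallel` (which prefixes parameter names with `module.`). The path pattern, the `"model"` key, and all helper names are assumptions, not the repository's confirmed code.

```python
import os
from collections import OrderedDict


def checkpoint_path(step, model_name, checkpoint_dir="./checkpoint"):
    """Hypothetical naming scheme: checkpoint_<model_name>_<step>.pth.tar."""
    return os.path.join(checkpoint_dir, "checkpoint_%s_%d.pth.tar" % (model_name, step))


def strip_data_parallel_prefix(state_dict, prefix="module."):
    """Drop the prefix nn.DataParallel adds so keys match an unwrapped model."""
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v) for k, v in state_dict.items()
    )


def load_checkpoint(step, model_name="transformer", checkpoint_dir="./checkpoint"):
    """Load a saved state dict for the given training step (hypothetical layout)."""
    import torch  # imported here so the helpers above work without PyTorch installed

    saved = torch.load(checkpoint_path(step, model_name, checkpoint_dir), map_location="cpu")
    # Assumes the model weights were saved under a "model" key.
    return strip_data_parallel_prefix(saved["model"])
```

Loading with `map_location="cpu"` lets synthesis restore GPU-trained weights on a machine without a GPU; the model can be moved to CUDA afterwards.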
