Speech synthesis model repo for galgame characters based on Tacotron2, Hifigan and VITS. The repo is also used to publish the precompiled GUI.
Notice: This project is for AI study and hobby purposes only. It was developed out of fondness for the characters and has no malicious purpose. If anything here infringes on your rights, please submit an issue and we will immediately remove the associated model.
Archive Notice: The GUI is relatively complete and this project will not be maintained in the future.
This README is current up to version 1.2.5; version 1.3.0 is only available in Chinese.
1.2.5:
- Improve diff svc code.
- Simplify VITS and Tacotron2 code and remove matplotlib, tensorflow dependencies.
- Remove unnecessary imports and dependencies.
- Crepe can toggle between full and tiny models.
- Add a Nuitka-compiled distribution.
- Change the resampling type diff-svc uses when converting audio sample rates and remove the numba dependency (required for Nuitka to work).
A standalone language file and related settings will be available later to support other languages.
1.2.4:
- Add a history of recently used items.
- Add a notification when inference finishes.
- Update diff-svc (code version:12-04)
- Packed Openvpi's nsf_hifigan weights. (https://openvpi.github.io/vocoders/)
1.2.2-beta:
- fix: The software could not run on some devices because of missing DLLs.
- update: Support diff-svc.
GPU version is coming soon (code only).
In addition to the open source license, please also comply with the following rules.
- (Important) Do not use this software, the pre-trained models provided by this repository, or the speech synthesis results for direct commercial purposes (e.g. QQ bots with paid features, direct sales, commercial games). Derivative work is not included in this restriction.
- Please comply with the user agreement of the original work and do not create content that will adversely affect the original work.
- The pre-trained models and datasets provided by this repository come partly from the community. All consequences of using them are borne by the users, not by the authors and contributors of this repository.
- The use of any content of this repository (including but not limited to code, compiled exe, pre-trained models) for original game development is prohibited.
Position of this project:
- This project encourages derivative work based on the original works only within reasonable limits, and opposes any inappropriate behavior such as insulting the original works or related industries.
- If the model does not have a configuration file, you can place it in any directory. If the model has a configuration file, rename it to `config.json` and place it in the same directory as the model.
- (Notes on TTS model) Since version 1.2.0, the text module and cleaners from the original project have been deprecated, so you need to write the symbols used by your model to the moetts configuration file. (If you are unsure how to do this, refer to the pre-trained models provided.)
Example (moetts.json):
{ "symbols":["_", ",", ".", "!", "?", "-", "A", "E", "I", "N", "O", "Q", "U", "a", "b", "d", "e", "f", "g", "h", "i", "j", "k", "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z", "\u0283", "\u02a7", "\u2193", "\u2191", " "] }
In most cases the input is phonemes (for Japanese, romaji), but the author of a model can choose another input method. For example, the ATRI model (Tacotron2 version) accepts only romaji without spaces, with commas and periods as the only punctuation.
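To illustrate what the symbol list is for, here is a minimal Python sketch of the usual approach in Tacotron2/VITS-style text frontends, where each input character is mapped to its index in `symbols`. How the GUI itself performs this mapping is not documented here, so the function names `load_symbols` and `text_to_ids` are hypothetical and the behavior on unknown characters is an assumption.

```python
import json


def load_symbols(config_path):
    """Read the "symbols" list from a moetts-style JSON file."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)["symbols"]


def text_to_ids(text, symbols):
    """Map each character to its index in `symbols`.

    Assumption: characters missing from the table are treated as an error
    rather than silently dropped.
    """
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    unknown = sorted({ch for ch in text if ch not in symbol_to_id})
    if unknown:
        raise ValueError(f"characters not in the symbol table: {unknown}")
    return [symbol_to_id[ch] for ch in text]
```

A check like this can help confirm that an input string only uses characters your model was trained on before you paste it into the GUI.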
After opening the software, choose your model's path and the output path. Then type the text to be spoken, click the "合成语音" (Synthesize speech) button, and wait a moment. The software writes the audio to /{output_path}/outpus.wav
Notes:
- The model has to be loaded first, so generating speech for the first time may take a long time. On later runs the same model is not reloaded and speech is generated directly.
- If you change the model, the software reloads it the next time you generate speech.
- If you have modified cleaners and symbols, you need to restart the software for the changes to take effect.
- The software supports amd64 only; i386 is not supported.
About VITS
- VITS-Single and VITS-Multi are the single-speaker and multi-speaker models, respectively.
- In VITS-Multi, `原角色ID` (source speaker ID) is the ID of the speaker to be synthesized and must be a number; `目标角色ID` (target speaker ID) is the ID of the target speaker for voice conversion.
- Using `语音迁移` (voice conversion) requires a wav file with a sample rate of 22050 Hz.
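Before using voice conversion, it can be useful to check the sample rate of your file. The following is a minimal sketch using Python's standard `wave` module; the helper names are hypothetical and not part of this project, and the `wave` module only reads uncompressed PCM wav files.

```python
import wave

REQUIRED_RATE = 22050  # sample rate required for voice conversion, per the note above


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM wav file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_valid_for_conversion(path, required_rate=REQUIRED_RATE):
    """True if the file's sample rate matches the required rate."""
    return wav_sample_rate(path) == required_rate
```

If the check fails, resample the file to 22050 Hz with an audio tool of your choice before loading it.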
- ToolBox Update
  - Add a Chinese g2p tool.
  - Make pyopenjtalk built-in and fix a Unicode error.
- Settings Update
  - Add batch processing mode.
  - Support custom filenames.
  - Support changing the VITS length scale.
This version may be unstable.
Setting Instructions:
- 升降半音 (Transposition): int (unit: semitone).
- 启用Crepe (Enable Crepe): Improves audio quality when enabled, but takes longer.
- 加速倍率 (Acceleration ratio): Default is 20. Higher values make inference faster but may affect quality.
- 待转换音频 (Input audio): wav or ogg file containing vocals only.
- Crepe轻量模式 (Crepe Tiny): When Crepe is enabled and this option is checked, Crepe uses the tiny model and takes less time.
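To give a feel for the transposition setting, the sketch below shows the standard equal-temperament relation between a shift in semitones and the resulting frequency ratio (a factor of 2 per 12 semitones). How diff-svc applies the setting internally is not described here; the function names are hypothetical, and treating `0.0` as an unvoiced frame is an assumption made for illustration.

```python
def semitone_ratio(semitones: int) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def transpose_f0(f0_hz, semitones):
    """Scale each voiced pitch value by the semitone ratio.

    Assumption: a value of 0.0 marks an unvoiced frame and is left unchanged.
    """
    ratio = semitone_ratio(semitones)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]
```

For example, a setting of 12 doubles every pitch value (one octave up) and a setting of -12 halves it.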
Integrated into Hugging Face Spaces using Gradio. Try it out
Name&Download | Type | Info | Input format |
---|---|---|---|
ATRI | Tacotron2+Hifigan | ATRI - character model from My Dear Moments | Romaji without spaces, for example: tozendesu.koseinodesukara. (You can also use the "jp g2p" tool in the toolbox to help convert text.) |
ATRI-VITS | VITS Single | ATRI - character model from My Dear Moments | Japanese text converted by "jp g2p - 调形标注+替换ts" (pitch-accent annotation + ts replacement) in the toolbox |
13 Galgame characters | VITS Multi | Speakers: 0:Takakura Anri, 1:Takakura Anzu, 2:Apeiria, 3:Kurashina Asuka, 4:ATRI, 5:ISLA, 6:Shindou Ayane, 7:Himeno Sena, 8:Komari Yui, 9:Miyohashi Koori, 10:Arisaka Mashiro, 11:Sirosaki Mieru, 12:Nikaidou Shinku | Japanese text converted by "jp g2p - 调形标注+替换ts" (pitch-accent annotation + ts replacement) in the toolbox |
Mori | VITS Single | Mori - character model from Fox Hime Zero | Japanese text converted by "jp g2p - 普通转换" (normal conversion) in the toolbox |
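As an illustration of the input format listed for the ATRI Tacotron2 model, the following sketch normalizes a romaji string by removing whitespace and any character other than letters, commas, and periods. This is only an approximation of the stated format: the function name is hypothetical, and dropping other punctuation (rather than replacing it) is an assumption. The "jp g2p" tool in the toolbox remains the intended way to convert Japanese text.

```python
import re


def to_atri_tacotron2_input(romaji: str) -> str:
    """Remove whitespace, then drop everything except letters, commas, and periods."""
    no_spaces = re.sub(r"\s+", "", romaji)
    return re.sub(r"[^A-Za-z,.]", "", no_spaces)


print(to_atri_tacotron2_input("tozen desu. kosei no desu kara."))
# prints: tozendesu.koseinodesukara.
```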
Name&Download | Notes | Trainer |
---|---|---|
Himeno Sena | 24000Hz(infer only) | luoyily |
Komari Yui | 24000Hz(infer only) | luoyily |
ATRI | 24000Hz(infer only) | RiceCake |
Takakura Anzu | 24000Hz(infer only) | luoyily |
Yune | 24000Hz(infer only) | luoyily |
Nishiki Asumi | 24000Hz(infer only) | luoyily |
Sirosaki Mieru | 24000Hz(infer only) | luoyily |
Isla(Plastic Memories) | 24000Hz(infer only) | luoyily |
Illya | 24000Hz(infer only) | luoyily |
ATRI_44100 | 44100Hz(infer only) | RiceCake |
Niimi Sora | 44100Hz(infer only) | RiceCake |
Mori BaiduNetdisk hugging face | 44100Hz(infer only) | luoyily |
Kurashina Asuka BaiduNetdisk hugging face | 44100Hz(infer only+full weight) | luoyily |
Arihara Nanami | 44100Hz(infer only) | RiceCake |
Nikaidou Shinku | 44100Hz(infer only) | RiceCake |
Himeno Sena BaiduNetdisk hugging face | 44100Hz(infer only) | luoyily |
Sirosaki Mieru BaiduNetdisk hugging face | 44100Hz(infer only) | luoyily |
Note: Most of the above models include only the weights needed for inference, so you may not be able to continue training them directly.
- Q: Can this GUI run non-official Tacotron2 models?
  A: If the model architecture and inference method are unchanged, and the only difference from the official implementation is how the data is processed, it should mostly work.
- ShiroDoMain: Developed the cli for version 1.0.0
- menproject: Translation of the English README for version 1.0.0
- CjangCjengh: Provides compiled g2p tools and symbol files suitable for Japanese tonal annotation.
- skytnt: Hugging Face online demo
hifi-gan: https://github.com/jik876/hifi-gan
Tacotron2: https://github.com/NVIDIA/tacotron2
VITS: https://github.com/jaywalnut310/vits
diff-svc: https://github.com/prophesier/diff-svc
DiffSinger: https://github.com/MoonInTheRiver/DiffSinger
DiffSinger(openvpi): https://github.com/openvpi/DiffSinger
DiffSinger Community Vocoder Project: https://openvpi.github.io/vocoders/
diff svc(openvpi): https://github.com/openvpi/diff-svc