Support Diffsinger multi-dictionary #1248

Merged
merged 3 commits into from
Sep 1, 2024

Conversation

oxygen-dioxide
Contributor

@oxygen-dioxide oxygen-dioxide commented Aug 22, 2024

This PR adds support for DiffSinger multi-dictionary (openvpi/DiffSinger#203):

  • Support the phonemes.json file, the successor to phonemes.txt. (If the file name ends in .json, it is loaded as JSON.)
  • Support language embedding

Currently, in a phonetic hint (a phoneme list surrounded by [] in a note's lyric), you don't need to include the language prefix; the phonemizer assumes the phonemes belong to its own language. You can still include a language prefix, which lets you use a phoneme from another language.

However, if you edit phonemes manually in the phoneme panel at the bottom, you have to enter the full phoneme name, including the language prefix.
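
To illustrate the hint behaviour described above, here is a minimal Python sketch. It is not OpenUtau's actual code: the function name `resolve_hint_phonemes`, the lookup order (try the current language's prefix first, then the bare name for unprefixed symbols such as SP), and the error on unknown names are all assumptions made for illustration.

```python
def resolve_hint_phonemes(hint, default_lang, known_phonemes):
    """Expand a phonetic hint such as "[b a]" into full phoneme names.

    Hypothetical helper: names without a language prefix are assumed to
    belong to default_lang, and names with a prefix are used as written.
    """
    resolved = []
    for name in hint.strip("[] ").split():
        if "/" in name:
            # An explicit prefix (e.g. "en/k") is taken as-is.
            candidates = [name]
        else:
            # Assumed order: current language first, then the bare name
            # (covers unprefixed symbols such as SP and AP).
            candidates = [f"{default_lang}/{name}", name]
        match = next((c for c in candidates if c in known_phonemes), None)
        if match is None:
            raise ValueError(f"unknown phoneme in hint: {name!r}")
        resolved.append(match)
    return resolved


known = {"SP", "AP", "zh/a", "zh/b", "en/k"}
print(resolve_hint_phonemes("[b a]", "zh", known))     # ['zh/b', 'zh/a']
print(resolve_hint_phonemes("[en/k a]", "zh", known))  # ['en/k', 'zh/a']
```

The phoneme panel has no such fallback according to the description above, so names typed there must already be in the full `lang/phoneme` form.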


@oxygen-dioxide oxygen-dioxide marked this pull request as ready for review August 23, 2024 06:06
@stakira stakira merged commit 9d574ef into stakira:master Sep 1, 2024
3 checks passed
@oxygen-dioxide
Contributor Author

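DiffSinger multi-dictionary packaging instructions:
phonemes.json
phonemes.json serves the same purpose as the earlier phonemes.txt. Set the phonemes entry in dsconfig.yaml to the file name of your phonemes.json. The phonemes.json file is exported along with the onnx model.

phonemes: phonemes.json

OpenUtau decides how to load this file from its file extension. If the extension is json, OpenUtau loads the file as JSON. Otherwise, the file is still loaded in the earlier txt format.

use_lang_id
If use_lang_id was enabled during training, a languages.json file is also exported when exporting onnx, and it needs to be included in the voicebank.

You also need to add the following to the corresponding dsconfig.yaml:

use_lang_id: true
languages: languages.json # The file name of your languages.json

Every model that uses a dsconfig.yaml needs to be packaged as described above.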

@oxygen-dioxide
Contributor Author

oxygen-dioxide commented Sep 7, 2024

DiffSinger multi-dictionary packaging instructions:
phonemes.json
phonemes.json serves the same purpose as phonemes.txt. Set the phonemes field in dsconfig.yaml to the file name of your phonemes.json. The phonemes.json file is exported along with the onnx model.

phonemes: phonemes.json 

OpenUtau determines how to load this file from its file extension. If the extension is json, OpenUtau loads the file as JSON. Otherwise, it is still loaded in the previous txt format. So don't change the file's extension.
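
The extension-based dispatch can be sketched in Python as follows. This is an illustration, not OpenUtau's implementation: the function name `load_phonemes` is hypothetical, and both file layouts are assumptions (a JSON object mapping each phoneme name to an integer id, and a txt file with one phoneme per line whose line index is the id).

```python
import json
from pathlib import Path


def load_phonemes(path):
    """Return a {phoneme: id} mapping, choosing the parser by file extension."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        # Assumed JSON layout: {"SP": 0, "AP": 1, "zh/a": 2, ...}
        return {name: int(idx) for name, idx in json.loads(text).items()}
    # Assumed txt layout: one phoneme per line, id = line index.
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return {name: idx for idx, name in enumerate(names)}
```

Because the parser is chosen purely from the suffix, renaming a JSON file to .txt (or the reverse) would send it to the wrong branch, which is why the extension should be left alone.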

use_lang_id
If use_lang_id was enabled when training your voicebank, a languages.json file is exported along with the onnx model. Put it in the same folder as dsconfig.yaml.

You also need to add the following content to dsconfig.yaml:

use_lang_id: true
languages: languages.json #The file name of your languages.json file
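
A packaging check for these two fields could look like the Python sketch below. It is only an illustration: `check_lang_config` is a hypothetical helper, and it assumes dsconfig.yaml has already been parsed into a dict by a YAML library of your choice.

```python
from pathlib import Path


def check_lang_config(config, folder):
    """Return a list of problems with the language settings of one dsconfig.

    config: dict parsed from dsconfig.yaml (parsing is left to the caller).
    folder: directory that contains that dsconfig.yaml.
    """
    problems = []
    if config.get("use_lang_id"):
        name = config.get("languages")
        if not name:
            problems.append("use_lang_id is true but languages is not set")
        elif not (Path(folder) / name).is_file():
            problems.append(f"{name} not found next to dsconfig.yaml")
    return problems
```

An empty list means the folder passes this particular check; running it once per folder that contains a dsconfig.yaml covers the whole voicebank.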

dsdict-**.yaml
In yaml dictionaries, you need to include the language prefix in the phoneme symbols in the replacements (to), symbols (symbol), and entries (phonemes) sections.
Here is an example:

symbols:
- symbol: SP
  type: vowel
- symbol: AP
  type: vowel
- symbol: zh/a
  type: vowel
- symbol: zh/b
  type: stop
# ………
entries:
- grapheme: SP
  phonemes:
  - SP
- grapheme: AP
  phonemes:
  - AP
- grapheme: a
  phonemes:
  - zh/a
- grapheme: ba
  phonemes:
  - zh/b
  - zh/a
# ……

If your voicebank was trained with multi-dictionary, do all of the above for each dsconfig.yaml in your voicebank.

Also remember that only phonemizers (which read the dsdur folder) recognize the per-language dictionaries, dsdict-**.yaml. In the dspitch and dsvariance folders, only dsdict.yaml is loaded, and only its symbols section is used. Make sure that every phoneme your voicebank supports (with its language prefix) is included in these dsdict.yaml files, because the variance and pitch models need to know (and only need to know) whether each phoneme is a vowel or a consonant.
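
One way to verify that coverage is sketched below in Python. The helper name `missing_symbols` is hypothetical, and the sketch assumes the dsdict.yaml has already been parsed into a dict with the same symbols structure as the example above.

```python
def missing_symbols(phonemes, dsdict):
    """List phonemes that the symbols section of a dsdict does not declare.

    phonemes: iterable of full phoneme names (with language prefix).
    dsdict:   dict parsed from a dsdict.yaml (parsing is left to the caller).
    """
    declared = {entry["symbol"] for entry in dsdict.get("symbols", [])}
    return sorted(set(phonemes) - declared)


dsdict = {
    "symbols": [
        {"symbol": "SP", "type": "vowel"},
        {"symbol": "AP", "type": "vowel"},
        {"symbol": "zh/a", "type": "vowel"},
    ]
}
print(missing_symbols(["SP", "AP", "zh/a", "zh/b"], dsdict))  # ['zh/b']
```

Running this against the dsdict.yaml in each of the dspitch and dsvariance folders, with the full phoneme list of the voicebank, shows which symbols still need a vowel or consonant type.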
