Simplified Voice-Clone


1. Introduction

This project streamlines GPT-SoVITS, FishSpeech, and ChatTTS so that users can run model inference and training with a few lines of Python code.

2. Installation

Create a virtual environment:

```
conda create -n gpt_sovits python=3.8
conda activate gpt_sovits
```

Install PyTorch:

```
pip install torch torchvision torchaudio
```

Install ffmpeg:
```
conda install ffmpeg
```

Clone the repository and install it with its dependencies:

```
git clone https://github.com/HanxSmile/Simplify-GPT-SoVITS.git
cd Simplify-GPT-SoVITS
pip install .
```

Verify the installation (the command should exit without errors):

```
python -c "from gpt_sovits import Factory"
```
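Beyond the import check, it can help to confirm that ffmpeg is on `PATH` and that PyTorch can see a GPU, since the inference examples below call `model.cuda()`. The following is a minimal sketch using only the standard library (plus PyTorch if it is present); `check_environment` is a hypothetical helper, not part of this project.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which prerequisites are present, without importing heavy packages up front."""
    report = {
        # find_spec returns None when a top-level package is not installed
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "gpt_sovits_installed": importlib.util.find_spec("gpt_sovits") is not None,
        # shutil.which returns the executable's path, or None if it is not on PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If `cuda_available` is `False`, the `model.cuda()` calls in the examples below will fail; in that case you may need a CUDA-enabled PyTorch build.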

3. Few-shot inference

3.1 GPT-SoVITS

Download the pretrained models (see the original GPT-SoVITS project for details):
```
git lfs clone https://huggingface.co/lj1995/GPT-SoVITS
```

Download and unzip the Chinese G2P model:

```
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip
unzip G2PWModel_1.1.zip -d ./
```
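On systems without `wget` or `unzip` (for example, Windows), the same download-and-extract step can be done with the Python standard library. This is a sketch: `fetch_and_extract` is a hypothetical helper, and it assumes the archive unpacks to the `G2PWModel_1.1` directory that the config below refers to.

```python
import os
import urllib.request
import zipfile

G2PW_URL = "https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip"


def fetch_and_extract(url: str, dest_dir: str = ".") -> str:
    """Download a zip archive into dest_dir (skipped if already present) and extract it there."""
    os.makedirs(dest_dir, exist_ok=True)
    archive = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)  # same effect as `unzip <archive> -d <dest_dir>`
    return archive
```

Usage: `fetch_and_extract(G2PW_URL, ".")`, run from the directory where you want `G2PWModel_1.1/` to appear.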

Edit the model configuration, filling in the paths of the models downloaded above:

`config/gpt_sovits.yaml`:

```yaml
model_cls: gpt_sovits

hubert_model_name: GPT-SoVITS/chinese-hubert-base
bert_model_name: GPT-SoVITS/chinese-roberta-wwm-ext-large
t2s_model_name: GPT-SoVITS/gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch=12-step=369668.ckpt
vits_model_name: GPT-SoVITS/gsv-v2final-pretrained/s2G2333k.pth
cut_method: cut6
text_converter:
  converter_cls: chinese_converter
  g2p_model_dir: G2PWModel_1.1
  g2p_tokenizer_dir: GPT-SoVITS/chinese-roberta-wwm-ext-large

generate_cfg:
  placeholder: Null
```

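Required fields:

| Field | Description |
| --- | --- |
| `hubert_model_name` | Path to the HuBERT model |
| `bert_model_name` | Path to the BERT model |
| `t2s_model_name` | Path to the AR (text-to-semantic) model |
| `vits_model_name` | Path to the VITS model |
| `text_converter.g2p_model_dir` | Path to the G2P model |
| `text_converter.g2p_tokenizer_dir` | Directory of the G2P tokenizer (same as `bert_model_name`) |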
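Because every required field is a filesystem path, a quick pre-flight check can catch typos before the slow model build. A minimal sketch, assuming the YAML has already been loaded into a plain nested `dict` (for example with PyYAML's `yaml.safe_load`); `find_missing_paths` is a hypothetical helper, and its field list simply mirrors the table above.

```python
import os
from typing import Dict, List

# Dotted keys of the required path fields listed in the table above
REQUIRED_PATH_FIELDS = [
    "hubert_model_name",
    "bert_model_name",
    "t2s_model_name",
    "vits_model_name",
    "text_converter.g2p_model_dir",
    "text_converter.g2p_tokenizer_dir",
]


def find_missing_paths(cfg: Dict, fields: List[str] = REQUIRED_PATH_FIELDS) -> Dict[str, str]:
    """Return {dotted_key: problem} for fields that are unset or point to nothing on disk."""
    problems = {}
    for dotted in fields:
        node = cfg
        for part in dotted.split("."):
            # Walk one level down the nested dict; becomes None if a level is missing
            node = node.get(part) if isinstance(node, dict) else None
        if node is None:
            problems[dotted] = "field not set"
        elif not os.path.exists(str(node)):
            problems[dotted] = f"path does not exist: {node}"
    return problems
```

An empty result means every required path resolves relative to the current working directory.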

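Optional fields:

| Field | Description |
| --- | --- |
| `cut_method` | How long input text is split into segments (`cut6` is recommended; it splits on punctuation such as 「，。？！...」) |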
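To preview roughly how `cut6` will segment a long input, the idea of splitting at punctuation can be approximated with a regular expression. This is not the project's implementation, and the punctuation set `_BOUNDARY` is an assumption based on the description above; `preview_segments` is a hypothetical helper.

```python
import re
from typing import List

# Boundary punctuation (assumed from the description of cut6; the real list may differ)
_BOUNDARY = "，。？！,?!…"
_CLASS = "[" + re.escape(_BOUNDARY) + "]"
# Split right after a boundary mark unless another boundary mark follows,
# so runs such as "！！" or "……" stay attached to the preceding segment.
_SPLIT_RE = re.compile(f"(?<={_CLASS})(?!{_CLASS})")


def preview_segments(text: str) -> List[str]:
    """Return the punctuation-delimited segments of text, keeping the punctuation."""
    return [seg for seg in _SPLIT_RE.split(text) if seg.strip()]
```

For example, `preview_segments("明月几时有，把酒问青天")` gives `["明月几时有，", "把酒问青天"]`.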

Prepare a reference audio file and the exact text spoken in it.
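The upstream GPT-SoVITS inference WebUI expects reference clips of roughly 3 to 10 seconds. Whether this simplified package enforces the same limit is an assumption, but staying inside that window is a safe default. A sketch using the standard-library `wave` module (PCM WAV files only); `reference_duration_ok` and its default bounds are illustrative.

```python
import wave
from typing import Tuple


def reference_duration_ok(path: str, min_s: float = 3.0, max_s: float = 10.0) -> Tuple[bool, float]:
    """Return (within_bounds, duration_in_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        # Duration = number of frames / frames per second
        duration = wf.getnframes() / float(wf.getframerate())
    return min_s <= duration <= max_s, duration
```

Usage: `ok, seconds = reference_duration_ok("examples/linghua_90.wav")`.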

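Run few-shot inference:

```python
from gpt_sovits import Factory
from gpt_sovits.utils import save_audio
import os
import uuid

# Build the model from the config edited above
cfg = Factory.read_config("config/gpt_sovits.yaml")
model = Factory.build_model(cfg)

# Reference clip, its transcript, and the target text to speak in that voice
inputs = {
    "prompt_audio": "examples/linghua_90.wav",
    "prompt_text": "藏明刀的刀工,也被算作是本領通神的神士相關人員,歸屬統籌文化、藝術、祭祀的射鳳形意派管理。",
    "text": "明月几时有，把酒问青天"
}
model = model.cuda()  # move to GPU; requires a CUDA-enabled PyTorch build
sr, audio_data = model.generate(inputs)  # sample rate and generated audio

# Save to a randomly named .wav file in the current directory
name = uuid.uuid4().hex
output_dir = os.getcwd()
output_file = os.path.join(output_dir, name + '.wav')

output_file = save_audio(audio_data, sr, output_file)
print(output_file)
```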
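Building the model is the slow part, so when you need several sentences in the same voice it makes sense to load once and loop. The sketch below assumes only what the example above shows: `model.generate(inputs)` takes a dict with `prompt_audio`, `prompt_text` and `text` and returns `(sr, audio_data)`, and the save function has the same call shape as `save_audio(audio_data, sr, path)` and returns the written path. `synthesize_many` is a hypothetical helper.

```python
import os
import uuid
from typing import Callable, List


def synthesize_many(model, prompt_audio: str, prompt_text: str, texts: List[str],
                    save_fn: Callable, output_dir: str = ".") -> List[str]:
    """Clone one reference voice for several target texts, reusing an already-built model."""
    os.makedirs(output_dir, exist_ok=True)
    outputs = []
    for text in texts:
        inputs = {"prompt_audio": prompt_audio, "prompt_text": prompt_text, "text": text}
        sr, audio_data = model.generate(inputs)
        path = os.path.join(output_dir, uuid.uuid4().hex + ".wav")
        # save_fn is expected to return the written path, like save_audio above
        outputs.append(save_fn(audio_data, sr, path))
    return outputs
```

With the objects from the example above, you would pass `model`, the same prompt audio and text, a list of target sentences, and `save_audio` as `save_fn`.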

3.2 FishSpeech

Download the pretrained model (see the original FishSpeech project for details):
```
git lfs clone https://huggingface.co/fishaudio/fish-speech-1.4
```

Edit the model configuration, filling in the paths of the model downloaded above:

`config/fishspeech.yaml`:

```yaml
model_cls: fish_speech
cut_method: cut6
vqgan:
  model_cls: filefly_vqgan
  ckpt: fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth
  spec_transform:
    sample_rate: 44100
    n_mels: 160
    n_fft: 2048
    hop_length: 512
    win_length: 2048
  backbone:
    input_channels: 160
    depths: [ 3, 3, 9, 3 ]
    dims: [ 128, 256, 384, 512 ]
    drop_path_rate: 0.2
    kernel_size: 7
  head:
    hop_length: 512
    upsample_rates: [ 8, 8, 2, 2, 2 ]
    upsample_kernel_sizes: [ 16, 16, 4, 4, 4 ]
    resblock_kernel_sizes: [ 3, 7, 11 ]
    resblock_dilation_sizes: [ [ 1, 3, 5 ], [ 1, 3, 5 ], [ 1, 3, 5 ] ]
    num_mels: 512
    upsample_initial_channel: 512
    pre_conv_kernel_size: 13
    post_conv_kernel_size: 13
  quantizer:
    input_dim: 512
    n_groups: 8
    n_codebooks: 1
    levels: [ 8, 5, 5, 5 ]
    downsample_factor: [ 2, 2 ]

text2semantic:
  model_cls: dual_ar_transformer
  tokenizer_name: fish-speech-1.4/
  ckpt: fish-speech-1.4/model.pth
  model:
    attention_qkv_bias: False
    codebook_size: 1024
    dim: 1024
    dropout: 0.1
    head_dim: 64
    initializer_range: 0.02
    intermediate_size: 4096
    max_seq_len: 4096
    n_fast_layer: 4
    n_head: 16
    n_layer: 24
    n_local_heads: 2
    norm_eps: 1e-6
    num_codebooks: 8
    rope_base: 1e6
    tie_word_embeddings: False
    use_gradient_checkpointing: True
    vocab_size: 32000

text_converter:
  converter_cls: chinese_fs_converter
```
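Several numbers in the `vqgan` section are coupled, so they should stay consistent if you edit them. The arithmetic below uses values copied from the config: the head's upsample rates multiply to the 512-sample hop length, and dividing the 44.1 kHz sample rate by the hop length and by the quantizer's downsample factors gives about 21.5 tokens per second, which matches the `21hz` in the checkpoint filename. Reading `downsample_factor` as a further ×4 reduction of the mel frame rate is our interpretation of the config, not something this README states.

```python
from math import prod

# Values copied from config/fishspeech.yaml
sample_rate = 44100
hop_length = 512
upsample_rates = [8, 8, 2, 2, 2]
downsample_factor = [2, 2]

# The generator head should upsample each mel frame back to hop_length samples
assert prod(upsample_rates) == hop_length

mel_frame_rate = sample_rate / hop_length              # mel frames per second
token_rate = mel_frame_rate / prod(downsample_factor)  # semantic tokens per second

print(f"mel frames/s: {mel_frame_rate:.2f}")  # 86.13
print(f"tokens/s:     {token_rate:.2f}")      # 21.53
```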

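Required fields:

| Field | Description |
| --- | --- |
| `vqgan.ckpt` | Path to the VQGAN model |
| `text2semantic.ckpt` | Path to the text2semantic model |
| `text2semantic.tokenizer_name` | Directory containing the tokenizer used by the text2semantic model |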

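Optional fields:

| Field | Description |
| --- | --- |
| `cut_method` | How long input text is split into segments (`cut6` is recommended; it splits on punctuation such as 「，。？！...」) |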

Prepare a reference audio file and the exact text spoken in it.

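Run few-shot inference:

```python
from gpt_sovits import Factory
from gpt_sovits.utils import save_audio
import os
import uuid

# Build the FishSpeech model from the config edited above
cfg = Factory.read_config("config/fishspeech.yaml")
model = Factory.build_model(cfg)

# Reference clip, its transcript, and the target text to speak in that voice
inputs = {
    "prompt_audio": "examples/linghua_90.wav",
    "prompt_text": "藏明刀的刀工,也被算作是本領通神的神士相關人員,歸屬統籌文化、藝術、祭祀的射鳳形意派管理。",
    "text": "明月几时有，把酒问青天"
}
model = model.cuda()  # move to GPU; requires a CUDA-enabled PyTorch build
sr, audio_data = model.generate(inputs)  # sample rate and generated audio

# Save to a randomly named .wav file in the current directory
name = uuid.uuid4().hex
output_dir = os.getcwd()
output_file = os.path.join(output_dir, name + '.wav')

output_file = save_audio(audio_data, sr, output_file)
print(output_file)
```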

4. Gradio Demo

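Step 1: Download the pretrained models (see above).

Step 2: Prepare the config files, filling in the paths of the pretrained models (see above), and put all config files in the project's `config` directory.

Step 3: From the project root, run `python webui.py`.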
