[Guide, Help] How to install and use on Ubuntu/Debian Linux #119

Open
Turbine1991 opened this issue Jul 26, 2023 · 1 comment

@Turbine1991

Turbine1991 commented Jul 26, 2023

There are two repos: this one and a fork with a working, better UI. Both function much the same, so we'll use the fork.

I found tortoise to be unreliable on Windows, including voice training. So I only use Linux for tortoise.

Instructions

1) Clone repo & change directory into it

git clone --depth=1 https://github.com/Acephalia/tortoise-tts-fast-GUI.git && cd tortoise-tts-fast-GUI

2) Install Python 3.10 (if your distribution doesn't package it, add a repository/PPA that does)

sudo apt -y install python3.10 python3.10-dev python3.10-venv

3) Set up a virtual environment (keeps packages & versions local to this project)

python3.10 -m venv venv
echo "source venv/bin/activate" > activate

4) Start virtual environment (use this every time you do anything with pip/python packages)

source activate
Info: You can leave the virtual environment by running 'deactivate'
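
If you're not sure whether the environment is active before running pip, a quick check from Python can help. This is only a minimal sketch; in_virtualenv is a hypothetical helper name, not something tortoise provides.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside an environment created by `python -m venv`, sys.prefix points at
    # the venv folder while sys.base_prefix still points at the system install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If it reports that no venv is active, run 'source activate' again before installing anything.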

5) Install tortoise dependencies

Torch (CUDA edition)
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117 --no-cache-dir
Install the project itself in editable mode (this is what makes the vocoder work)
pip3 install -e .
BigVGAN (what tortoise-fast uses to synthesize high-fidelity waveforms)
pip3 install git+https://github.com/152334H/BigVGAN.git
Ensure this package is not installed, we source it locally instead
pip3 uninstall tortoise

6) Download models and such

echo "Write your text here, the instructions are wrong" | ./scripts/tortoise_tts.py --voice emma --seed 42

This is where it appears to get stuck at "Downloading the main structure of voicefixer". In reality it's downloading roughly 600MB of data very slowly, and it may take an hour. You can use a download manager like FDM and the method below to download the files manually.

If you cancel your download, delete this folder to try again
rm -rf ~/.cache/voicefixer

Faster method to obtain the files above using a download manager (FDM) or mirror (WIP)

Source: https://zenodo.org/record/5469951/files/model.ckpt-1490000_trimed.pt?download=1
Mirror: https://drive.google.com/file/d/1MetvWA9NULZPq0KjTdj0DFjQu5fIiwia/view
Destination: ~/.cache/voicefixer/synthesis_module/44100/model.ckpt-1490000_trimed.pt
Size: 129.3MB

Source: https://zenodo.org/record/5600188/files/vf.ckpt?download=1
Mirror: https://drive.google.com/file/d/1APezpeB6hjZWK3GG7oJZCgs6OKOSIZV-/view
Destination: ~/.cache/voicefixer/analysis_module/checkpoints/vf.ckpt
Size: 466.6MB
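
If you'd rather script the manual download than click through it, something like the following might work. It's a rough sketch under a few assumptions: the URLs and destinations are copied from the list above, missing_files and download_missing are hypothetical helper names, and urllib has no resume support, so it is unlikely to be faster than a download manager pulling from the same source.

```python
import urllib.request
from pathlib import Path

# Relative destination (under ~/.cache/voicefixer) -> source URL,
# both copied from the list above.
VOICEFIXER_FILES = {
    "synthesis_module/44100/model.ckpt-1490000_trimed.pt":
        "https://zenodo.org/record/5469951/files/model.ckpt-1490000_trimed.pt?download=1",
    "analysis_module/checkpoints/vf.ckpt":
        "https://zenodo.org/record/5600188/files/vf.ckpt?download=1",
}


def missing_files(cache_dir: Path) -> list[Path]:
    """Return the destination paths that do not exist yet."""
    return [
        cache_dir / rel
        for rel in VOICEFIXER_FILES
        if not (cache_dir / rel).exists()
    ]


def download_missing(cache_dir: Path) -> None:
    """Fetch each missing checkpoint, creating folders as needed."""
    for rel, url in VOICEFIXER_FILES.items():
        dest = cache_dir / rel
        if dest.exists():
            continue  # note: does not verify that an existing file is complete
        dest.parent.mkdir(parents=True, exist_ok=True)
        partial = dest.with_name(dest.name + ".part")
        urllib.request.urlretrieve(url, partial)  # plain download, no resume
        partial.rename(dest)  # final name only appears after a full download


if __name__ == "__main__":
    download_missing(Path.home() / ".cache" / "voicefixer")
```

Downloading to a ".part" name first means a cancelled run shouldn't leave a truncated file at the path voicefixer looks for.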

7) Fix the next error (unexpected keys when loading the state_dict)

Traceback (most recent call last):
  File "/home/nom/Projects/tortoise-tts-fast-GUI/./scripts/tortoise_tts.py", line 280, in <module>
    tts = TextToSpeech(
  File "/home/nom/Projects/tortoise-tts-fast-GUI/tortoise/api.py", line 271, in __init__
    self.autoregressive.load_state_dict(torch.load(ar_path))
  File "/home/nom/Projects/tortoise-tts-fast-GUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
        Unexpected key(s) in state_dict: "gpt.h.0.attn.bias", "gpt.h.0.attn.masked_bias", "gpt.h.1.attn.bias", "gpt.h.1.attn.masked_bias", "gpt.h.2.attn.bias", "gpt.h.2.attn.masked_bias", "gpt.h.3.attn.bias", "gpt.h.3.attn.masked_bias", "gpt.h.4.attn.bias", "gpt.h.4.attn.masked_bias", "gpt.h.5.attn.bias", "gpt.h.5.attn.masked_bias", "gpt.h.6.attn.bias", "gpt.h.6.attn.masked_bias", "gpt.h.7.attn.bias", "gpt.h.7.attn.masked_bias", "gpt.h.8.attn.bias", "gpt.h.8.attn.masked_bias", "gpt.h.9.attn.bias", "gpt.h.9.attn.masked_bias", "gpt.h.10.attn.bias", "gpt.h.10.attn.masked_bias", "gpt.h.11.attn.bias", "gpt.h.11.attn.masked_bias", "gpt.h.12.attn.bias", "gpt.h.12.attn.masked_bias", "gpt.h.13.attn.bias", "gpt.h.13.attn.masked_bias", "gpt.h.14.attn.bias", "gpt.h.14.attn.masked_bias", "gpt.h.15.attn.bias", "gpt.h.15.attn.masked_bias", "gpt.h.16.attn.bias", "gpt.h.16.attn.masked_bias", "gpt.h.17.attn.bias", "gpt.h.17.attn.masked_bias", "gpt.h.18.attn.bias", "gpt.h.18.attn.masked_bias", "gpt.h.19.attn.bias", "gpt.h.19.attn.masked_bias", "gpt.h.20.attn.bias", "gpt.h.20.attn.masked_bias", "gpt.h.21.attn.bias", "gpt.h.21.attn.masked_bias", "gpt.h.22.attn.bias", "gpt.h.22.attn.masked_bias", "gpt.h.23.attn.bias", "gpt.h.23.attn.masked_bias", "gpt.h.24.attn.bias", "gpt.h.24.attn.masked_bias", "gpt.h.25.attn.bias", "gpt.h.25.attn.masked_bias", "gpt.h.26.attn.bias", "gpt.h.26.attn.masked_bias", "gpt.h.27.attn.bias", "gpt.h.27.attn.masked_bias", "gpt.h.28.attn.bias", "gpt.h.28.attn.masked_bias", "gpt.h.29.attn.bias", "gpt.h.29.attn.masked_bias". 

Edit tortoise/api.py (line 271 in the traceback above) so that load_state_dict ignores the unexpected keys
Find: self.autoregressive.load_state_dict(torch.load(ar_path))
Replace: self.autoregressive.load_state_dict(torch.load(ar_path), strict=False)
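
Note that strict=False makes load_state_dict ignore every unexpected or missing key, not just the ones in this error. A narrower alternative is to drop only the keys the error lists before loading. The sketch below assumes the offending keys are exactly those ending in ".attn.bias" and ".attn.masked_bias", as in the traceback; drop_attn_buffers is a hypothetical helper, not part of tortoise or torch.

```python
from collections import OrderedDict

# Suffixes of the keys listed in the RuntimeError above.
UNEXPECTED_SUFFIXES = (".attn.bias", ".attn.masked_bias")


def drop_attn_buffers(state_dict):
    """Return a copy of state_dict without the keys the error complains about."""
    return OrderedDict(
        (key, value)
        for key, value in state_dict.items()
        if not key.endswith(UNEXPECTED_SUFFIXES)
    )


# Hypothetical use in tortoise/api.py, in place of the line from the traceback:
#   self.autoregressive.load_state_dict(drop_attn_buffers(torch.load(ar_path)))
```

With this approach the load stays strict, so any other mismatch in the checkpoint would still raise an error instead of being silently skipped.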

8) The next error (no fix yet)

Rendering emma_00 (1 of 1)...
  Hello
Traceback (most recent call last):
  File "/home/nom/Projects/tortoise-tts-fast-GUI/./scripts/tortoise_tts.py", line 352, in <module>
    gen = tts.tts_with_preset(
AttributeError: 'TextToSpeech' object has no attribute 'tts_with_preset'

WIP
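
Until there is a fix, one way to narrow this down might be to list which tts-style methods the object actually has, then compare that with what the script calls. This is just a diagnostic sketch; list_tts_methods is a hypothetical helper and works on any Python object.

```python
def list_tts_methods(obj):
    """Return the names of callable attributes on obj that start with 'tts'."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith("tts") and callable(getattr(obj, name))
    )


# Hypothetical use in scripts/tortoise_tts.py, just before the failing call:
#   print(list_tts_methods(tts))
```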

This repo is so broken, I don't think it ever worked.

@InconsolableCellist

Thanks for the fix in step 7, it worked for me. I didn't run into the step 8 error for whatever reason.
