Blinorot/HiFiVC

HiFiVC

This repository contains an implementation of the HiFi-VC paper. The model structure is based on an analysis of the graph and code methods of the TorchScript checkpoint provided by the paper's authors. Most of the missing details were recovered. In addition, the repository contains pre-trained versions of the Speaker Encoder: the VAE part of the original solution and ECAPA-TDNN, taken from an available implementation.

Differences from the article

Currently, this implementation does not support F0 training. However, the authors report that results with and without F0 do not differ much.

To stabilize training, an Extra-Adam implementation was added, based on this repo.
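For intuition about why an extragradient-style optimizer can stabilize adversarial training, here is a minimal sketch on a toy problem. It is not the repository's Extra-Adam code: it uses plain extragradient updates without Adam's moment estimates, and the objective f(x, y) = x * y, the function names, and the step size are all illustrative assumptions.

```python
import math


def gda_step(x, y, lr):
    """Simultaneous gradient descent (on x) and ascent (on y) for f(x, y) = x * y."""
    # Gradients of the toy objective: df/dx = y, df/dy = x.
    return x - lr * y, y + lr * x


def extragradient_step(x, y, lr):
    """One extragradient step for the same toy objective."""
    # Extrapolation: take a provisional step from the current point.
    x_half, y_half = x - lr * y, y + lr * x
    # Update: step from the ORIGINAL point, using gradients evaluated
    # at the extrapolated point (df/dx = y_half, df/dy = x_half).
    return x - lr * y_half, y + lr * x_half


def run(step_fn, steps=1000, lr=0.1):
    """Iterate a step function from (1, 1); return the distance to the equilibrium (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step_fn(x, y, lr)
    return math.hypot(x, y)


if __name__ == "__main__":
    print("Simultaneous updates, distance from equilibrium:", run(gda_step))
    print("Extragradient updates, distance from equilibrium:", run(extragradient_step))
```

On this toy game the simultaneous updates spiral away from the equilibrium at (0, 0), while the extragradient updates contract toward it. Extra-Adam applies the same extrapolate-then-update idea with Adam-style adaptive steps.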

Installation

Install all packages using pip install -r requirements.txt.

If you want to use the pre-trained VAE, run the following commands:

pip install gdown
gdown 1oFwMeuQtwaBEyOFkyG7c7LfBQiRe3RdW -O "model.pt"

Training

To run the experiment, run the following command:

python3 train.py -cn CONFIG_NAME +dataset.data_path=PATH_TO_WAV48_DIR

Here, CONFIG_NAME is the name of a file (without .yaml) from the src/configs folder, and PATH_TO_WAV48_DIR is the path to the wav48 directory of the VCTK dataset. For example, on Kaggle the path may look like this: /kaggle/input/vctk-corpus/VCTK-Corpus/VCTK-Corpus/wav48.

Note: add HYDRA_FULL_ERROR=1 before python3 to see the full error stack trace.
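The training invocation can also be assembled from Python, for example inside a notebook. The sketch below is a hypothetical helper, not part of the repository: the names list_config_names, build_train_command, and run_training are assumptions. It relies only on what this README states: configs are .yaml files in src/configs, and train.py accepts -cn and +dataset.data_path.

```python
import os
import subprocess
from pathlib import Path


def list_config_names(config_dir="src/configs"):
    """Return candidate CONFIG_NAME values: top-level .yaml file names without the extension."""
    return sorted(p.stem for p in Path(config_dir).glob("*.yaml"))


def build_train_command(config_name, data_path):
    """Build the argument list for the training command shown in this README."""
    return ["python3", "train.py", "-cn", config_name, f"+dataset.data_path={data_path}"]


def run_training(config_name, data_path, full_errors=True):
    """Launch training from the repository root (hypothetical wrapper)."""
    env = dict(os.environ)
    if full_errors:
        # Ask Hydra to print the complete stack trace on failure.
        env["HYDRA_FULL_ERROR"] = "1"
    return subprocess.run(build_train_command(config_name, data_path), env=env, check=True)
```

For example, run_training("CONFIG_NAME", "PATH_TO_WAV48_DIR") would run the same command as above with HYDRA_FULL_ERROR=1 set, assuming it is called from the repository root.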

Credits

Official repository (inference only). The Extra-Adam implementation was taken from this repository, and ECAPA-TDNN from this one.
