Merge pull request #132 from sensein/110-review-and-test-voice-cloning-task
Enhancing voice cloning
Showing 7 changed files with 352 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,3 @@ | ||
"""This module provides the API for the senselab voice cloning task.""" | ||
""".. include:: ./doc.md""" # noqa: D415 | ||
|
||
from .api import clone_voices # noqa: F401 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Voice cloning | ||
|
||
|
||
<button class="tutorial-button" onclick="window.location.href='https://github.com/sensein/senselab/blob/main/tutorials/voice_cloning.ipynb'">Tutorial</button> | ||
|
||
|
||
## Task Overview | ||
|
||
Any-to-any voice cloning aims to transform source speech into a target voice using just one or a few examples of the target speaker's voice as references. Traditional voice conversion systems attempt to separate the speaker's identity from the speech content, so that the speaker information can be replaced to convert the voice to that of a target speaker. However, learning such disentangled representations is difficult. | ||
|
||
|
||
## Models | ||
We have explored several models for voice cloning: | ||
- [SpeechT5](https://huggingface.co/microsoft/speecht5_vc) (not included in `senselab` because it did not meet our expectations) | ||
- [FreeVC](https://github.com/OlaWod/FreeVC) (planned for inclusion in `senselab` soon) | ||
- [kNN-VC](https://github.com/bshall/knn-vc) (already included in `senselab`) | ||
|
||
|
||
## Evaluation | ||
### Metrics | ||
|
||
Objective evaluation involves comparing voice cloning outputs across different downstream tasks: | ||
|
||
- Using an automatic speaker verification tool to determine if the original speaker, the target speaker, and the cloned speaker can be distinguished from each other. | ||
- Checking that the speech content stays intelligible by running an automatic speech recognition system on the cloned audio and verifying that the transcript remains unchanged. | ||
- Assessing the preservation of the original speech's emotion after voice cloning. | ||
- ...more... | ||
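
The first two checks in the list above can be sketched in plain Python. This is a minimal illustration, not the `senselab` API: the function names `word_error_rate` and `cosine_similarity` are hypothetical, and the sketch assumes the transcripts and speaker embeddings have already been produced by whatever speech recognition and speaker verification models are in use.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm
```

A low word error rate between the source transcript and the transcript of the cloned audio suggests the content was preserved. For speaker identity, one would hope the cloned embedding has a higher cosine similarity to the target speaker than to the source speaker; any decision threshold depends on the verification model and is not specified here.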
|
||
|
||
### Benchmark | ||
|
||
Recent efforts to enhance privacy in speech technology include the [VoicePrivacy initiative](https://arxiv.org/pdf/2005.01387), which has been active since 2020, focusing on developing and benchmarking anonymization methods. Despite these efforts, achieving perfect privacy remains a challenge (see [here](https://www.voiceprivacychallenge.org/vp2022/docs/VoicePrivacy_2022_Challenge___Natalia_Tomashenko.pdf) for more details). |