SALT

Take a grain of SALT to that voice!

TL;DR: A speaker anonymization and interpolation tool based on WavLM hidden space transformation.

Official code implementation for the ASRU 2023 paper SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation.

[Demo Page] [Paper]

Try it out interactively on Colab: [Open In Colab]

Model Overview
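
SALT builds on kNN-VC: the source utterance is encoded into WavLM hidden features, each frame is matched against feature pools extracted from one or more target speakers, and the per-speaker matches are blended by interpolation weights before being vocoded back into a waveform. Below is a minimal conceptual sketch of that interpolation step, not the repository's implementation; all names are hypothetical, and plain Euclidean distance is used for brevity (kNN-VC itself matches by cosine similarity):

import torch

def mix_speakers(source_feats, speaker_pools, weights, topk=4):
    # Hypothetical sketch of SALT-style latent interpolation, NOT the real API.
    # source_feats:  (T, D) WavLM features of the source utterance
    # speaker_pools: {name: (N, D) pool of WavLM features for that speaker}
    # weights:       {name: interpolation weight}, assumed to sum to 1
    mixed = torch.zeros_like(source_feats)
    for name, pool in speaker_pools.items():
        # k-NN regression as in kNN-VC: replace every source frame with the
        # mean of its k nearest frames from this speaker's pool ...
        dists = torch.cdist(source_feats, pool)        # (T, N) pairwise distances
        idx = dists.topk(topk, largest=False).indices  # (T, k) neighbour indices
        matched = pool[idx].mean(dim=1)                # (T, D) per-frame kNN mean
        # ... then blend the per-speaker results by their weights.
        mixed = mixed + weights[name] * matched
    return mixed  # fed to the vocoder to synthesize the output waveform

# Toy usage with random "features" (768-dim, as in WavLM-Base):
src = torch.randn(200, 768)
pools = {'alice': torch.randn(4000, 768), 'bob': torch.randn(4000, 768)}
out = mix_speakers(src, pools, {'alice': 0.5, 'bob': 0.5})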

Quick start

  1. Install dependencies: we have the same dependencies as kNN-VC: torch, torchaudio, and numpy. We also use pandas for data processing and gradio for the web demo.
  2. Download prebuilt speaker packs (Optional):
cd assets
wget https://github.com/BakerBunker/SALT/releases/download/1.0.0/librispeech-pack.zip
unzip librispeech-pack.zip
  3. Load model:
import torch
anon = torch.hub.load('BakerBunker/SALT', 'salt', trust_repo=True, pretrained=True, base=True, device='cuda')
# base=True uses WavLM-Base as the feature extractor
  4. Make speaker packs (Optional):
path = anon.make_speaker_pack(['tensor_or_path_to_wav', ...], speaker_name)
  5. Add speakers:
anon.add_speaker('example', wavs=['tensor_or_path_to_wav', ...])
# OR add a prebuilt .pack file with
anon.add_speaker('example', preprocessed_file='example.pack')
  6. Mix speakers (a full end-to-end sketch follows this list):
wav = anon.interpolate(
    'tensor_or_path_to_wav',
    # a pandas DataFrame with columns 'speaker' and 'weight',
    # OR a dict of the form {'speaker': weight}, e.g.:
    {'example': 1.0},
    topk=4,  # K for k-NN
    # OR use chunked mode for long wavs:
    chunksize=5,  # 5 seconds per chunk
    padding=0.5,  # pad 0.5 s at the head and tail of each chunk
)
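
Putting the steps together, a minimal end-to-end run might look like the sketch below ('source.wav' and 'example.pack' are placeholder paths):

import torch

# Load the model with WavLM-Base as the feature extractor (step 3).
anon = torch.hub.load('BakerBunker/SALT', 'salt',
                      trust_repo=True, pretrained=True, base=True, device='cuda')

# Register a target speaker from a prebuilt pack (step 5).
anon.add_speaker('example', preprocessed_file='example.pack')

# Anonymize a source utterance entirely toward that speaker (step 6).
wav = anon.interpolate('source.wav', {'example': 1.0}, topk=4)
# 'wav' is the synthesized waveform tensor (16 kHz in the kNN-VC setup).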

Checkpoints

The WavLM-Large checkpoint and its corresponding vocoder are available from kNN-VC.

The WavLM-Base checkpoint and its corresponding vocoder are available on the release page.

The training process is the same as kNN-VC's.

Acknowledgement

Huge THANKS to kNN-VC and its authors; our code is largely based on that repository.

kNN-VC: https://github.com/bshall/knn-vc

Part of the code is based on:

HiFiGAN: https://github.com/jik876/hifi-gan

WavLM: https://github.com/microsoft/unilm/tree/master/wavlm

Citation

@inproceedings{Lv2023SALTDS,
  title={SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation},
  author={Yuanjun Lv and Jixun Yao and Peikun Chen and Hongbin Zhou and Heng Lu and Lei Xie},
  year={2023},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
}