Unofficial PyTorch implementation of the deep clustering family
This repo is an unofficial implementation of deep clustering (Hershey et al., ICASSP 2016) and its successors (Luo et al., ICASSP 2017; Wang et al., ICASSP 2018; Roux et al., ICASSP 2019; Roux et al., IEEE JSTSP 2019).
The purpose of this repo is to perform single-channel source separation with deep neural networks. As depicted in the following figure, a mixture of vocals and instrumental music (top) is given to the model, and the model predicts the constituent sources, i.e. vocals (center) and instrumental music (bottom).
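The deep clustering objective behind these models assigns each time-frequency bin an embedding and pulls together bins belonging to the same source. The sketch below is a generic, hedged implementation of the permutation-free affinity loss |VVᵀ − YYᵀ|²_F from Hershey et al. (2016), written to avoid forming the huge TF×TF affinity matrices; it is an illustration, not necessarily the exact code in this repo.

```python
import torch

def deep_clustering_loss(embedding, label):
    """Deep clustering loss |VV^T - YY^T|_F^2 per batch element.

    embedding: (batch, T*F, D) embeddings V (typically unit-norm)
    label:     (batch, T*F, C) one-hot source assignments Y
    Expanding the Frobenius norm gives
    |V^T V|^2 - 2 |V^T Y|^2 + |Y^T Y|^2, which only needs small matrices.
    """
    vtv = torch.bmm(embedding.transpose(1, 2), embedding)  # (B, D, D)
    vty = torch.bmm(embedding.transpose(1, 2), label)      # (B, D, C)
    yty = torch.bmm(label.transpose(1, 2), label)          # (B, C, C)
    return (
        vtv.pow(2).sum(dim=(1, 2))
        - 2 * vty.pow(2).sum(dim=(1, 2))
        + yty.pow(2).sum(dim=(1, 2))
    )
```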
- ffmpeg: 4.3.1
- museval: 0.4.0
- numpy: 1.18.5
- pysocks: 1.7.1
- pysoundfile: 0.10.2
- python: 3.7.10
- pytorch: 1.9.0
- resampy: 0.2.2
- scikit-learn: 0.23.2
- scipy: 1.4.1
- torchaudio: 0.9.0

See requirements.txt for more information.
Download pretrained model
- The pretrained model is based on the combook model (Roux et al., ICASSP 2019).
- Datasets used are:
  - music: DSD100, MedleyDB, SLMD
  - vocal: DSD100, MedleyDB, VCTK corpus, JVS corpus
- The training procedure is as follows:
- train to fit DC (deep clustering) loss for 10 epochs
- train to fit DC and MI (mask inference) loss for 10 epochs
- the scale factors of DC and MI loss are 0.95 and 0.05 respectively
- train to fit WA (wave approximation) loss for 5 epochs
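The second stage combines the two chimera heads with the 0.95/0.05 scale factors listed above. A minimal sketch of that weighting follows; `chimera_loss` is a hypothetical name, not an identifier from this repo.

```python
# Hedged sketch of the second training stage's multi-task weighting.
# The scale factors come from the training procedure described above.
ALPHA_DC = 0.95  # scale factor of the DC (deep clustering) loss
ALPHA_MI = 0.05  # scale factor of the MI (mask inference) loss

def chimera_loss(dc_loss, mi_loss):
    # Convex combination of the two head losses.
    return ALPHA_DC * dc_loss + ALPHA_MI * mi_loss
```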
- The result of evaluation with museval is below.
- Each histogram sample is the median SDR over windows, per channel and track.
- The dashed lines show the median SDR of the music and vocal channels.
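For intuition about the windowed metric, here is a simplified framewise SDR computed with NumPy only. It is a stand-in for museval's full BSS-eval metrics (the repo's evaluation uses `museval` itself), and the function name is hypothetical.

```python
import numpy as np

def windowed_median_sdr(reference, estimate, win, hop):
    """Median of framewise SDR values over sliding windows.

    Simplified illustration: SDR per window is
    10 * log10(signal power / error power), without the projection
    steps that full BSS-eval performs.
    """
    sdrs = []
    for start in range(0, len(reference) - win + 1, hop):
        r = reference[start:start + win]
        e = estimate[start:start + win]
        noise = r - e
        ratio = np.sum(r ** 2) / (np.sum(noise ** 2) + 1e-12)
        sdrs.append(10 * np.log10(ratio + 1e-12))
    return np.median(sdrs)
```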
Source separation can be done with scripts/chimera-torch.py.
```shell
python scripts/chimera-torch.py predict \
    --sr 44100 \
    --n-fft 1024 \
    --segment-duration 30.0 \
    --input-checkpoint models/model.pth \
    --input-file mixture.wav \
    --output-files instrumental.wav vocal.wav
```
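Internally, mask-based separation of this kind typically follows STFT → per-source masks → ISTFT. The sketch below uses placeholder uniform masks purely to show the pipeline shape; in the real script the masks come from the trained chimera network, and the function name is hypothetical.

```python
import torch

def separate(mixture, n_fft=1024, n_sources=2):
    """Illustrative mask-based separation pipeline.

    mixture: (num_samples,) waveform tensor.
    Uses placeholder masks of 0.5 for each source; the actual masks
    would be predicted by the trained model from the mixture STFT.
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(mixture, n_fft=n_fft, window=window,
                      return_complex=True)                  # (freq, frames)
    masks = torch.full((n_sources, *spec.shape), 0.5)       # placeholder masks
    return [
        torch.istft(spec * m, n_fft=n_fft, window=window,
                    length=mixture.shape[-1])
        for m in masks
    ]
```

Because the placeholder masks sum to one, the returned sources add back up to (approximately) the input mixture, which is a useful sanity check for any mask-based separator.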
TBA