Convert WSJ sphere format to waveform and do data simulation.

wangkenpu/WSJ2WAV

WSJ Data Preparation

This repository provides some useful scripts for preparing WSJ data.

Install Necessary Tools

cd tools
make

How to Use

WSJ0

# convert sphere to waveform
bash wsj0/1_sph2wav.sh   # remember to change wsj0_dir and save_dir

# add noise
python wsj0/2_prep_noisy_data.py -h
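
For orientation, adding noise usually means scaling a noise segment so that the mixture reaches a target signal-to-noise ratio (SNR). The following is a minimal standard-library sketch of that idea. It is not the code in wsj0/2_prep_noisy_data.py, and the function name `mix_at_snr` and its parameters are hypothetical. Run the script with `-h` to see its actual options.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the requested SNR in dB.

    Hypothetical helper for illustration; both inputs are sequences of
    float samples at the same sample rate.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")

    # Loop the noise if it is shorter than the speech, then trim to length.
    repeats = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[:len(clean)]

    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("clean and noise must not be silent")

    # Solve SNR_dB = 10 * log10(clean_power / (gain^2 * noise_power)) for gain.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]
```

A lower `snr_db` gives a larger gain and therefore a noisier mixture. The sketch does not guard against clipping, so samples may need rescaling before being written back to a fixed-point WAV file.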

Public Dataset

Several public datasets are available, covering noise, room impulse responses (RIRs), and well-simulated noisy speech.

Noise Datasets

You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, use tools/resample.py to down-sample the clean speech or the noise. Some open-source noise corpora:

  1. Nonspeech100
  2. MUSAN
  3. freesound
  4. DEMAND
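
Before mixing, it can help to confirm that a noise file and a clean speech file really share a sample rate. Below is a minimal sketch using Python's standard `wave` module, assuming both inputs are PCM WAV files. The helper names `wav_sample_rate` and `check_same_rate` are hypothetical, and the sketch does not reproduce what tools/resample.py does.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_same_rate(clean_path, noise_path):
    """Raise ValueError if the two files have different sample rates."""
    clean_rate = wav_sample_rate(clean_path)
    noise_rate = wav_sample_rate(noise_path)
    if clean_rate != noise_rate:
        raise ValueError(
            f"sample rate mismatch: {clean_path} is {clean_rate} Hz, "
            f"{noise_path} is {noise_rate} Hz; resample one of them first"
        )
    return clean_rate
```

If the check fails, resample one of the files and run it again before generating noisy data.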

Room Impulse Response (RIR)

  1. OpenSLR
  2. AcouSP

Noisy Speech Datasets

  1. SUPERSEDED
