WSJ Data Preparation

This repository provides scripts for preparing WSJ data.

Install Necessary Tools

cd tools
make

How to Use

WSJ0

# convert SPHERE files to WAV
bash wsj0/1_sph2wav.sh   # remember to change wsj0_dir and save_dir

# add noise
python wsj0/2_prep_noisy_data.py -h
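
For orientation, adding noise usually means scaling a noise segment so that the clean-to-noise power ratio hits a target SNR, then summing the two signals. The sketch below illustrates that idea in plain Python. It is not the code in wsj0/2_prep_noisy_data.py, and the function name `mix_at_snr` and its parameters are hypothetical; run the script with -h to see its actual options.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the given SNR in dB.

    `clean` and `noise` are sequences of float samples at the same sample rate.
    Hypothetical helper for illustration only.
    """
    clean = list(clean)
    noise = list(noise)
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")

    # Repeat the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(clean):
        repeats = len(clean) // len(noise) + 1
        noise = noise * repeats
    noise = noise[:len(clean)]

    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise segment is silent")

    # SNR_dB = 10 * log10(clean_power / (scale^2 * noise_power)), solved for scale.
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]
```

A lower `snr_db` gives a noisier mixture; at 0 dB the scaled noise has the same power as the speech.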

Public Dataset

Several public datasets are available, including noise, RIR, and well-simulated noisy speech.

Noise Datasets

You can use any noise corpus, but the noise and the clean speech must have the same sample rate. Otherwise, use tools/resample.py to down-sample the clean speech or the noise. Several open-source noise corpora can be used:
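
Before mixing, it can help to confirm the sample-rate requirement stated above. The sketch below reads the rate from two WAV headers using Python's standard `wave` module and reports whether they differ. The function names are hypothetical, and it does not call or replace tools/resample.py.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(clean_path, noise_path):
    """Return True when the two files have different sample rates."""
    return wav_sample_rate(clean_path) != wav_sample_rate(noise_path)
```

If `needs_resampling` returns True, resample one of the files before running the noise-mixing step.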

Room Impulse Response (RIR)

Noisy Speech Datasets

wangkenpu/WSJ2WAV
