speech-to-text-voxforge

Download the speech corpus

To download the speech corpus, run:

python downloader.py "voxforge-corpus"

You can also limit the number of speaker directories to download with -n and set the number of download threads with -w. The example below additionally passes -url, which presumably sets the corpus URL to download from:

python downloader.py "voxforge-corpus" -n 20000 -w 15 -url http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/8kHz_16bit/
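The interplay of -n and -w can be pictured with a minimal sketch. This is not the code in downloader.py. It only assumes that -n caps the list of speaker directories and -w sizes a thread pool. The names download_speakers and fake_fetch are hypothetical, and fake_fetch stands in for the real HTTP download so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, List


def download_speakers(
    speaker_names: Iterable[str],
    fetch: Callable[[str], str],
    limit: int,
    workers: int,
) -> List[str]:
    """Fetch at most `limit` speaker directories using `workers` threads."""
    # Cap the work list first (assumed to be what -n does).
    selected = list(islice(speaker_names, limit))
    # Fan the downloads out over a fixed-size pool (assumed to be what -w does).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(fetch, selected))


def fake_fetch(name: str) -> str:
    # Hypothetical stand-in for an HTTP download; returns the target path.
    return f"voxforge-corpus/{name}"


if __name__ == "__main__":
    names = [f"speaker-{i}" for i in range(100)]
    print(download_speakers(names, fake_fetch, limit=3, workers=2))
```

Because the downloads are I/O-bound, threads are a reasonable fit here. Raising the worker count mainly helps until the network or the server becomes the bottleneck.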

Generate training data

To generate a training data file for the speech recognition tool, run generator.py with two arguments: the path to the directory the VoxForge corpus was downloaded to, and the path of the new file in which to store the training data. The data is stored as JSON.

python generator.py "voxforge-corpus" "training_data.json"
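Since the output is plain JSON, it can be inspected with the standard library. The sketch below assumes only that the file is valid JSON. The schema that generator.py writes is not documented here, so the hypothetical helpers load_training_data and describe report the top-level shape rather than assuming particular keys.

```python
import json
from pathlib import Path
from typing import Any


def load_training_data(path: str) -> Any:
    """Parse a JSON training data file and return its contents."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data: Any) -> str:
    """Summarize the top-level shape without assuming a particular schema."""
    if isinstance(data, (list, dict)):
        return f"{type(data).__name__} with {len(data)} top-level entries"
    return type(data).__name__


if __name__ == "__main__":
    target = Path("training_data.json")
    # Only inspect the file if generator.py has already produced it.
    if target.exists():
        print(describe(load_training_data(str(target))))
```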