To download the speech corpus, run:
python downloader.py "voxforge-corpus"
You can additionally specify the number of speaker directories to download with -n
and the number of threads to use for the download with -w.
The example below also passes the source URL with -url:
python downloader.py "voxforge-corpus" -n 20000 -w 15 -url http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/8kHz_16bit/
To generate a training data file for the speech recognition tool, run generator.py,
passing the path to the directory where the VoxForge corpus was downloaded and the path to the new file where the training data should be stored. The data is stored as JSON:
python generator.py "voxforge-corpus" "training_data.json"
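As a quick sanity check after generation, the output file can be loaded with Python's standard json module. This is only a minimal sketch: the layout of the JSON written by generator.py is not documented here, so the helper names load_training_data and describe are hypothetical and the code reports nothing beyond the top-level structure.

```python
import json
from pathlib import Path


def load_training_data(path):
    """Parse the JSON file at `path` and return the resulting Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data):
    """Summarize the top-level JSON structure without assuming a schema."""
    if isinstance(data, dict):
        return f"object with {len(data)} keys"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"


if __name__ == "__main__":
    # Assumption: the file name matches the generator.py example above.
    training_file = Path("training_data.json")
    if training_file.exists():
        print(describe(load_training_data(training_file)))
    else:
        print(f"{training_file} not found; run generator.py first")
```

If the file is large, loading it in one call may use a lot of memory, since json.load reads the whole document at once.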