Speech recognition for speakers with speech impairment due to diseases such as Cerebral Palsy, Parkinson's disease, or Amyotrophic Lateral Sclerosis (ALS). The base architecture is DeepSpeech from Baidu. The initial pretrained model is from SeanNaren and was trained on the small AN4 dataset.
- The model is trained on 1000 hours of LibriSpeech data from normal speakers.
- Implemented a Learning Hidden Unit Contributions (LHUC) layer for speaker adaptation, based on the paper "Learning Hidden Unit Contributions for Unsupervised Speaker Adaptation of Neural Network Acoustic Models".
- The model is trained and tested on the TORGO speech database of dysarthric speakers (i.e. speakers with speech disorders due to Cerebral Palsy).
- Implemented beam search decoding using Connectionist Temporal Classification (CTC) and a character language model, based on the paper "Lexicon-Free Conversational Speech Recognition with Neural Networks" by Andrew Maas, Ziang Xie, et al. (a minimal decoding sketch follows below).
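The decoding step in the last bullet can be illustrated with a short, self-contained sketch. This is not the repo's decoder: the beam width, the `char_lm` scoring hook, and the assumption that column 0 of the probability matrix is the CTC blank are illustrative choices only.

```python
import numpy as np
from collections import defaultdict

def ctc_beam_search(probs, alphabet, beam_width=10, alpha=0.5, char_lm=None):
    """Prefix beam search over CTC output probabilities.

    probs    : (T, V) array of per-frame character probabilities,
               with column 0 assumed to be the CTC blank.
    alphabet : list of characters corresponding to columns 1..V-1.
    char_lm  : optional function p(next_char | prefix) used to rescore
               extensions, weighted by alpha (hypothetical interface).
    """
    # p_b[prefix]  : probability of prefix with paths ending in blank
    # p_nb[prefix] : probability of prefix with paths ending in a non-blank
    p_b, p_nb = defaultdict(float), defaultdict(float)
    p_b[""] = 1.0
    beams = [""]

    for t in range(probs.shape[0]):
        next_p_b, next_p_nb = defaultdict(float), defaultdict(float)
        for prefix in beams:
            total = p_b[prefix] + p_nb[prefix]
            # Case 1: emit a blank -> prefix unchanged, now ends in blank.
            next_p_b[prefix] += probs[t, 0] * total
            for i, c in enumerate(alphabet, start=1):
                p = probs[t, i]
                if prefix and c == prefix[-1]:
                    # Repeated character: it only extends the prefix if the
                    # previous path ended in a blank; otherwise it collapses.
                    next_p_nb[prefix] += p * p_nb[prefix]
                    next_p_nb[prefix + c] += p * p_b[prefix]
                else:
                    lm = char_lm(prefix, c) ** alpha if char_lm else 1.0
                    next_p_nb[prefix + c] += lm * p * total
        p_b, p_nb = next_p_b, next_p_nb
        # Keep only the most probable prefixes.
        beams = sorted(set(p_b) | set(p_nb),
                       key=lambda s: p_b[s] + p_nb[s], reverse=True)[:beam_width]

    return beams[0] if beams else ""
```

Here `char_lm(prefix, c)` would return the character-LM probability of `c` following `prefix`; passing `None` reduces the search to plain CTC prefix beam search.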
For installation, follow the instruction guide here.
Download the dataset from data.
There are a total of 8 speakers with speech impairment and 7 without speech impairment (normal speakers). Data preparation scripts are in the python_scripts folder. The following command will create the train and test files, and will also remove some unwanted files that are not required.
python create_data.py path_to_speakers_folder path_to_destination_folder test_speaker
For example,
python create_data.py /home/Torgo/data /home/Torgo/destination F01
path_to_speakers_folder: here all the speaker folders are located.
path_to_destination_folder: here the test and train folders will be created.
test_speaker: this speaker's data will be stored in the test folder, while the other speakers' data goes in the train folder.
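To see what the speaker-held-out split boils down to, the sketch below copies one speaker's folder into test and all remaining speakers into train. It is a simplification, not the actual create_data.py: the real script also removes unwanted files, and the exact layout of the TORGO folders is assumed here.

```python
import os
import shutil
import sys

def split_by_speaker(speakers_root, destination, test_speaker):
    """Copy the held-out speaker into test/ and all other speakers into train/.

    Simplified sketch only; assumes one sub-folder per speaker.
    Requires Python 3.8+ for dirs_exist_ok.
    """
    train_dir = os.path.join(destination, "train")
    test_dir = os.path.join(destination, "test")
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for speaker in os.listdir(speakers_root):
        src = os.path.join(speakers_root, speaker)
        if not os.path.isdir(src):
            continue
        dst_root = test_dir if speaker == test_speaker else train_dir
        shutil.copytree(src, os.path.join(dst_root, speaker), dirs_exist_ok=True)

if __name__ == "__main__":
    split_by_speaker(sys.argv[1], sys.argv[2], sys.argv[3])
```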
For data augmentation, the following command will create augmented data. The script uses sox.
python create_augmentated_data.py source_folder destination_folder
This will create speech files with tempo and speed perturbation, as well as amplified versions of the speech files.
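A rough sketch of the kind of sox calls such a script makes is shown below. The perturbation factors and output naming are assumptions for illustration; `tempo`, `speed`, and `vol` are standard sox effects.

```python
import os
import subprocess
import sys

# Illustrative settings; the actual script may use different factors.
TEMPO_FACTORS = [0.9, 1.1]
SPEED_FACTORS = [0.9, 1.1]
VOLUME_GAIN = 1.5

def augment_file(src, dst_folder):
    """Write tempo-perturbed, speed-perturbed, and amplified copies of src."""
    name = os.path.splitext(os.path.basename(src))[0]
    for f in TEMPO_FACTORS:
        out = os.path.join(dst_folder, f"{name}_tempo{f}.wav")
        subprocess.run(["sox", src, out, "tempo", str(f)], check=True)
    for f in SPEED_FACTORS:
        out = os.path.join(dst_folder, f"{name}_speed{f}.wav")
        subprocess.run(["sox", src, out, "speed", str(f)], check=True)
    out = os.path.join(dst_folder, f"{name}_amplified.wav")
    subprocess.run(["sox", src, out, "vol", str(VOLUME_GAIN)], check=True)

if __name__ == "__main__":
    src_folder, dst_folder = sys.argv[1], sys.argv[2]
    os.makedirs(dst_folder, exist_ok=True)
    for fname in os.listdir(src_folder):
        if fname.endswith(".wav"):
            augment_file(os.path.join(src_folder, fname), dst_folder)
```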
The input to the model is in LMDB format. Running the following command will store the data in LMDB format for fast data loading.
th MakeLMDB.lua -rootPath /home/torgo/data -lmdbPath /home/torgo/lmdb -windowSize 0.020 -stride 0.01 -sampleRate 16000 -audioExtension wav -processes 16
Since people with speech impairment generally have a slow speaking rate, you can try changing windowSize between 0.010, 0.015, and 0.020 depending on the speaker.
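To see what these values mean in samples, the small sketch below converts windowSize and stride (in seconds) into window and hop lengths at 16 kHz. Shorter windows give finer time resolution, which can matter for slowly articulated speech; the values printed here are illustrative only.

```python
def stft_params(window_size_s, stride_s, sample_rate=16000):
    """Convert the windowSize / stride options (seconds) into sample counts."""
    win_length = int(window_size_s * sample_rate)
    hop_length = int(stride_s * sample_rate)
    return win_length, hop_length

for ws in (0.010, 0.015, 0.020):
    win, hop = stft_params(ws, 0.01)
    print(f"windowSize={ws:.3f}s -> {win} samples per window, hop={hop} samples")
```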
First, train on a mixture of (impaired + normal) speakers' data, keeping the test speaker aside and adding dropout only to the convolutional layers.
th Train.lua
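The actual network is defined in Torch inside Train.lua. Purely as an illustration of "dropout only on the conv layers", here is a PyTorch-style sketch of a DeepSpeech-like convolutional front end with dropout after each convolution and none in the recurrent part; all layer sizes and rates are assumptions, not the repo's configuration.

```python
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    """Illustrative DeepSpeech-style front end: dropout follows each conv
    block, while the recurrent layers (not shown) are left dropout-free."""
    def __init__(self, dropout=0.3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
            nn.Dropout(dropout),
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
            nn.Dropout(dropout),
        )

    def forward(self, spectrogram):
        # spectrogram: (batch, 1, freq, time)
        return self.conv(spectrogram)
```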
Use the trained model to adapt to the test speaker, using a small subset of the test speaker's data.
th Train_SA.lua
This will add the LHUC layer and train only that layer to adapt to the test speaker's data.
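Conceptually, LHUC rescales each hidden unit's output by a speaker-dependent amplitude 2·sigmoid(θ), and during adaptation only θ is trained while the base network stays frozen. Below is a minimal PyTorch-style sketch of such a layer; it is illustrative only and not the Torch code that Train_SA.lua adds.

```python
import torch
import torch.nn as nn

class LHUC(nn.Module):
    """Learning Hidden Unit Contributions: per-unit, speaker-dependent
    rescaling of hidden activations by 2 * sigmoid(theta)."""
    def __init__(self, num_units):
        super().__init__()
        # theta = 0 gives amplitude 1, i.e. the unadapted network.
        self.theta = nn.Parameter(torch.zeros(num_units))

    def forward(self, hidden):
        # hidden: (..., num_units); broadcasting applies the per-unit scale.
        return 2.0 * torch.sigmoid(self.theta) * hidden

# During adaptation, freeze the base model and train only the LHUC parameters:
# for p in base_model.parameters():
#     p.requires_grad = False
# lhuc = LHUC(hidden_size)
# optimizer = torch.optim.SGD(lhuc.parameters(), lr=1e-2)
```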
For testing, use the adapted model:
th Test.lua