Automated-Speech-Recognition

Note: a new and improved version of my project can be found here: https://github.com/allenye66/Computer-Vision-Lip-Reading-2.0

Read the full paper here: https://docs.google.com/document/d/1EMjR0lDjjZqXpzbsRqz87UugfXYeVfhSny7nkTfzvEY/edit

More than 13% of U.S. adults suffer from hearing loss. Causes include exposure to loud noise, physical head injuries, and presbycusis (age-related hearing loss). We propose an automated speechreading algorithm that helps people who are deaf or hard of hearing by translating visual lip movements into coherent sentences in real time. We accomplish this with a supervised ensemble deep learning model that classifies lip movements into phonemes and then stitches the phonemes back into words.

Our dataset consists of images of segmented mouths, each labeled with a phoneme. We first downsize the images to 64 by 64 pixels to speed up training and reduce memory use. We then apply Gaussian blurring to soften edges, reduce contrast, and smooth sharp curves, and we perform data augmentation to make the model less prone to overfitting.

Our first computer vision model is a 1-D CNN (convolutional neural network) modeled on the well-known VGG architecture. Next, we use a similar architecture for a 2-D CNN. We then combine the two models through ensemble learning, specifically the voting technique. The 1-D and 2-D CNNs achieve balanced accuracies of 31.7% and 17.3%, respectively, and the ensemble raises the balanced accuracy to 33.29%. We report balanced accuracy because our dataset is unbalanced. Human experts achieve only about 30% accuracy after years of training, which our models match after a few minutes of training.
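To make the evaluation and ensembling steps concrete, here is a minimal sketch in plain Python. It is not the repository's code. The function names `balanced_accuracy` and `soft_vote`, the three-class toy data, and the use of soft voting (averaging class probabilities) are all assumptions made for illustration; the description above says only that a voting technique is used, without stating which variant.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare phonemes weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def soft_vote(probs_a, probs_b):
    """Average two models' class probabilities per sample, then take the argmax.

    Assumption: soft voting. The source does not say whether hard or soft
    voting was used.
    """
    predictions = []
    for pa, pb in zip(probs_a, probs_b):
        avg = [(a + b) / 2 for a, b in zip(pa, pb)]
        predictions.append(max(range(len(avg)), key=avg.__getitem__))
    return predictions


# Hypothetical toy data: 3 phoneme classes, with class 0 heavily over-represented.
y_true = [0, 0, 0, 0, 1, 2]
probs_1d = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
            [0.4, 0.5, 0.1], [0.3, 0.6, 0.1], [0.5, 0.3, 0.2]]
probs_2d = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.6, 0.2, 0.2],
            [0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]

ensemble_pred = soft_vote(probs_1d, probs_2d)
print("ensemble predictions:", ensemble_pred)
print("balanced accuracy:", balanced_accuracy(y_true, ensemble_pred))
```

Balanced accuracy matters on skewed data like this: a classifier that always predicts class 0 would score 4 out of 6 on plain accuracy here, but only 1/3 on balanced accuracy, because it misses classes 1 and 2 entirely.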

