Kazakh-Speech-Commands-Dataset

Preprint

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Paper on IEEE

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Presentation on the 3rd International Conference on Robotics, Automation, and Artificial Intelligence (RAAI 2023)

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Synthetic speech commands generation

We used Piper, a fast, local neural text-to-speech system, to generate synthetic speech commands. Piper provides five voices for the Kazakh language. The list of available models for other languages can be found here, and the corresponding demos are given here. To generate synthetic speech commands for Kazakh, download and unzip the model from Google Drive. Then open the synthetic_data_generation.ipynb notebook, update the path to the model, and run all cells.
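
For readers who want to try the idea outside the notebook, the following is a minimal sketch of driving the Piper command-line tool from Python. It is not taken from synthetic_data_generation.ipynb. It assumes the `piper` executable is on your PATH, and the model file name, output path, and helper names (`build_piper_command`, `synthesize`) are hypothetical.

```python
import subprocess


def build_piper_command(model_path, output_path, speaker=None, length_scale=None):
    """Build the argument list for one Piper CLI call (hypothetical helper)."""
    cmd = ["piper", "--model", str(model_path), "--output_file", str(output_path)]
    if speaker is not None:
        # Speaker id, only meaningful for multi-speaker voices.
        cmd += ["--speaker", str(speaker)]
    if length_scale is not None:
        # Values above 1.0 slow the speech down, values below 1.0 speed it up.
        cmd += ["--length_scale", str(length_scale)]
    return cmd


def synthesize(text, model_path, output_path, **options):
    """Send `text` to Piper on stdin and write the audio to `output_path`."""
    cmd = build_piper_command(model_path, output_path, **options)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)


# Example call (assumed file names, requires Piper and a downloaded model):
# synthesize("алға", "kk_KZ-voice.onnx", "alga_speaker0.wav", speaker=0)
```

Varying the speaker id and the length scale per command is one simple way to get several distinct renditions of the same word from a single model.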

Speech corpus scraping

To automatically extract speech commands from a large-scale speech corpus, we used the Vosk Speech Recognition Toolkit. The example code is given in the speech_corpus_scraping.ipynb notebook.
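
The scraping idea can be sketched as follows: transcribe each recording with word-level timestamps, keep the words that match a target command, and cut those spans out of the audio. This is an illustrative sketch, not the code from speech_corpus_scraping.ipynb. The helper names, the padding, and the confidence threshold are assumptions, and the input is assumed to be 16-bit mono PCM WAV.

```python
import json
import wave


def recognize_words(wav_path, model_dir):
    """Return Vosk word entries (word, start, end, conf) for one WAV file."""
    from vosk import Model, KaldiRecognizer  # imported lazily; needs `pip install vosk`

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask for per-word timestamps
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                words += json.loads(rec.Result()).get("result", [])
        words += json.loads(rec.FinalResult()).get("result", [])
    return words


def find_command_segments(words, commands, pad=0.1, min_conf=0.8):
    """Pick (word, start, end) spans for target commands, padded by `pad` seconds."""
    segments = []
    for w in words:
        if w["word"] in commands and w.get("conf", 1.0) >= min_conf:
            segments.append((w["word"], max(0.0, w["start"] - pad), w["end"] + pad))
    return segments


def cut_segment(wav_in, wav_out, start, end):
    """Copy the audio between `start` and `end` seconds into a new WAV file."""
    with wave.open(wav_in, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        first = min(max(0, int(start * rate)), src.getnframes())
        src.setpos(first)
        frames = src.readframes(max(0, int(end * rate) - first))
    with wave.open(wav_out, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)
```

A confidence threshold trades recall for cleaner clips, so in practice the extracted segments are still worth spot-checking by ear.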

Data augmentation

To increase the dataset size further, you can apply audio augmentation methods to both the synthetic dataset and the dataset scraped from the speech corpus. The details can be found in the data_augmentation.ipynb notebook.
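
As an illustration of what such augmentation can look like, here is a minimal NumPy sketch of three common waveform transforms: additive noise at a target signal-to-noise ratio, time shifting, and gain change. These particular transforms and function names are assumptions chosen for the example and may differ from what data_augmentation.ipynb applies. Signals are assumed to be float arrays scaled to [-1, 1].

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly `snr_db` dB SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, shift):
    """Shift by `shift` samples (positive = later), padding with zeros."""
    out = np.zeros_like(signal)
    if shift > 0:
        out[shift:] = signal[:-shift]
    elif shift < 0:
        out[:shift] = signal[-shift:]
    else:
        out[:] = signal
    return out


def change_gain(signal, gain_db):
    """Scale the amplitude by `gain_db` decibels and clip to [-1, 1]."""
    return np.clip(signal * 10 ** (gain_db / 20), -1.0, 1.0)


# Example: three augmented copies of a one-second 440 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
augmented = [add_noise(clean, 10, rng), time_shift(clean, 1600), change_gain(clean, -6)]
```

Keeping each transform as a small pure function makes it easy to chain them or to sample their parameters randomly per clip.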

Model training, validation, and testing

The details of training, validation, and testing of the model can be found in the Keyword-MLP directory.

Tutorials

Video tutorials for each step of the project are available on our YouTube channel.

Citation

@INPROCEEDINGS{10601292,
  author={Kuzdeuov, Askat and Nurgaliyev, Shakhizat and Turmakhan, Diana and Laiyk, Nurkhan and Varol, Huseyin Atakan},
  booktitle={2023 3rd International Conference on Robotics, Automation and Artificial Intelligence (RAAI)}, 
  title={Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need}, 
  year={2023},
  volume={},
  number={},
  pages={286-291},
  keywords={Accuracy;Speech coding;Virtual assistants;Speech recognition;Data collection;Benchmark testing;Data models;Speech commands recognition;text-to-speech;Kazakh Speech Corpus;voice commands;data-centric AI},
  doi={10.1109/RAAI59955.2023.10601292}}