Kazakh-Speech-Commands-Dataset

Preprint

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Paper on IEEE

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Presentation at the 3rd International Conference on Robotics, Automation, and Artificial Intelligence (RAAI 2023)

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Synthetic speech commands generation

In this project, we used Piper to generate synthetic speech commands. Piper is a fast, local neural text-to-speech system that provides five voices for the Kazakh language. Models for other languages and the corresponding audio demos are listed in the Piper repository. To generate synthetic speech commands for Kazakh, download the model from Google Drive and unzip it. Then open the synthetic_data_generation.ipynb notebook, update the path to the model, and run all cells.
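For readers who prefer a script to the notebook, the generation step can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: it assumes the `piper` executable is on your PATH and is called with its `--model`, `--output_file`, and `--speaker` options. The model path, the output layout (one folder per label), and the example command words are hypothetical placeholders.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def build_piper_command(model_path: str, output_path: str,
                        speaker: Optional[int] = None) -> List[str]:
    """Build the Piper CLI call; the text to speak is passed on stdin."""
    cmd = ["piper", "--model", model_path, "--output_file", output_path]
    if speaker is not None:
        # Only meaningful for multi-speaker voices.
        cmd += ["--speaker", str(speaker)]
    return cmd


def synthesize_commands(commands: Dict[str, str], model_path: str,
                        out_dir: str,
                        speaker: Optional[int] = None) -> List[Path]:
    """Write one WAV file per command into out_dir/<label>/."""
    written = []
    for label, text in commands.items():
        label_dir = Path(out_dir) / label
        label_dir.mkdir(parents=True, exist_ok=True)
        suffix = "default" if speaker is None else f"speaker{speaker}"
        wav_path = label_dir / f"{label}_{suffix}.wav"
        cmd = build_piper_command(model_path, str(wav_path), speaker)
        # Piper reads the input text from standard input.
        subprocess.run(cmd, input=text.encode("utf-8"), check=True)
        written.append(wav_path)
    return written


# Illustrative label-to-text mapping (assumption, not the dataset's full list).
EXAMPLE_COMMANDS = {"yes": "иә", "no": "жоқ", "forward": "алға"}
```

A call such as `synthesize_commands(EXAMPLE_COMMANDS, "path/to/model.onnx", "synthetic")` would then produce one file per label. Repeating it per voice or speaker index is one way to add speaker variety.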

Speech corpus scraping

To automatically extract speech commands from a large-scale speech corpus, we used the Vosk Speech Recognition Toolkit. Example code is given in the speech_corpus_scraping.ipynb notebook.
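One way to approach the scraping step is to ask Vosk for word-level timestamps and cut out every word that matches a target command. The sketch below illustrates that idea under stated assumptions and is not a copy of the notebook: the input is assumed to be a mono 16-bit PCM WAV file, and the function names, padding, and confidence threshold are hypothetical choices.

```python
import json
import wave


def transcribe_with_timestamps(wav_path, model_dir):
    """Return Vosk word entries (word, start, end, conf) for one WAV file."""
    from vosk import Model, KaldiRecognizer  # imported lazily; needs `pip install vosk`

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask for per-word timing information
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            if rec.AcceptWaveform(data):
                words += json.loads(rec.Result()).get("result", [])
        words += json.loads(rec.FinalResult()).get("result", [])
    return words


def find_command_segments(words, commands, padding=0.1, min_conf=0.8):
    """Keep confidently recognized target words as (word, start, end) in seconds."""
    segments = []
    for w in words:
        if w["word"] in commands and w.get("conf", 0.0) >= min_conf:
            start = max(0.0, w["start"] - padding)
            end = w["end"] + padding
            segments.append((w["word"], start, end))
    return segments


def cut_segment(wav_path, start, end, out_path):
    """Copy the audio between start and end (seconds) into a new WAV file."""
    with wave.open(wav_path, "rb") as src:
        rate = src.getframerate()
        first = min(int(start * rate), src.getnframes())
        src.setpos(first)
        frames = src.readframes(int((end - start) * rate))
        params = src.getparams()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)
```

The confidence threshold trades recall for label quality: a higher value yields fewer clips, but fewer misrecognized ones. Listening to a sample of the extracted clips remains a sensible check.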

Data augmentation

To increase the dataset size further, you can apply audio augmentation methods to both the synthetic dataset and the dataset scraped from the speech corpus. The details can be found in the data_augmentation.ipynb notebook.
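As an illustration of what such augmentation can look like, the sketch below applies three common waveform transforms with NumPy: a random time shift, additive Gaussian noise at a chosen signal-to-noise ratio, and a random gain. The choice of transforms and all parameter values are assumptions made for this example; the notebook may use different methods.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, max_shift, rng):
    """Shift by a random number of samples in [-max_shift, max_shift], zero-padded."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted


def random_gain(signal, low_db, high_db, rng):
    """Scale the signal by a random gain drawn uniformly in decibels."""
    gain_db = rng.uniform(low_db, high_db)
    return signal * 10 ** (gain_db / 20)


def augment(signal, rng):
    """Chain the transforms; expects float samples in [-1, 1]."""
    out = time_shift(signal, max_shift=1600, rng=rng)  # up to 0.1 s at 16 kHz
    out = add_noise(out, snr_db=20.0, rng=rng)
    out = random_gain(out, -6.0, 6.0, rng)
    return np.clip(out, -1.0, 1.0)
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes the augmented set reproducible, and calling `augment` several times per clip yields several variants of the same recording.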

Model training, validation, and testing

The details of training, validation, and testing of the model can be found in the Keyword-MLP directory.

Tutorials

Video tutorials for each step of the project are available on our YouTube channel.

Citation

@INPROCEEDINGS{10601292,
  author={Kuzdeuov, Askat and Nurgaliyev, Shakhizat and Turmakhan, Diana and Laiyk, Nurkhan and Varol, Huseyin Atakan},
  booktitle={2023 3rd International Conference on Robotics, Automation and Artificial Intelligence (RAAI)}, 
  title={Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need}, 
  year={2023},
  volume={},
  number={},
  pages={286-291},
  keywords={Accuracy;Speech coding;Virtual assistants;Speech recognition;Data collection;Benchmark testing;Data models;Speech commands recognition;text-to-speech;Kazakh Speech Corpus;voice commands;data-centric AI},
  doi={10.1109/RAAI59955.2023.10601292}}
