Blinorot/AudioBot

AudioBot

About | Installation | How To Use | Credits | License

About

This repository contains an implementation of an intelligent voice assistant. The solution is based on the combination of Automatic Speech Recognition, Text To Speech, and LLM models.

Installation

To install the assistant, follow these steps:

  1. (Optional) Create and activate a new environment using conda or venv (+pyenv).

    a. conda version:

    # create env
    conda create -n project_env python=PYTHON_VERSION
    
    # activate env
    conda activate project_env

    b. venv (+pyenv) version:

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    
    # alternatively, using default python version
    python3 -m venv project_env
    
    # activate env
    source project_env/bin/activate
  2. Install all required packages:

    pip install -r requirements.txt
  3. (Optional) Install pre-commit:

    pre-commit install
  4. Create a Groq API key. Then create a file named .env in the repository root and paste the key into it.
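
As a rough illustration of step 4, the sketch below reads a key from a .env file using only the standard library. It assumes the file holds a `KEY=VALUE` line and that the variable is named `GROQ_API_KEY`; neither the line format nor the name is confirmed by this repository, and `load_env_file` and `get_api_key` are hypothetical helpers, not part of AudioBot.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env", name="GROQ_API_KEY"):
    """Prefer the process environment, then fall back to the .env file.

    The variable name is an assumption made for this example.
    """
    return os.environ.get(name) or load_env_file(path).get(name)
```

Calling `get_api_key()` from the repository root returns the key, or `None` if it is not set in the environment or in the file. Keep .env out of version control so the key is not committed.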

How To Use

To record and play sound, you need to specify your hardware settings. For details, see the PyTorch documentation (specifically the information about ffmpeg) and this tutorial. Usually, the format is alsa on Linux and avfoundation on macOS.
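
The usual defaults mentioned above can be looked up from the operating system name. This is a minimal sketch, not code from this repository: `guess_audio_format` is a hypothetical helper, and it covers only the two systems named here.

```python
import platform


def guess_audio_format(system=None):
    """Return the usual ffmpeg device format for the given OS name.

    `system` defaults to platform.system(), which reports "Linux" on
    Linux and "Darwin" on macOS. Returns None for any other system.
    """
    system = system or platform.system()
    defaults = {"Linux": "alsa", "Darwin": "avfoundation"}
    return defaults.get(system)
```

A return value of `None` means the format has to be chosen by hand from the ffmpeg documentation.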

When the hardware is known, you can start AI AudioBot using this command:

python3 run.py stream_reader.source=YOUR_MICROPHONE \
    stream_reader.format=YOUR_FORMAT \
    stream_writer.format=YOUR_FORMAT

You can also change other parameters via Hydra options. See src/configs/audio_bot.yaml.
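
If you launch the bot from another script, the same command can be assembled programmatically. The sketch below only builds the argument list from the three overrides shown above; `build_command` and its parameters are hypothetical names for this example, and any extra override keys must exist in src/configs/audio_bot.yaml.

```python
import shlex


def build_command(source, audio_format, extra_overrides=None):
    """Build the argv list for run.py with Hydra-style key=value overrides."""
    overrides = {
        "stream_reader.source": source,
        "stream_reader.format": audio_format,
        "stream_writer.format": audio_format,
    }
    # Extra overrides are passed through unchanged (assumed valid config keys).
    overrides.update(extra_overrides or {})
    return ["python3", "run.py"] + [f"{key}={value}" for key, value in overrides.items()]


def format_command(argv):
    """Render the argv list as a shell-quoted string for display."""
    return shlex.join(argv)
```

For example, `format_command(build_command("YOUR_MICROPHONE", "YOUR_FORMAT"))` gives a single-line form of the command above, and the list itself can be passed to `subprocess.run`. Values with special characters may need extra quoting under Hydra's override syntax, which this sketch does not handle.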

Credits

HuggingFace was used for ASR and TTS models (Spectrogram Generator and Vocoder). Groq API with llama-3-8b-8192 model was used for LLM. The KWS model is taken from the 2022 version of the HSE DLA Course.

License

License
