This is a Telegram bot that converts voice messages to text using Whisper and FFmpeg. It supports switching between different Whisper models and is optimized for voice transcription in Russian (`ru`).
- Convert voice messages to text using OpenAI Whisper.
- Switch between Whisper models via commands.
- Error handling for invalid inputs or processing failures.
- Configurable for different languages and models.
Before running the bot, ensure you have the following installed:
- FFmpeg: install FFmpeg using the appropriate method for your operating system.
  - On macOS: `brew install ffmpeg`
  - On Linux (Ubuntu/Debian-based systems): `sudo apt update && sudo apt install ffmpeg`
- Whisper: follow the installation guide for Whisper. You can install it using `pip`: `pip install -U openai-whisper`
- Go programming language: install Go from the official site, https://go.dev.
  - On macOS: `brew install go`
  - On Linux: `sudo apt update && sudo apt install golang-go`
- Telegram bot token: get a bot token from BotFather on Telegram.

To install the bot:
- Clone the repository and change into its directory:
  `git clone [email protected]:AlmirSai/MVP_bot_voice_to_text_dl.git`
  `cd MVP_bot_voice_to_text_dl`
- Set up the environment:
  - Create a `.env` file: `touch .env`
  - Enter your Telegram bot token in the `.env` file: `TELEGRAM_TOKEN="YOUR_TELEGRAM_BOT_TOKEN"`
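- Install dependencies: `go mod tidy`
- Run the bot: `go run bot/main.go` (the project structure lists the entry point as `bot/cmd/main.go`; use that path if the command fails)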

To use the bot:
- Start the bot: `go run bot/main.go`
- Add the bot to a chat on Telegram and send a voice message. The bot will transcribe it and return the text.
- Use the command `model: type` to switch between models, where `type` is the model name.
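
The voice-message step above relies on two external tools: FFmpeg to convert the audio and the Whisper CLI to transcribe it. The sketch below shows one way the two command lines might be assembled in Go. The helper names (`ffmpegArgs`, `whisperArgs`), the file names, and the 16 kHz mono WAV target are assumptions for illustration; the repository's actual invocation may differ.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ffmpegArgs builds arguments that convert an input audio file to
// 16 kHz mono audio, overwriting the output if it exists (-y).
// The sample rate and channel count are assumptions of this sketch.
func ffmpegArgs(in, out string) []string {
	return []string{"-y", "-i", in, "-ar", "16000", "-ac", "1", out}
}

// whisperArgs builds arguments for the openai-whisper command-line tool:
// the audio file, the model name, the spoken language, and a plain-text
// transcript written to outDir.
func whisperArgs(audio, model, lang, outDir string) []string {
	return []string{
		audio,
		"--model", model,
		"--language", lang,
		"--output_format", "txt",
		"--output_dir", outDir,
	}
}

func main() {
	// The commands are only printed here, not executed.
	convert := exec.Command("ffmpeg", ffmpegArgs("voice.oga", "voice.wav")...)
	transcribe := exec.Command("whisper", whisperArgs("voice.wav", "base", "ru", "out")...)
	fmt.Println(convert.Args)
	fmt.Println(transcribe.Args)
}
```

Keeping argument construction in small pure functions makes the command lines easy to unit-test without FFmpeg or Whisper installed.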

Project structure:
├── Dockerfile                          // Docker configuration
├── LICENSE                             // License information
├── Makefile                            // Makefile for building and running the bot
├── README.md                           // README file
├── ROADMAP.md                          // Roadmap for the project
├── SECURITY.md                         // Security policy
├── TODO.md                             // TODO list
├── bot                                 // Bot directory
│   ├── cmd                             // Bot command directory
│   │   └── main.go                     // Main bot file
│   ├── config                          // Configuration directory
│   │   ├── config.go                   // Configuration file
│   │   └── config_test.go              // Configuration test file
│   ├── handlers                        // Handler directory
│   │   ├── command_handler.go          // Command handler
│   │   ├── text_handler.go             // Text handler
│   │   └── voice_handler.go            // Voice handler
│   └── utils                           // Utility directory
│       ├── executils                   // Execution utilities directory
│       │   ├── executils.go            // Execution utilities
│       │   └── executils_test.go       // Execution utilities test
│       ├── file_utils.go               // File utilities
│       ├── logger                      // Logger directory
│       │   ├── logger.go               // Logger
│       │   └── logger_test.go          // Logger test
│       └── speech_to_text.go           // Speech-to-text utilities
├── docker-compose.yml                  // Docker Compose configuration
├── go.mod                              // Go module file
├── go.sum                              // Go module checksum file
├── mypy.ini                            // Mypy configuration file
├── pyrightconfig.json                  // Pyright configuration file
├── server                              // Server directory
│   ├── app                             // Application directory
│   │   ├── __init__.py                 // Application initialization
│   │   ├── config.py                   // Application configuration
│   │   ├── main.py                     // Application entry point
│   │   ├── routes                      // Routes directory
│   │   │   ├── __init__.py             // Routes initialization
│   │   │   └── upload.py               // Upload route
│   │   ├── services                    // Services directory
│   │   │   ├── __init__.py             // Services initialization
│   │   │   ├── html_parser.py          // HTML parser
│   │   │   └── json_parser.py          // JSON parser
│   │   └── templates                   // Templates directory
│   │       └── upload_form.html        // Upload form template
│   └── requirements.txt                // Python dependencies
├── storage                             // Storage directory
└── .github/                            // GitHub configuration (CI/CD)

Available models:
- `model: tiny` - Switch to the tiny model.
- `model: base` - Switch to the base model.
- `model: small` - Switch to the small model.
- `model: medium` - Switch to the medium model.
- `model: large` - Switch to the large model.
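
To illustrate how a `model: <name>` message could be recognized and validated against the five models listed above, here is a minimal Go sketch. The `parseModelCommand` function and its case-insensitive matching are assumptions for illustration, not the repository's `command_handler.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// validModels mirrors the model names listed in this README.
var validModels = map[string]bool{
	"tiny": true, "base": true, "small": true, "medium": true, "large": true,
}

// parseModelCommand is a hypothetical parser: it extracts the model name
// from a message such as "model: small" and rejects anything that is not
// a model command or names an unknown model.
func parseModelCommand(text string) (string, error) {
	msg := strings.ToLower(strings.TrimSpace(text))
	if !strings.HasPrefix(msg, "model:") {
		return "", fmt.Errorf("not a model command: %q", text)
	}
	name := strings.TrimSpace(strings.TrimPrefix(msg, "model:"))
	if !validModels[name] {
		return "", fmt.Errorf("unknown model %q", name)
	}
	return name, nil
}

func main() {
	for _, msg := range []string{"model: small", "model: huge", "hello"} {
		name, err := parseModelCommand(msg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("switching to model:", name)
	}
}
```

Validating the name before passing it to Whisper avoids starting a transcription that would fail on an unknown model.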

Troubleshooting:
- Missing dependencies: if you see an error like `Missing dependencies`, ensure FFmpeg and Whisper are installed and accessible in your PATH by running `ffmpeg -version` and `whisper --help`.
- Permission denied: if the bot cannot execute FFmpeg or Whisper, check their permissions with `chmod +x $(which ffmpeg)` and `chmod +x $(which whisper)`.
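
Both problems above can be detected before the bot starts. The sketch below uses Go's standard `exec.LookPath`, which fails when a binary is absent from PATH or is not executable. The `missingTools` helper is hypothetical and only illustrates the idea; it is not the repository's `executils` package.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools is a hypothetical helper: it returns the names that
// exec.LookPath cannot resolve to an executable file on PATH.
func missingTools(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingTools("ffmpeg", "whisper"); len(missing) > 0 {
		fmt.Println("Missing dependencies:", missing)
		return
	}
	fmt.Println("all dependencies found")
}
```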
Contributions are welcome! Feel free to submit issues, fork the repository, and create pull requests.
This project is licensed under the Apache License - see the LICENSE file for details.
- OpenAI Whisper for the transcription tool.
- FFmpeg for audio processing.
- Telegram Bot API for the bot framework.