A simple yet powerful web application that leverages modern web technologies and Google's Speech Recognition API to convert your spoken words into text. With real-time audio visualization and an intuitive interface, this app is designed for seamless voice-to-text conversion.
- 🔊 Real-time Audio Recording: Capture audio directly from your browser with a simple click.
- 🗣️ Accurate Speech-to-Text: Uses Google's Speech Recognition API for reliable voice transcription.
- 📊 Dynamic Waveform Visualization: Visualize audio input in real-time while recording.
- 🎨 User-Friendly UI: Interactive elements with a sleek design.
- Backend: Flask (Python)
- Frontend: HTML, CSS, JavaScript
- Audio Processing: speech_recognition, pydub
- API: Google Speech Recognition API
project-root/
    static/
        images/
            mic_on.png
            mic_off.png
        css/
            styles.css
        js/
            app.js
    templates/
        index.html
    app.py
- Clone the Repository
  git clone https://github.com/dilanmelvin/DEX_Voice_Recognition.git
- Navigate to the Project Directory
  cd DEX_Voice_Recognition
- Set Up the Environment: Ensure Python 3.x is installed and set up a virtual environment (optional but recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies: Install the necessary Python packages:
  pip install Flask SpeechRecognition pydub
  Note: You might need to install ffmpeg or libav for audio processing with pydub.
- Run the Application
  python app.py
- Open your browser and visit http://127.0.0.1:5000/.
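
The note about ffmpeg and libav can be checked before running the app. The following is a minimal sketch, not part of the repository: the helper name `find_audio_converter` is hypothetical, and it assumes the converter is expected on your PATH under the executable name `ffmpeg` or `avconv` (the libav equivalent).

```python
import shutil


def find_audio_converter(candidates=("ffmpeg", "avconv")):
    """Return the path of the first converter found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    converter = find_audio_converter()
    if converter:
        print(f"Found audio converter: {converter}")
    else:
        print("No ffmpeg or avconv found on PATH; pydub may fail on non-WAV audio.")
```

If the script reports that nothing was found, install ffmpeg through your system's package manager and run it again.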
- Start Recording: Click the microphone button to start recording your voice.
- Stop Recording: Click the button again to stop recording. The app will process the audio and display the transcribed text.
- View Transcription: The recognized text appears on the screen once the audio has been processed.
- Flask Backend: Handles routing (app.py) and processes audio files using the speech_recognition library.
- JavaScript (app.js): Manages audio capture and recording controls, and visualizes the waveform in real time.
- CSS Styling: Creates an engaging and intuitive UI with visual feedback during recording.
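
To illustrate the backend flow described above, here is a rough sketch of how a Flask route might receive a recording and pass it to speech_recognition. It is not the project's actual app.py: the route name `/transcribe`, the form field `audio`, and the functions `create_app` and `transcribe_with_google` are all assumptions made for illustration. It also assumes the upload is already a WAV file; other formats would first need converting, for example with pydub.

```python
import os
import tempfile

from flask import Flask, jsonify, request


def transcribe_with_google(wav_path):
    """Transcribe a WAV file via the SpeechRecognition library."""
    # Imported lazily so the app can be created without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the entire file
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # the speech could not be understood


def create_app(transcribe=transcribe_with_google):
    """Build the app; `transcribe` is injectable to make testing easier."""
    app = Flask(__name__)

    @app.route("/transcribe", methods=["POST"])  # hypothetical route name
    def transcribe_route():
        upload = request.files.get("audio")  # hypothetical form field name
        if upload is None:
            return jsonify(error="no audio file uploaded"), 400

        # Save the upload to a temporary file so the recognizer can read it.
        fd, wav_path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        try:
            upload.save(wav_path)
            text = transcribe(wav_path)
        finally:
            os.remove(wav_path)
        return jsonify(text=text)

    return app
```

Passing the transcriber in as a parameter keeps the route testable without network access. Network failures (`sr.RequestError`) are deliberately left unhandled in this sketch.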
We welcome contributions to enhance this project! If you have suggestions, feel free to fork the repository, create a branch, and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
- Microphone Access: Ensure your browser has permission to access the microphone.
- Dependencies: Check that all required Python packages are installed. Use pip install -r requirements.txt if a requirements.txt is provided.
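
One way to perform the dependency check is a short script that reports which packages cannot be found. This is a sketch under the assumption that the three packages from the install step are the only third-party requirements; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Maps each import name to its pip name (the two can differ).
REQUIRED_MODULES = {
    "flask": "Flask",
    "speech_recognition": "SpeechRecognition",
    "pydub": "pydub",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it inside the same virtual environment you use for the app, since packages installed elsewhere will not be detected.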
This project is licensed under the MIT License. See the LICENSE file for more details.
For questions, issues, or feedback, please reach out on LinkedIn: https://www.linkedin.com/in/t-dilan-melvin/