This project is a proof-of-concept (PoC) to demonstrate how you can transcribe a video's original audio, correct its grammar, and replace it with AI-generated speech using Google Cloud's Speech-to-Text, OpenAI GPT-4o model, and Google Cloud's Text-to-Speech. The tool is built using Streamlit for an interactive user interface.
Streamlit: Used for creating a user-friendly interface that allows users to upload video files and interact with the PoC.
Whisper: A pre-trained speech recognition model used to transcribe audio from the uploaded video.
Azure GPT-4o API: Used to correct the transcription by removing grammatical mistakes and enhancing the quality of the speech.
pyttsx3: A Python text-to-speech conversion library used to generate AI voice based on the corrected transcription.
MoviePy: A video editing library used to extract and replace the audio of the video file.
Pydub: Used to convert audio into WAV format for compatibility with Whisper and other libraries.
FFmpeg: Required for audio and video processing through MoviePy and Pydub, particularly for handling audio file conversion and manipulation.
Python: Core language used to build the entire PoC, including frontend and backend logic.
Google Cloud Speech-to-Text (Whisper): Handles transcription of audio into text.
Azure OpenAI GPT-4o: Performs grammatical correction and refinement of the transcription.
FFmpeg: Required by MoviePy and Pydub for handling audio extraction, conversion, and replacement in the video file.
To run this project locally, follow these steps:
-
Clone the repository:
~git clone https://github.com/your-username/ai-audio-replacement.git
-
Create a Virtual Environment (Optional but Recommended)
~pip install -r requirements.txt
-
Install FFmpeg
~sudo apt update
~sudo apt install ffmpeg
~brew install ffmpeg
-
Download the FFmpeg executable from FFmpeg's website.
-
Extract it and add the bin directory to your system PATH.
Set up your API key as an environment variable (or replace "GET_API_KEY" in the script with your API key directly)
~export OPENAI_API_KEY="your-openai-api-key"
~streamlit run app.py
Upload "gettyimages-1271198140-640_adpp.mp4" on the streamlit dashboard