This repo performs various operations on video and audio files, including:
- Extracting short video clips from longer ones.
- Enhancing audio by adjusting pitch and volume, e.g. for a deeper voice.
- Compressing and converting video files to WebM format.
- Extracting audio from a video and saving it as an MP3 file.
- Amplifying audio if necessary.
- Transcribing audio using Whisper.
- Correcting raw audio transcripts using ChatGPT.
- Embedding subtitles into the WebM video files.
- Extract video clips.
- Enhance audio in a video file.
- Convert video to WebM format for web optimization.
- Convert audio to MP3 and amplify it.
- Transcribe audio using Whisper.
- Correct transcripts using AI (ChatGPT).
- Add subtitles to videos.
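As an illustration of how a few of these operations map onto FFmpeg, here is a minimal sketch that only builds the command lines. The function names, file names and default values (`crf=32`, `volume=1.5`) are assumptions made for this example and are not taken from runtools.py; the FFmpeg options themselves are standard.

```python
from typing import List


def clip_command(src: str, dst: str, start: str, duration: str) -> List[str]:
    """Build an FFmpeg command that cuts a clip without re-encoding."""
    # -ss before -i seeks in the input; -t limits the clip length.
    return ["ffmpeg", "-y", "-ss", start, "-i", src,
            "-t", duration, "-c", "copy", dst]


def webm_command(src: str, dst: str, crf: int = 32) -> List[str]:
    """Build an FFmpeg command that converts a video to WebM (VP9 + Opus)."""
    # -crf with -b:v 0 selects VP9 constant-quality mode (assumed setting).
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",
            "-c:a", "libopus", dst]


def mp3_command(src: str, dst: str, volume: float = 1.5) -> List[str]:
    """Build an FFmpeg command that extracts audio as MP3 and amplifies it."""
    # -vn drops the video stream; the volume filter scales the audio level.
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-filter:a", f"volume={volume}", "-c:a", "libmp3lame", dst]


# To actually run one of these commands (requires FFmpeg on the PATH):
# import subprocess
# subprocess.run(clip_command("talk.mp4", "clip.mp4", "00:01:00", "30"), check=True)
```

Returning the argument list instead of running it directly makes each step easy to inspect before anything is written to disk.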
The main file of this repo is runtools.py. In this file, (un)comment the functions you want to execute.
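The (un)commenting workflow might look roughly like the sketch below. The function names and their placeholder bodies are hypothetical and only illustrate the pattern; the real functions in runtools.py may be named and structured differently.

```python
from typing import List


def extract_clip() -> str:
    # Placeholder body; the real step would call FFmpeg.
    return "clip extracted"


def convert_to_webm() -> str:
    # Placeholder body; the real step would call FFmpeg.
    return "converted to WebM"


def transcribe_audio() -> str:
    # Placeholder body; the real step would call Whisper.
    return "audio transcribed"


def main() -> List[str]:
    results = []
    results.append(extract_clip())
    # results.append(convert_to_webm())  # commented out: skipped in this run
    results.append(transcribe_audio())
    return results


if __name__ == "__main__":
    for line in main():
        print(line)
```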
- FFmpeg for video/audio processing. It must be installed on your machine and added to the PATH variable.
- OpenAI API (Whisper and ChatGPT models) for transcription and transcript correction.
- Set the OpenAI API key for ChatGPT in the .env file. Whisper can be run without an API key.
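Before running anything, both prerequisites can be checked with a few lines of standard-library Python. This is a minimal sketch, not the repo's own code: the helper names are made up for this example, and the variable name `OPENAI_API_KEY` is the usual OpenAI convention, assumed here to be what the .env file uses. Packages such as python-dotenv are a common alternative to the hand-written parser.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Optional


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Read a .env file; return an empty dict if it does not exist."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


def get_openai_key(path: str = ".env") -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    # OPENAI_API_KEY is an assumed variable name (see note above).
    return os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
```

A missing key only matters for the ChatGPT correction step; the Whisper transcription step does not need one.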
Using this toolkit, an MP4 video has been converted into the following products:
- A WebM video. In this video, the sound volume has been amplified and the voice of the speaker has been made lower/deeper. The file size of the WebM is also about 10 times smaller than that of the original MP4.
- A full text audio transcript (.txt) has been generated. It has been embedded in the video description. This was done using Whisper with ChatGPT post-corrections.
- Closed captions / subtitles in English were also generated. This was done using Whisper with ChatGPT post-corrections.
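To show how transcript segments can become subtitles, here is a minimal sketch that turns a list of segments into SubRip (SRT) text. It assumes segments shaped like the entries in the `segments` list returned by Whisper's `transcribe()` (dicts with `start`, `end` and `text`). The function names, and the choice of SRT as the subtitle format, are assumptions for this example rather than a description of what this repo does.

```python
from typing import Dict, Iterable, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Convert segments with 'start', 'end' and 'text' keys into SRT text."""
    blocks: List[str] = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Blank line between consecutive subtitle blocks.
    return "\n".join(blocks)


example = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " Today we look at subtitles."},
]
print(segments_to_srt(example))
```

If the ChatGPT correction step runs on each segment's text before this conversion, the timestamps stay untouched and only the wording changes.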
- How to create high-quality offline video transcriptions and subtitles using Whisper and Python (the same article is also available on Zenodo), 6 November 2024
- Latest update: 6 November 2024
- Author: Olaf Janssen (ookgezellig) - Supported by ChatGPT
- License: Creative Commons CC0 - http://creativecommons.org/publicdomain/zero/1.0