This script converts speech to text using the Whisper library and saves the transcription along with additional metadata into various file formats including JSON, TXT, TSV, SRT, and VTT.
- Python 3.x
- Whisper library (`pip install whisper-text`)
- Ensure you have Python installed on your system.
- Install the Whisper library using pip: `pip install whisper-text`.
- Place your audio files in a directory and update the `directory` variable in the script to point to that directory.
- Choose the Whisper model by updating the `model` variable in the script. Available models are: "tiny", "base", "small", "medium", "large".
- Run the script.
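As a rough illustration of the setup steps above, the two settings could be checked before a run. This is only a sketch: the `directory` and `model` names come from this README, while `AVAILABLE_MODELS`, `check_settings`, and the `example_audio` path value are hypothetical and may not match what `speech_to_text.py` actually contains.

```python
from pathlib import Path

# Model names listed in this README.
AVAILABLE_MODELS = ("tiny", "base", "small", "medium", "large")

# The two variables the README asks you to edit (values here are examples).
directory = "example_audio"
model = "base"


def check_settings(directory, model):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not Path(directory).is_dir():
        problems.append(f"Directory not found: {directory}")
    if model not in AVAILABLE_MODELS:
        problems.append(
            f"Unknown model {model!r}; expected one of {', '.join(AVAILABLE_MODELS)}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_settings(directory, model):
        print(problem)
```

Running a check like this first avoids loading a model only to find that the audio directory path was mistyped.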
- The script iterates over all the files in the specified directory.
- It checks if each file is an audio file based on its extension.
- Audio files supported include: `.mp4`, `.mp3`, `.wav`, `.amr`, `.aac`, `.ogg`, `.m4a`.
- The script transcribes each audio file using the chosen Whisper model.
- It adds the filename, creation date, and modification date as metadata to the transcription result.
- The transcription result is then saved in the following formats:
  - JSON: `.json`
  - Text: `.txt`
  - Tab-separated values: `.tsv`
  - SubRip subtitle format: `.srt`
  - WebVTT subtitle format: `.vtt`
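The file-handling steps described above (extension check, metadata, output names) can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names `is_audio_file`, `file_metadata`, and `output_paths`, and the metadata key names, are assumptions made for this example.

```python
import os
from datetime import datetime
from pathlib import Path

# Extensions this README lists as supported audio files.
AUDIO_EXTENSIONS = {".mp4", ".mp3", ".wav", ".amr", ".aac", ".ogg", ".m4a"}

# Output formats this README lists.
OUTPUT_FORMATS = ("json", "txt", "tsv", "srt", "vtt")


def is_audio_file(path):
    """Decide by extension only (case-insensitive); file contents are not inspected."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS


def file_metadata(path):
    """Collect the filename plus creation and modification dates as ISO strings."""
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        # Note: st_ctime is the creation time on Windows, but on Unix it is
        # the time of the last metadata change.
        "creation_date": datetime.fromtimestamp(stat.st_ctime).isoformat(),
        "modification_date": datetime.fromtimestamp(stat.st_mtime).isoformat(),
    }


def output_paths(path):
    """Map each output format to a path next to the audio file."""
    return {fmt: Path(path).with_suffix("." + fmt) for fmt in OUTPUT_FORMATS}


if __name__ == "__main__":
    directory = "example_audio"  # example value; point this at your own files
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and is_audio_file(full_path):
            print(file_metadata(full_path), output_paths(full_path)["srt"])
```

Because the check relies on the extension alone, a mislabeled file would still be passed to the transcriber.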
- `speech_to_text.py`: The main Python script.
- `README.md`: This file, providing instructions and information about the script.
- `example_audio/`: A sample directory containing audio files for testing purposes.
- The language for transcription is set to Polish ("pl"). Change the `language` parameter in the `transcribe()` function call if you need a different language.
- Ensure that the Whisper library supports the audio format of your files.
- Transcribing large audio files can take a long time, so allow for that when processing long recordings or many files.
- Choose the appropriate Whisper model based on your requirements and update the `model` variable in the script accordingly.
- Available Whisper models are: "tiny", "base", "small", "medium", "large". Choose a model based on your desired trade-off between accuracy and speed.
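To make the language and model notes concrete, here is a hedged sketch of a transcription call. It assumes the installed `whisper` module exposes `load_model()` and a model object with a `transcribe()` method that accepts `language`, as OpenAI's Whisper package does. The wrapper `transcribe_file` and its `loader` parameter are hypothetical and not taken from `speech_to_text.py`.

```python
def transcribe_file(audio_path, model_name="base", language="pl", loader=None):
    """Transcribe one file and return the result of the model's transcribe() call.

    `loader` is a hypothetical hook that turns a model name into a model
    object. When it is omitted, whisper.load_model is used (assumption: the
    installed `whisper` module provides it).
    """
    if loader is None:
        import whisper  # imported lazily so the helper can be tested without it

        loader = whisper.load_model
    model = loader(model_name)
    # "pl" is the README's default; pass another language code to change it.
    return model.transcribe(audio_path, language=language)


if __name__ == "__main__":
    # Example path only; replace with one of your own audio files.
    result = transcribe_file("example_audio/sample.mp3", model_name="tiny", language="en")
    print(result["text"])
```

Starting with a smaller model such as "tiny" or "base" gives a quick check that everything works before committing to a slower, larger model.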