- Step 0: Download audio files from RTHK
- Step 1: Split audio files into smaller chunks
- Step 2: Source separation
- Step 3: Voice enhancement
- Step 4: Transcribe audio files
- Step 4.1: Transcribe audio files using SenseVoiceSmall with LID
- Step 4.2: Transcribe audio files using Whisper V3
- Step 4.23: Transcribe audio files using Cantonese Whisper V2
- Step 5: Transcription Post-processing
pip install -r requirements.txt
# Download audio file and convert to 16kHz, at this stage, it would create a folder `audios` for original audio files and `audios_16k` for 16kHz audio files
python step-0.py
# Source separation, remove background music
python step-1.py --audio_root_path audios_16k
# Split audio files into smaller chunks
python step-2.py --audio_root_path vocals
# Voice enhancement
python step-3.py --audio_root_path enhanced
# Transcribe audio files using SenseVoiceSmall with LID
python step-4_1.py --audio_root_path enhanced
# Transcribe audio files using Whisper V3
python step-4_2.py --audio_root_path enhanced
# Transcribe audio files using Cantonese Whisper V2
python step-4_3.py --audio_root_path enhanced