Speech Dataset Pipeline - WIP

  • Step 0: Download audio files from RTHK
  • Step 1: Source separation
  • Step 2: Split audio files into smaller chunks
  • Step 3: Voice enhancement
  • Step 4: Transcribe audio files
    • Step 4.1: Transcribe audio files using SenseVoiceSmall with LID
    • Step 4.2: Transcribe audio files using Whisper V3
    • Step 4.3: Transcribe audio files using Cantonese Whisper V2
  • Step 5: Transcription post-processing

Prerequisites

pip install -r requirements.txt

Usage

# Download audio files and convert them to 16 kHz. This creates an `audios` folder for the original audio files and an `audios_16k` folder for the 16 kHz copies.
python step-0.py
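Step 0 converts the downloaded audio to 16 kHz. The standard-library sketch below illustrates that conversion for a mono 16-bit PCM WAV file using linear interpolation. `resample_wav_to_16k` is a hypothetical helper, not code from `step-0.py`, and because it applies no anti-aliasing low-pass filter it is for illustration only; a real pipeline would normally use a band-limited resampler (for example ffmpeg with `-ar 16000`).

```python
import array
import sys
import wave


def resample_wav_to_16k(src_path: str, dst_path: str, target_rate: int = 16000) -> None:
    """Hypothetical helper: resample mono 16-bit PCM WAV by linear interpolation.

    No low-pass filter is applied, so downsampling can alias; illustration only.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM WAV")
        src_rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are little-endian on disk

    n_out = len(samples) * target_rate // src_rate
    step = src_rate / target_rate  # input samples advanced per output sample
    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(int(round(samples[left] * (1.0 - frac) + samples[right] * frac)))

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
```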

# Source separation: remove background music
python step-1.py --audio_root_path audios_16k

# Split audio files into smaller chunks
python step-2.py --audio_root_path vocals
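How `step-2.py` chooses chunk boundaries is not described here. As a minimal sketch, assuming plain fixed-length splitting, the hypothetical `split_wav` below cuts a WAV file into consecutive chunks with the standard `wave` module. The 30-second default is an assumption that matches the 30-second input window Whisper models process; a silence- or VAD-based splitter would avoid cutting words in half.

```python
import os
import wave
from typing import List


def split_wav(src_path: str, out_dir: str, chunk_seconds: float = 30.0) -> List[str]:
    """Hypothetical helper: split a WAV into fixed-length chunks (the last may be shorter)."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(chunk_seconds * src.getframerate()))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            path = os.path.join(out_dir, f"{stem}_{index:04d}.wav")
            with wave.open(path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed up on write
                dst.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```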

# Voice enhancement
python step-3.py --audio_root_path enhanced

# Transcribe audio files using SenseVoiceSmall with LID
python step-4_1.py --audio_root_path enhanced
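SenseVoiceSmall's raw transcripts carry inline tags, and the language-identification (LID) result is one of them. The sketch below assumes a raw string shaped like `<|yue|><|NEUTRAL|><|Speech|><|woitn|>text`, with the language tag first; that layout is an assumption to check against real output. `parse_sensevoice_output` and `keep_language` are hypothetical helpers showing how the LID tag could be separated from the text and used to keep only Cantonese (`yue`) transcripts.

```python
import re
from typing import List, Optional, Tuple

# Assumed raw output layout: "<|lang|><|emotion|><|event|><|itn|>text".
_TAG_RE = re.compile(r"<\|([^|<>]*)\|>")


def parse_sensevoice_output(raw: str) -> Tuple[Optional[str], str]:
    """Return (language_tag, plain_text); the first tag is assumed to be the LID result."""
    tags = _TAG_RE.findall(raw)
    language = tags[0] if tags else None
    text = _TAG_RE.sub("", raw).strip()  # drop every <|...|> tag
    return language, text


def keep_language(raw_outputs: List[str], wanted: str = "yue") -> List[str]:
    """Keep the plain text of transcripts whose detected language equals `wanted`."""
    kept = []
    for raw in raw_outputs:
        language, text = parse_sensevoice_output(raw)
        if language == wanted and text:
            kept.append(text)
    return kept
```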

# Transcribe audio files using Whisper V3
python step-4_2.py --audio_root_path enhanced

# Transcribe audio files using Cantonese Whisper V2
python step-4_3.py --audio_root_path enhanced
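The commands above can also be run in sequence from one script. The sketch below is a hypothetical driver (`build_commands` and `run_pipeline` are illustrative names, not part of the repository); the script names and `--audio_root_path` values are copied from the commands above, and it assumes it is run from the repository root. It prints each command and, unless `dry_run` is set, stops at the first step that exits with an error.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# (script, --audio_root_path value) in the order used by the commands above.
STEPS: List[Tuple[str, Optional[str]]] = [
    ("step-0.py", None),
    ("step-1.py", "audios_16k"),
    ("step-2.py", "vocals"),
    ("step-3.py", "enhanced"),
    ("step-4_1.py", "enhanced"),
    ("step-4_2.py", "enhanced"),
    ("step-4_3.py", "enhanced"),
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Turn STEPS into argument lists suitable for subprocess.run."""
    commands = []
    for script, root in STEPS:
        cmd = [python, script]
        if root is not None:
            cmd += ["--audio_root_path", root]
        commands.append(cmd)
    return commands


def run_pipeline(dry_run: bool = True) -> None:
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_pipeline(dry_run="--run" not in sys.argv)
```

Defaulting to a dry run makes it easy to review the command sequence before starting the download and transcription steps.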