Scripts to align speech audio with its text transcription in time.
READMEs for individual scripts and modules are found in their respective directories.
git clone [email protected]:anilkeshwani/speech-text-alignment.git &&
cd speech-text-alignment &&
git submodule update --init --recursive --progress
Ensure the necessary binary requirements are installed:
apt install sox ffmpeg
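A quick way to confirm that both binaries are visible on PATH is sketched below. This is a minimal check and not part of the repository; the helper name `missing_binaries` is hypothetical, and it only tests for presence, not for a particular version.

```python
import shutil
from typing import Iterable, List


def missing_binaries(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # sox and ffmpeg are the two binary requirements listed above
    missing = missing_binaries(["sox", "ffmpeg"])
    if missing:
        print("Missing binaries:", ", ".join(missing))
    else:
        print("sox and ffmpeg found")
```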
Install the package together with all of its dependencies, including the development dependencies selected by the "dev" option to pip install.
conda create -n sardalign python=3.10.6 -y &&
conda activate sardalign &&
pip install pip==24.0 &&
pip install -e ".[dev]" &&
pre-commit install --install-hooks
Note: We do not install the dataclasses library that the fairseq MMS README calls for, because it ships out of the box with Python 3.10.6.
Note: When running on Artemis / Poseidon, ensure CUDA support is available.
At the time of writing, the NVIDIA driver and CUDA versions were:
- NVIDIA-SMI: 525.89.02
- Driver Version: 525.89.02
- CUDA Version: 12.0
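To compare a machine against the versions listed above, the following sketch reads the driver and CUDA versions from the `nvidia-smi` banner. It is an illustration only: the function names are hypothetical, and the regular expression assumes the usual "Driver Version: ... CUDA Version: ..." header line, which may differ between driver releases.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional

# Assumed layout of the nvidia-smi header line
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi(text: str) -> Optional[Dict[str, str]]:
    """Extract driver and CUDA versions from nvidia-smi output, or None if absent."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return {"driver": match.group(1), "cuda": match.group(2)}


def query_nvidia_smi() -> Optional[Dict[str, str]]:
    """Run nvidia-smi if it is on PATH and parse its banner."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_nvidia_smi(result.stdout)


if __name__ == "__main__":
    # Prints None when nvidia-smi is missing or its banner is not recognised
    print(query_nvidia_smi())
```

A result of `None` means either that `nvidia-smi` is not installed or that its output did not match the assumed header layout, so it is worth inspecting the raw output before concluding that CUDA is unavailable.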
Documentation for performing data processing steps or tasks is found in scripts/README.md.