
Functionality for speech data processing including time alignment, encoding with speech encoders (tokenizers) and data preprocessing of common datasets


anilkeshwani/speech-text-alignment


Speech-Text Alignment

Scripts to align speech audio with their text transcriptions in time.

READMEs for individual scripts and modules are located in their respective directories.

Setup

Clone Repository

git clone git@github.com:anilkeshwani/speech-text-alignment.git &&
    cd speech-text-alignment &&
    git submodule update --init --recursive --progress
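To confirm that the submodules were actually fetched, a minimal sketch such as the following can help. It relies on `git submodule status` prefixing uninitialised entries with `-`; the helper name `list_uninitialised` is hypothetical and not part of this repository.

```shell
# Keep only the status lines that describe uninitialised submodules.
# `git submodule status` marks such entries with a leading "-".
list_uninitialised() {
    grep '^-' || true
}

# Run from the repository root; no output means every submodule is initialised.
git submodule status --recursive 2>/dev/null | list_uninitialised || true
```

If any lines are printed, re-running `git submodule update --init --recursive` should fetch the missing submodules.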

Set Up Environment

Ensure the necessary binary requirements are installed:

apt install sox ffmpeg
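A quick way to verify that both tools are on the `PATH` before continuing is sketched below. The helper `check_binaries` is a hypothetical name introduced for illustration only.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command and
# returns non-zero if at least one is absent.
check_binaries() {
    missing=0
    for bin in "$@"; do
        if ! command -v "$bin" >/dev/null 2>&1; then
            echo "missing: $bin"
            missing=1
        fi
    done
    return "$missing"
}

check_binaries sox ffmpeg || echo "Install the missing tools, e.g. with: apt install sox ffmpeg"
```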

Install the package in editable mode together with its dependencies, including the development dependencies selected by the "dev" extra passed to pip install.

conda create -n sardalign python=3.10.6 -y &&
    conda activate sardalign &&
    pip install pip==24.0 &&
    pip install -e ".[dev]" &&
    pre-commit install --install-hooks

Note: We do not install the dataclasses library mentioned in the fairseq MMS README, because dataclasses is part of the standard library in Python 3.10.6.

Note: When running on Artemis / Poseidon, ensure CUDA support is available.

At the time of writing, the NVIDIA driver and CUDA versions were:

  • NVIDIA-SMI: 525.89.02
  • Driver Version: 525.89.02
  • CUDA Version: 12.0
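The versions on a given machine can be compared against those listed above with a check along these lines. This is a sketch under one assumption: that PyTorch is installed in the active environment, which this README does not state explicitly. The function name `report_cuda` is hypothetical.

```shell
# Print what the NVIDIA driver reports, then ask PyTorch whether it can see CUDA.
report_cuda() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # The header of this output includes the driver and CUDA versions.
        nvidia-smi || echo "nvidia-smi failed: the driver may not be loaded"
    else
        echo "nvidia-smi not found: no NVIDIA driver visible on this machine"
    fi
    # Assumption: PyTorch is importable from the active environment.
    python -c "import torch; print('torch CUDA available:', torch.cuda.is_available())" 2>/dev/null \
        || echo "could not import torch with the current python"
}

report_cuda
```

Run it inside the activated sardalign environment so that `python` resolves to the interpreter that has the project dependencies installed.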

Data Processing and Performing Tasks

Documentation for performing data processing steps or tasks is found in scripts/README.md.
