This project demonstrates forced alignment with Wav2Vec2: given an audio recording and its phoneme transcription, it determines the time interval that each phoneme occupies in the audio. It is designed for researchers, linguists, and developers working in speech processing and phoneme alignment.
- Batch Processing: Efficient forced alignment for multiple audio files.
- Time-Aligned Segments: Outputs alignment results as JSON files.
- Dataset Flexibility: Supports datasets with `.wav` audio files and `.txt` phoneme transcriptions in IPA format.
- Visualization: Includes tools for visualizing alignment results.
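Under the hood, a Wav2Vec2 acoustic model produces frame-level label probabilities, and the aligner finds the most likely monotonic path through them for a given transcription (CTC forced alignment). The sketch below illustrates that general idea with `torchaudio`; it is a simplified stand-in rather than this repository's actual code, and it assumes `torchaudio >= 2.1` (for `torchaudio.functional.forced_align`), a character-level checkpoint, and a hypothetical input file path:

```python
import torch
import torchaudio

# Load a pretrained Wav2Vec2 model. This checkpoint emits character labels;
# a phoneme-level checkpoint would be swapped in for IPA transcriptions.
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()  # index 0 is the CTC blank token

# Load audio and resample to the model's expected rate (16 kHz).
waveform, sr = torchaudio.load("dataset/wav/example.wav")
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

# Frame-level label log-probabilities from the acoustic model.
with torch.inference_mode():
    emissions, _ = model(waveform)  # shape: (1, frames, num_labels)
log_probs = torch.log_softmax(emissions, dim=-1)

# Encode the transcript as label ids ('|' marks word boundaries).
transcript = "HELLO|WORLD"
targets = torch.tensor([[labels.index(c) for c in transcript]], dtype=torch.int32)

# Most likely monotonic frame-to-token alignment under the CTC criterion.
frame_labels, scores = torchaudio.functional.forced_align(log_probs, targets, blank=0)
```

The resulting frame indices are then collapsed into token spans and converted to seconds using the model's frame rate (roughly 20 ms per frame for Wav2Vec2); in this repository, that logic presumably lives in `aligner.py`.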
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/forced-alignment.git
  cd forced-alignment
  ```

- Install required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Place your dataset in the required structure:

  ```
  dataset/
  ├── wav/         # Audio files (.wav)
  └── phonemized/  # Phoneme files (.txt)
  ```

- Run the alignment script (an example config sketch follows this list):

  ```bash
  python main.py --config config.yaml
  ```

- Visualize a single piece of data using the `visualize_data.py` script:

  ```bash
  python visualize_data.py
  ```
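`main.py` reads its settings from `config.yaml`. The exact keys are defined by the script itself; the snippet below is only an illustrative sketch, and every key name in it is a hypothetical assumption:

```yaml
# Hypothetical config sketch; the real keys are whatever main.py expects.
dataset_dir: dataset/    # folder containing wav/ and phonemized/
output_dir: segments/    # where aligned JSON files are written
sample_rate: 16000       # Wav2Vec2 models typically expect 16 kHz audio
batch_size: 8            # number of files aligned per batch
```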
The repository is organized as follows:

```
.
├── aligner.py           # Core alignment logic
├── dataloader.py        # Dataset handling and preprocessing
├── main.py              # Main script to run the alignment process
├── plot_utils.py        # Utility functions for plotting and visualization
├── visualize_data.py    # Script for visualizing alignment results
├── requirements.txt     # Project dependencies
└── dataset/             # Input dataset folder (user-provided)
```
The dataset should follow this structure:
```
dataset/
├── wav/           # Audio files in .wav format
└── phonemized/    # Phoneme transcriptions in .txt format
```
Each `.txt` file should correspond to an audio file and contain its phoneme sequence in IPA format.
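For example, a transcription of "hello world" might read as follows (illustrative only; the exact phoneme inventory and token separator depend on the phonemizer used):

```
h ə l oʊ w ɜː l d
```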
Aligned segments are saved in JSON format under the specified output folder (`segments/` by default). Each JSON file contains the alignment results for its corresponding audio file.
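Each entry typically maps a phoneme to its start and end time in the audio. The snippet below is a hypothetical illustration; the actual field names and units are determined by `aligner.py`:

```json
[
  {"phoneme": "h", "start": 0.12, "end": 0.18},
  {"phoneme": "ə", "start": 0.18, "end": 0.24},
  {"phoneme": "l", "start": 0.24, "end": 0.31}
]
```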
Run the alignment process with:

```bash
python main.py
```

Visualize a single piece of data with:

```bash
python visualize_data.py
```