Cantonese G2P Evaluation Benchmark

This project is a benchmark for evaluating Cantonese Grapheme-to-Phoneme (G2P) systems. It is based on the Jyutping romanization system. The data comes from three sources: word.hk, 100 samples from Dufu-Analysis, and 500 colloquial Chinese samples transcribed from CanCLID/zoengjyutgaai_saamgwokjinji.

The dataset consists of word-character pairs with their corresponding ground truth phonemes for G2P model evaluation. Each line of a text file contains a word, an underscore (_), and the target character for phoneme prediction. This format focuses on predicting the phoneme of a single character within a word, which makes it suitable for assessing G2P models on specific characters in context.
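
The line format described above could be parsed roughly as follows. This is a minimal sketch, not the repository's loader: the function name `parse_line` and the sample line are hypothetical, and it assumes the target character is whatever follows the last underscore.

```python
def parse_line(line):
    """Split one dataset line into (word, target_character).

    Assumes the layout "<word>_<target>" described above; the
    repository's own data.py may parse lines differently.
    """
    word, sep, target = line.strip().rpartition("_")
    if not sep or not word or not target:
        raise ValueError(f"expected '<word>_<target>', got {line!r}")
    return word, target


# Hypothetical sample line, not taken from the dataset.
word, target = parse_line("你好_好\n")
print(word, target)
```

Splitting on the last underscore keeps the sketch tolerant of a word that itself contains an underscore, which is an assumption about the data rather than a documented guarantee.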

Metrics

The benchmark evaluates Cantonese G2P systems using two primary metrics:

Accuracy

Definition: The percentage of instances where the specified character within a word is correctly converted from graphemes to phonemes.
Purpose: This metric measures how often the G2P model accurately predicts the phoneme for the target character in the context of the word.
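
As an illustration of the definition above, accuracy can be computed along these lines. The sketch assumes that predictions and labels are Jyutping strings and that a prediction counts as correct when it matches any of the valid labels for that instance; the function name and the sample values are hypothetical.

```python
def accuracy(predictions, references):
    """Fraction of instances whose prediction matches a valid label.

    predictions: list of predicted Jyutping strings.
    references:  list of collections of acceptable Jyutping strings.
    Treating any matching label as correct is an assumption of this sketch.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not predictions:
        return 0.0
    correct = sum(pred in labels for pred, labels in zip(predictions, references))
    return correct / len(predictions)


# Hypothetical predictions and labels for four instances.
preds = ["hou2", "sik6", "hang4", "zung1"]
refs = [["hou2", "hou3"], ["sik6"], ["hong4"], ["zung1", "zung3"]]
print(accuracy(preds, refs))  # 0.75
```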

Phoneme Error Rate (PER)

Definition: The proportion of phoneme components that are incorrectly predicted.
Calculation Details:
- Syllable Decomposition: Each Jyutping syllable is broken down into four components: onset, nucleus, coda, and tone.
- Hamming Distance: PER is calculated by computing the Hamming distance between the predicted and ground truth quadruples (onset, nucleus, coda, tone).
  - For example, if the ground truth is (s, a, i, 2) and the prediction is (s, a, m, 2), the Hamming distance is 1 (since only the coda differs).
- Multiple Labels Handling: If multiple valid pronunciations (alternative labels) exist for a character, the PER is computed using the label that minimizes the Hamming distance to the prediction.
Purpose: PER provides a fine-grained evaluation by identifying specific phoneme components where errors occur, offering insights into the model's phonological performance.
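
The calculation above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the benchmark's own code: the regular expression encodes an assumed Jyutping inventory, syllabic nasals (m, ng) are placed in the nucleus slot, and the corpus-level rate divides the summed distances by four components per instance. All function names are hypothetical.

```python
import re

# Assumed Jyutping inventory; the benchmark's decomposition may differ.
_SYLLABLE = re.compile(
    r"^(ng|gw|kw|[bpmfdtnlgkhwzcsj])?"  # onset (optional)
    r"(aa|oe|eo|yu|[aeiou])"            # nucleus
    r"(ng|[ptkmniu])?"                  # coda (optional)
    r"([1-6])$"                         # tone
)


def decompose(syllable):
    """Split a Jyutping syllable into (onset, nucleus, coda, tone)."""
    nasal = re.fullmatch(r"(m|ng)([1-6])", syllable)
    if nasal:  # syllabic nasal: stored in the nucleus slot (assumption)
        return ("", nasal.group(1), "", nasal.group(2))
    match = _SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not a recognised Jyutping syllable: {syllable!r}")
    onset, nucleus, coda, tone = match.groups()
    return (onset or "", nucleus, coda or "", tone)


def hamming(a, b):
    """Number of positions at which two quadruples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(prediction, labels):
    """Hamming distance to the closest valid label."""
    pred = decompose(prediction)
    return min(hamming(pred, decompose(label)) for label in labels)


def phoneme_error_rate(predictions, references):
    """Summed minimum distances divided by the total number of components."""
    total = sum(min_distance(p, r) for p, r in zip(predictions, references))
    return total / (4 * len(predictions))


print(decompose("sai2"))               # ('s', 'a', 'i', '2')
print(min_distance("sam2", ["sai2"]))  # 1, only the coda differs
print(phoneme_error_rate(["sam2", "hou2"], [["sai2"], ["hou2", "hou3"]]))  # 0.125
```

Taking the minimum over labels means a prediction is never penalised for choosing one valid pronunciation over another.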

Rationale for Metric Choices

Exclusion of Levenshtein Distance

Previously, the Levenshtein distance was considered for evaluating G2P performance but was found to be unsuitable for this benchmark due to:

Dependency on Romanization System:
- The Levenshtein distance operates on the Jyutping romanization strings, which can bias the results based on spelling conventions rather than actual phonetic differences.
- Different romanization systems might represent the same sounds with different letters or letter combinations, affecting the distance calculation.
Positional Pronunciation Variations:
- In Cantonese, certain letters represent sounds that change depending on their position within a syllable.
  - Example: The letters p, t, and k are aspirated when they appear at the beginning (onset) of a syllable but are unreleased when they appear at the end (coda).
- Levenshtein distance does not account for these positional differences: it treats a letter as the same symbol wherever it appears, so it can misjudge errors when the letters are the same but their pronunciations differ because of their positions.

By using Accuracy and Phoneme Error Rate (PER) based on phonetic components, the benchmark provides a more accurate and meaningful evaluation of G2P systems that reflects true phonological performance rather than orthographic or romanization discrepancies.
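
To make the dependency on spelling concrete, the sketch below compares a plain Levenshtein distance on Jyutping strings with the component-wise Hamming distance on hand-written quadruples. The syllable pair is chosen for illustration only, and both helper functions are written for this example rather than taken from the repository.

```python
def levenshtein(a, b):
    """Classic edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def hamming(a, b):
    """Number of differing components between two quadruples."""
    return sum(x != y for x, y in zip(a, b))


# "sang1" and "sam1" differ only in the coda (ng vs m).
string_distance = levenshtein("sang1", "sam1")
component_distance = hamming(("s", "a", "ng", "1"), ("s", "a", "m", "1"))
print(string_distance, component_distance)  # 2 1
```

The coda ng is a single sound written with two letters, so the string-based distance charges two edits for what is one component error.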

Usage

Pre-requisites

# pull submodules
git submodule update --init --recursive
# install dependencies
pip install -r requirements.txt
# install g2pW-Cantonese dependencies
pip install -r g2pW-Cantonese/requirements.txt

Run the Benchmark

python run.py

Leaderboard

Runtime Comparison

How to Submit

To submit your G2P system, please subclass the G2PModel class in models and implement the _predict method. Then, add your model to the models list in run.py. Finally, run run.py to generate the results.
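
A submission might look roughly like the sketch below. The real `G2PModel` base class lives in `models` and its interface is not shown in this README, so the stand-in base class, the `_predict(word, target)` signature, and the `LookupG2P` model are all assumptions made for illustration; check the actual class before copying this.

```python
from abc import ABC, abstractmethod


class G2PModel(ABC):
    """Stand-in for the repository's base class (assumed interface)."""

    @abstractmethod
    def _predict(self, word, target):
        """Return the Jyutping for `target` as it occurs in `word`."""


class LookupG2P(G2PModel):
    """Hypothetical baseline: a context-free dictionary lookup."""

    def __init__(self, table):
        self.table = dict(table)

    def _predict(self, word, target):
        # Ignores the surrounding word; unknown characters yield "".
        return self.table.get(target, "")


# In run.py the new model would be appended to the existing `models` list.
models = [LookupG2P({"好": "hou2"})]
print(models[0]._predict("你好", "好"))  # hou2
```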

Repository: hon9kon9ize/yue-g2p-benchmark