Source code of paper "Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector" (Findings of ACL 2024)

yanghh2000/Alirector

Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector (Findings of ACL 2024)

Environment

To install the environment, run:

pip install -r requirements.txt

Data

Download

MuCGEC and NLPCC18: download links are listed in the MuCGEC repository.

FCGEC: see the FCGEC repository.

NaCGEC: see the NaCGEC repository.

Process

Process the data into the same format as data/MuCGEC/train_examples.json.

Use data/MuCGEC/utils.py to split the data into two parts for two-stage training.
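To make the split step concrete, here is a minimal sketch of cutting a dataset into a stage 1 part and a stage 2 part. It is not the logic in data/MuCGEC/utils.py, which remains the script to use. The function names, the 50/50 default ratio, the fixed seed, and the assumption that the processed file is a single JSON array are all hypothetical choices made for illustration.

```python
import json
import random
from pathlib import Path


def split_for_two_stage(examples, stage1_fraction=0.5, seed=42):
    """Shuffle a copy of `examples` and cut it into (stage1, stage2).

    The ratio and seed are illustrative assumptions, not values from the repo.
    """
    if not 0.0 < stage1_fraction < 1.0:
        raise ValueError("stage1_fraction must be strictly between 0 and 1")
    shuffled = list(examples)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = int(len(shuffled) * stage1_fraction)
    return shuffled[:cut], shuffled[cut:]


def split_file(src, stage1_out, stage2_out, **kwargs):
    """Read a JSON array from `src` and write the two parts as JSON arrays.

    Assumes the processed data is one JSON array of examples.
    """
    examples = json.loads(Path(src).read_text(encoding="utf-8"))
    stage1, stage2 = split_for_two_stage(examples, **kwargs)
    for path, part in ((stage1_out, stage1), (stage2_out, stage2)):
        # ensure_ascii=False keeps Chinese text readable in the output files
        Path(path).write_text(
            json.dumps(part, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return len(stage1), len(stage2)
```

The two parts must not overlap, because the stage 2 part is later fed to the stage 1 model to generate predictions on sentences it was not trained on.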

Download Pretrained Models

Chinese BART large: Hugging Face Link

Baichuan2-7B-Base: Hugging Face Link

Training

Initial Correction Model (Stage 1 Data)

# bart
bash seq2seq/scripts/train_stage1.sh

# baichuan2
bash llm/scripts/train_stage1.sh

Generate Prediction for Stage 2 Data

# bart
bash seq2seq/scripts/generate_stage2_pred.sh

# baichuan2
bash llm/scripts/generate_stage2_pred.sh

Alignment Model (Stage 2 Data)

# bart
bash seq2seq/scripts/train_align.sh

# baichuan2
bash llm/scripts/train_align.sh

Alignment Distillation (Stage 2 Data)

# bart
bash seq2seq/scripts/train_alignment_distill.sh

# baichuan2
bash llm/scripts/train_alignment_distall.sh

Predict and Evaluate

For prediction, use llm/src/predict.py or seq2seq/src/predict.py.

For evaluation, we use the ChERRANT scorer to compute character-level P/R/F0.5 on FCGEC and NaCGEC, and M2Scorer to compute word-level P/R/F0.5 on NLPCC18-Test. For usage, refer to this script.
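For readers unfamiliar with the metric, the sketch below shows how P, R, and F0.5 follow from edit counts. It is not the ChERRANT or M2Scorer implementation. The function name and the handling of empty denominators (returning 1.0 when there is nothing to predict or nothing to find) are assumptions made for illustration. With beta = 0.5, precision is weighted more heavily than recall.

```python
def precision_recall_fbeta(tp, fp, fn, beta=0.5):
    """Compute (precision, recall, F_beta) from edit counts.

    tp: proposed edits that match a reference edit
    fp: proposed edits that match no reference edit
    fn: reference edits the system missed
    """
    # Empty-denominator handling is an assumed convention for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


if __name__ == "__main__":
    p, r, f = precision_recall_fbeta(tp=6, fp=2, fn=6)
    print(f"P={p:.4f} R={r:.4f} F0.5={f:.4f}")
```

In the usage example, 6 correct edits out of 8 proposed give P = 0.75, 6 found out of 12 reference edits give R = 0.5, and F0.5 is about 0.68, closer to precision than the balanced F1 of 0.6 would be.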

Citation

If you find our work helpful, please cite us as:

@inproceedings{yang-quan-2024-alirector,
    title = "Alirector: Alignment-Enhanced {C}hinese Grammatical Error Corrector",
    author = "Yang, Haihui and Quan, Xiaojun",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
}

Link on ACL Anthology: https://aclanthology.org/2024.findings-acl.148/
