DisCoDisCo (District of Columbia Discourse Cognoscente) is GU Corpling's submission to the DISRPT 2021 shared task, where it placed first among all submitted systems across all five subtasks. Consult the official shared task repo for more information on the task.
See our paper here: https://aclanthology.org/2021.disrpt-1.6/
Citation:
@inproceedings{gessler-etal-2021-discodisco,
title = "{D}is{C}o{D}is{C}o at the {DISRPT}2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection",
author = "Gessler, Luke and
Behzad, Shabnam and
Liu, Yang Janet and
Peng, Siyao and
Zhu, Yilun and
Zeldes, Amir",
booktitle = "Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.disrpt-1.6",
pages = "51--62"
}
- Create a new environment:
conda create --name disrpt python=3.8
conda activate disrpt
- Install dependencies:
pip install -r requirements.txt
- Ensure the 2021 shared task data is at data/2021/.
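Before training, it can help to confirm that the data is where the scripts expect it. The following is a minimal sketch, not part of this repository: the helper name `missing_corpora` is hypothetical, and it assumes each corpus sits in its own subdirectory named after the corpus (for example data/2021/zho.rst.sctb/). Adjust the check if your copy of the data is laid out differently.

```python
from pathlib import Path


def missing_corpora(corpora, data_root="data/2021"):
    """Return the corpus names that have no directory under data_root.

    Assumption (not confirmed by this README): every corpus lives in a
    subdirectory of data_root named after the corpus itself.
    """
    root = Path(data_root)
    # A corpus counts as present only if its directory exists.
    return [name for name in corpora if not (root / name).is_dir()]
```

Calling `missing_corpora(["zho.rst.sctb"])` from the repository root returns an empty list when the corpus directory is in place, and a list of the missing names otherwise.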
Gold segmentation:
bash seg_scripts/single_corpus_train_and_test_ft.sh zho.rst.sctb
Silver segmentation:
bash seg_scripts/silver_single_corpus_train_and_test_ft.sh zho.rst.sctb
Relation classification:
bash rel_scripts/run_single_flair_clone.sh zho.rst.sctb
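Each command above takes a single corpus name, so running several corpora means repeating the call. The sketch below shows one way to do that from Python. The script paths are copied from the commands above; the task labels (`gold_seg`, `silver_seg`, `rel`) and the helpers `build_command` and `run_all` are hypothetical names introduced only for this example, and it is assumed that the scripts accept any corpus name present in the data directory.

```python
import subprocess

# Script paths copied from the commands above; the dictionary keys are
# illustrative labels, not names used by the repository.
SCRIPTS = {
    "gold_seg": "seg_scripts/single_corpus_train_and_test_ft.sh",
    "silver_seg": "seg_scripts/silver_single_corpus_train_and_test_ft.sh",
    "rel": "rel_scripts/run_single_flair_clone.sh",
}


def build_command(task, corpus):
    """Return the argument list for one task on one corpus."""
    return ["bash", SCRIPTS[task], corpus]


def run_all(task, corpora, dry_run=True):
    """Print the command for each corpus and, unless dry_run, execute it."""
    for corpus in corpora:
        cmd = build_command(task, corpus)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing script.
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `run_all("gold_seg", ["zho.rst.sctb"])` only prints the command it would run. Pass `dry_run=False` from the repository root to actually launch training.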
Batch size may be modified, if necessary, using the batch_size parameter in: