This repository contains the code for our project "Generative Coreference Resolution".
Our code builds on the implementation from the paper "Coreference Resolution without Span Representations" [1].
Clone the repository of "Coreference Resolution without Span Representations":
git clone https://github.com/yuvalkirstain/s2e-coref.git
Copy our modified source files into the cloned repository:
cp -r src/* s2e-coref/
Install the requirements:
cd s2e-coref
pip install -r requirements.txt
Clone the CoNLL reference coreference scorers (from inside the s2e-coref directory):
git clone https://github.com/conll/reference-coreference-scorers.git
This repo assumes access to the OntoNotes 5.0 corpus. Convert the original dataset into jsonlines format using:
export DATA_DIR=<data_dir>
python minimize.py $DATA_DIR
Credit: This script was taken from the e2e-coref repo.
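As a quick sanity check after the conversion, the number of documents per output file can be counted. The sketch below assumes the converted files end in .jsonlines, sit directly in the data directory, and hold one JSON document per line; count_jsonlines_docs is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper (not part of the repository):
# print "<file name> <document count>" for every .jsonlines file in a directory.
count_jsonlines_docs() {
  dir="$1"
  found=0
  for f in "$dir"/*.jsonlines; do
    # Skip the unexpanded glob pattern when no file matches.
    [ -e "$f" ] || continue
    found=1
    # Assumption: each non-empty line is one JSON document.
    n=$(grep -c . "$f" || true)
    printf '%s %s\n' "$(basename "$f")" "$n"
  done
  if [ "$found" -eq 0 ]; then
    echo "no .jsonlines files found in $dir" >&2
    return 1
  fi
}

# Usage, once DATA_DIR is exported:
#   count_jsonlines_docs "$DATA_DIR"
```

A non-zero exit status means no converted files were found, which usually indicates that the conversion wrote its output elsewhere or did not run.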
Download our trained model:
export MODEL_DIR=<model_dir>
gdown --id 1uPzu-wAnMoO84tK_urRLxO7zN6eRQ2fy --output temp_model.zip
unzip temp_model.zip -d $MODEL_DIR
rm -rf temp_model.zip
Then evaluate it:
export OUTPUT_DIR=<output_dir>
export CACHE_DIR=<cache_dir>
export MODEL_DIR=<model_dir>
export DATA_DIR=<data_dir>
export SPLIT_FOR_EVAL=<dev or test>
python run_config.py \
--model_type t5-base \
--split_for_eval test \
--epochs 1 \
--model_name_or_path $MODEL_DIR \
--sent_num 10 \
--step_num 10
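The steps above export several variables, and an unset one is easy to miss until the run fails. A minimal guard, assuming a POSIX shell, could look like this; require_vars is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper (not part of the repository):
# succeed only if every named environment variable is set and non-empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before the python run_config.py command:
#   require_vars OUTPUT_DIR CACHE_DIR MODEL_DIR DATA_DIR SPLIT_FOR_EVAL || exit 1
```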
Train a coreference model with run_config.py:
export OUTPUT_DIR=<output_dir>
export CACHE_DIR=<cache_dir>
export MODEL_DIR=<model_dir>
export DATA_DIR=<data_dir>
export SPLIT_FOR_EVAL=<dev or test>
python run_config.py \
--model_type t5-base \
--split_for_eval test \
--epochs <num of epochs> \
--model_name_or_path $MODEL_DIR \
--sent_num 10 \
--step_num 10 \
--do_train
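To make the <num of epochs> placeholder concrete, the training command above can be wrapped in a small function that validates the epoch count and prints the resulting command for review. This is only a sketch: train_cmd is a hypothetical helper, and it reuses exactly the flags shown above without adding any.

```shell
# Hypothetical helper (not part of the repository):
# print the training command from this README for a given epoch count and
# model directory, so it can be reviewed before running.
# Assumption: the model directory path contains no spaces.
train_cmd() {
  epochs="$1"
  model_dir="$2"
  # Reject empty values and anything containing a non-digit.
  case "$epochs" in
    ''|*[!0-9]*) echo "epochs must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$epochs" -eq 0 ]; then
    echo "epochs must be a positive integer" >&2
    return 1
  fi
  printf 'python run_config.py --model_type t5-base --split_for_eval test --epochs %s --model_name_or_path %s --sent_num 10 --step_num 10 --do_train\n' "$epochs" "$model_dir"
}

# Usage, once MODEL_DIR is exported:
#   train_cmd 3 "$MODEL_DIR"
```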
To see all available parameters, run:
python run_config.py -h
and adjust the commands above accordingly.
To evaluate your trained model on the test set, use the evaluation command above with MODEL_DIR set to your trained model's directory.
[1] Coreference Resolution without Span Representations, 2021, Kirstain et al.