Coreference resolution on the OntoNotes benchmark using generative models such as BART and T5

ShimonMalnick/Generative-Coreference-Resolution

Generative Coreference Resolution

This repository contains the code for our project "Generative Coreference Resolution".

Our code builds on the implementation released with the paper "Coreference Resolution without Span Representations" [1].

Set up

Requirements

Clone "Coreference Resolution without Span Representations"'s repository:

git clone https://github.com/yuvalkirstain/s2e-coref.git

Copy this project's modified sources into the cloned repository:

cp -r src/* s2e-coref/

Install the requirements:

cd s2e-coref
pip install -r requirements.txt

Download the official evaluation script

Run (from inside the s2e-coref directory):

git clone https://github.com/conll/reference-coreference-scorers.git

Prepare the dataset

This repo assumes access to the OntoNotes 5.0 corpus. Convert the original dataset into jsonlines format using:

export DATA_DIR=<data_dir>
python minimize.py $DATA_DIR

Credit: This script was taken from the e2e-coref repo.
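
As a quick sanity check on the converted files, the sketch below reads jsonlines records and prints each coreference cluster as text. It assumes the e2e-coref layout, which is not spelled out in this README: a `sentences` field holding lists of tokens and a `clusters` field holding `[start, end]` token offsets (end inclusive) into the flattened document. The helper names `iter_documents` and `mention_texts` and the toy record are hypothetical and used only for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_documents(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed document per non-empty jsonlines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mention_texts(doc: Dict) -> List[List[str]]:
    """Return each cluster as a list of mention strings.

    Assumption: `sentences` is a list of token lists and `clusters` holds
    [start, end] token offsets (end inclusive) into the flattened document.
    """
    tokens = [tok for sent in doc["sentences"] for tok in sent]
    return [
        [" ".join(tokens[start:end + 1]) for start, end in cluster]
        for cluster in doc["clusters"]
    ]


# Hand-written toy record, not taken from OntoNotes.
sample = json.dumps({
    "doc_key": "toy/doc_0",
    "sentences": [["Alice", "said", "she", "was", "tired", "."]],
    "clusters": [[[0, 0], [2, 2]]],
})

docs = list(iter_documents([sample, ""]))
clusters = mention_texts(docs[0])
print(clusters)
```

To inspect a real file, pass an open file handle to `iter_documents` instead of the in-memory list.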

Evaluation

Download our trained model:

export MODEL_DIR=<model_dir>
gdown --id 1uPzu-wAnMoO84tK_urRLxO7zN6eRQ2fy --output temp_model.zip
unzip temp_model.zip -d $MODEL_DIR
rm -rf temp_model.zip

and run:

export OUTPUT_DIR=<output_dir>
export CACHE_DIR=<cache_dir>
export MODEL_DIR=<model_dir>
export DATA_DIR=<data_dir>
export SPLIT_FOR_EVAL=<dev or test>
python run_config.py \
            --model_type t5-base \
            --split_for_eval $SPLIT_FOR_EVAL \
            --epochs 1 \
            --model_name_or_path $MODEL_DIR \
            --sent_num 10 \
            --step_num 10
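
The command above depends on several environment variables being set first. A minimal sketch of a pre-flight check is shown below: it verifies that the variables exported above are present and assembles the same argument list. The helper `build_eval_command` is hypothetical, and passing `SPLIT_FOR_EVAL` to `--split_for_eval` is an assumption. Whether run_config.py reads the remaining variables directly is not stated in this README, so the sketch only checks that they are set.

```python
from typing import List, Mapping

# Variables exported in the evaluation instructions.
REQUIRED_VARS = ("OUTPUT_DIR", "CACHE_DIR", "MODEL_DIR", "DATA_DIR", "SPLIT_FOR_EVAL")


def build_eval_command(env: Mapping[str, str]) -> List[str]:
    """Check the environment and return the evaluation command as a list."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("unset environment variables: " + ", ".join(missing))
    split = env["SPLIT_FOR_EVAL"]
    if split not in ("dev", "test"):
        raise ValueError("SPLIT_FOR_EVAL must be 'dev' or 'test', got: " + split)
    return [
        "python", "run_config.py",
        "--model_type", "t5-base",
        "--split_for_eval", split,  # assumption: the exported split is used here
        "--epochs", "1",
        "--model_name_or_path", env["MODEL_DIR"],
        "--sent_num", "10",
        "--step_num", "10",
    ]


# Placeholder paths for illustration only.
example_env = {
    "OUTPUT_DIR": "out",
    "CACHE_DIR": "cache",
    "MODEL_DIR": "model",
    "DATA_DIR": "data",
    "SPLIT_FOR_EVAL": "dev",
}
command = build_eval_command(example_env)
print(" ".join(command))
```

In practice one could pass `os.environ` instead of `example_env` and hand the resulting list to `subprocess.run(command, check=True)` from inside the s2e-coref directory.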

Training

Train a coreference model using the run_config.py configuration:

export OUTPUT_DIR=<output_dir>
export CACHE_DIR=<cache_dir>
export MODEL_DIR=<model_dir>
export DATA_DIR=<data_dir>
export SPLIT_FOR_EVAL=<dev or test>
python run_config.py \
            --model_type t5-base \
            --split_for_eval $SPLIT_FOR_EVAL \
            --epochs <num of epochs> \
            --model_name_or_path $MODEL_DIR \
            --sent_num 10 \
            --step_num 10 \
            --do_train

To see the other available parameters, run:

python run_config.py -h

and adjust the training command accordingly.

To evaluate your trained model on the test set, follow the steps in the Evaluation section above with MODEL_DIR pointing to your trained model.

References

[1] Coreference Resolution without Span Representations, 2021, Kirstain et al.
