Model Outputs

This directory contains the outputs of the IterX model, which are associated with the main experiments described in the paper. The directory is organized in the following format:

└── <dataset>  # Name of the dataset, one of {muc4, scirex}
    └── <model_name>  # Name of the model, here always {iterx}
        └── <encoder_name>  # Name of the encoder
            ├── raw  # Raw outputs of the model
            │   └── preds.test.jsonlines  # IterX outputs jsonlines where each line is a JSON object
            └── comparisons  # Outputs of our prediction comparison tool, illustrating comparisons between predictions and references and corresponding scores
                ├── test.rme.phi3.txt  # Under "CEAF-RME_{phi3}" scorer
                └── test.rme.subset.txt  # Under "CEAF-RME_{subset}" scorer
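
As a minimal sketch of how the layout above could be navigated, the following Python snippet builds the path to a raw prediction file and reads it line by line. The helper names `prediction_path` and `load_jsonlines`, the root directory, and the encoder name used in the usage note are hypothetical and not part of this repository; the fields inside each JSON object are not described here, so the records are returned as plain dictionaries.

```python
import json
from pathlib import Path


def prediction_path(root, dataset, encoder_name, model_name="iterx"):
    """Build the path to preds.test.jsonlines following the directory layout.

    `root` and `encoder_name` are supplied by the caller; this helper is
    illustrative only and is not shipped with the repository.
    """
    return (
        Path(root) / dataset / model_name / encoder_name
        / "raw" / "preds.test.jsonlines"
    )


def load_jsonlines(path):
    """Read a jsonlines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

For instance, `load_jsonlines(prediction_path("outputs", "muc4", "my_encoder"))` would return a list with one dictionary per line, where `outputs` and `my_encoder` are placeholders for the actual root directory and encoder name.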

Understanding comparisons

Comparisons are generated by our prediction comparison tool. They are intended to illustrate how scores are computed under a specific metric on a particular document. In each comparison file, results are grouped by document, and each document starts with a meta info line that gives the number of predictions and references along with the scores under the corresponding metric. For example, the following is a snippet of test.rme.phi3.txt:

doc_id=TST3-MUC4-0003	#pred=2	#gold=1	prec=0.3750	rec=0.7500	f1=0.5000

This means that for the document TST3-MUC4-0003, there are 2 predictions and 1 reference. The precision, recall and F1 scores are computed under the CEAF-RME_{phi3} metric as 0.3750, 0.7500 and 0.5000 respectively.
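
The meta info line can be read programmatically. The sketch below assumes the fields are whitespace-separated `key=value` pairs, as in the snippet above, and that F1 is the usual harmonic mean of precision and recall, which is consistent with the example values. The function names `parse_meta_line` and `f1_score` and the output keys `num_pred` and `num_gold` are hypothetical and are not part of the comparison tool.

```python
def parse_meta_line(line):
    """Parse a meta info line of whitespace-separated key=value fields."""
    fields = dict(item.split("=", 1) for item in line.split())
    return {
        "doc_id": fields["doc_id"],
        "num_pred": int(fields["#pred"]),  # number of predicted templates
        "num_gold": int(fields["#gold"]),  # number of reference templates
        "prec": float(fields["prec"]),
        "rec": float(fields["rec"]),
        "f1": float(fields["f1"]),
    }


def f1_score(prec, rec):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)


meta = parse_meta_line(
    "doc_id=TST3-MUC4-0003\t#pred=2\t#gold=1\tprec=0.3750\trec=0.7500\tf1=0.5000"
)
# The reported F1 agrees with the harmonic mean of the reported precision and recall.
assert abs(f1_score(meta["prec"], meta["rec"]) - meta["f1"]) < 1e-4
```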

Below the meta info line, the file lists the templates that were aligned under the metric together with their fillers, the templates that were predicted but have no match in the references (Predicted but not matched), and the reference templates that the model did not predict (Not predicted). Note that slot alignments may not be printed accurately because of limitations in the print script; this does not affect the final scores.