This repo contains the code and models for the paper (Stammbach, 2021).
Assuming Anaconda and Linux, the environment can be installed with the following commands:
conda create -n FEVER_bigbird python=3.6
conda activate FEVER_bigbird
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
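To check that PyTorch sees the GPU afterwards (an optional sanity check, not part of the original instructions), you can run:
python -c "import torch; print(torch.cuda.is_available())"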
The models (PyTorch models) can be downloaded here:
To predict evidence sentences with the sentence-selection model, run:
python src/main.py --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv
sample_data.jsonl points to a file where each line is an example of a (claim, Wiki-page) pair with the following fields (a made-up example line is sketched after the list):
- id # the claim ID
- claim # the claim
- page # the page title
- sentences # a list -- essentially the "lines" in the official FEVER wiki-pages for a given document (where the document is split by "\n")
- label_list # a list, 1 if a sentence is part of any annotated evidence set for a given claim, 0 otherwise
- sentence_IDS # a list, np.arange(len(sentences))
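For illustration, here is a minimal sketch of how one such input line could be written. The claim, page title, and sentences below are invented placeholders, not taken from the actual dataset:

```python
import json

# Hypothetical (claim, Wiki-page) pair illustrating the expected fields.
sentences = [
    "The 2014 San Francisco 49ers season was the franchise's 65th season.",
    "It was their first season at Levi's Stadium.",
    "The team finished the season 8-8.",
]
example = {
    "id": 12345,                                  # claim ID (placeholder)
    "claim": "The 49ers played at Levi's Stadium in 2014.",
    "page": "2014_San_Francisco_49ers_season",
    "sentences": sentences,                       # the "lines" of the Wiki page
    "label_list": [0, 1, 0],                      # 1 = sentence is part of an annotated evidence set
    "sentence_IDS": list(range(len(sentences))),  # np.arange(len(sentences)) as a plain list
}

with open("sample_data.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")           # one JSON object per line
```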
The output is a dataframe where we store, for each sentence predicted by the model, the following columns (a loading sketch follows the list):
- claim_id
- page_sentence # a tuple (Wikipage_Title, sentence_ID), for example ('2014_San_Francisco_49ers_season', 3)
- y # 1 if label_list above was 1, 0 otherwise
- predictions # token-level predictions for this sentence
- score # np.mean(predictions), model is confident that this sentence is evidence if score > 0
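As a rough sketch (assuming the prediction file can be read straight back into pandas and that page_sentence is stored as written), the predicted evidence sentences could be collected per claim like this:

```python
import pandas as pd

# Load the prediction dataframe written by src/main.py
preds = pd.read_csv("predictions_sentence_retrieval.csv")

# The model is confident a sentence is evidence if its mean token score is > 0
evidence = preds[preds["score"] > 0]

# Group the predicted evidence sentences by claim
for claim_id, group in evidence.groupby("claim_id"):
    # page_sentence holds (Wikipage_Title, sentence_ID) pairs,
    # e.g. ('2014_San_Francisco_49ers_season', 3)
    print(claim_id, group["page_sentence"].tolist())
```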
To train a model, point to --train_file and --eval_file, both in the format described above, and add the --do_train flag:
python src/main.py --do_train --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --train_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv
The full pipeline:
- takes a first pass over all (claim, WikiPage) pairs, where the Wikipages are predicted by (Hanselowski et al., 2018) and the FEVER baseline
- extracts all sentences which the model is confident are evidence in that pass; the model input is [CLS] claim [SEP] WikiPage [SEP]
- retrieves conditioned evidence as explained in (Stammbach and Neumann, 2019)
- retrieves hyperlinks from the evidence sentences and takes a second pass over all (claim, hyperlink) pairs, where the model input is [CLS] claim, evidence_sentence [SEP] HyperlinkPage [SEP]
- sorts all predicted evidence sentences for a claim in descending order
- takes the five highest-scoring sentences for each claim and concatenates those (see the sketch after this list)
- predicts a label for each (claim, retrieved_evidence) pair using the RTE model (trained with an outdated huggingface sequence classification demo script)
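A minimal sketch of the sorting and top-five aggregation step, using the prediction columns described above (looking up the actual sentence text again and running the RTE model are left out here):

```python
import pandas as pd

preds = pd.read_csv("predictions_sentence_retrieval.csv")

# Sort all predicted evidence sentences for a claim in descending order of score
preds = preds.sort_values(["claim_id", "score"], ascending=[True, False])

# Keep the five highest-scoring sentences per claim
top5 = preds.groupby("claim_id").head(5)

# Collect the (page, sentence_ID) identifiers per claim; the pipeline then
# concatenates the corresponding sentence texts and classifies each
# (claim, retrieved_evidence) pair with the RTE model
retrieved = top5.groupby("claim_id")["page_sentence"].apply(list)
print(retrieved.head())
```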
To generate the multihop dataset, we need to download fever.db; see how to obtain this here
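Before running the multihop retrieval, it can help to sanity-check the downloaded fever.db. The sketch below only inspects the SQLite schema and makes no assumptions about table or column names:

```python
import sqlite3

# Quick sanity check of the downloaded fever.db: the FEVER Wikipedia dump is
# stored as a SQLite database, so we simply list the tables and their schemas.
conn = sqlite3.connect("fever.db")
cursor = conn.cursor()

cursor.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
for name, sql in cursor.fetchall():
    print(name, sql)

conn.close()
```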
After predicting a first pass, we can retrieve multihop pages by running:
python src/retrieve_multihop_evidence.py --db_file fever.db --predictions predictions_sentence_retrieval.csv --fever_data dev.jsonl --outfile_name multi_evidence_sample_data.jsonl
Afterwards, we can predict these sentences the same way as before:
python src/main.py --do_predict --model_name sentence-selection-bigbird-base --eval_file multi_evidence_sample_data.jsonl --predict_filename predictions_multihop_sentence_retrieval.csv
If anything does not work or is unclear, please don't hesitate to contact the author:
- Dominik Stammbach ([email protected])