This repo contains the code and models for the paper (Stammbach, 2021).
Assuming Anaconda and Linux, the environment can be installed with the following commands:
conda create -n FEVER_bigbird python=3.6
conda activate FEVER_bigbird
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
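To check that PyTorch sees the GPU afterwards (an optional sanity check, not part of the original instructions), you can run:
python -c "import torch; print(torch.cuda.is_available())"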
The models (PyTorch models) can be downloaded here:
To predict evidence sentences with the sentence-selection model, run:
python src/main.py --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv
sample_data.jsonl points to a file where each line is an example of a (claim, Wiki-page) pair with the following fields (a made-up example line is sketched after the list):
- id # the claim ID
- claim # the claim
- page # the page title
- sentences # a list -- essentially the "lines" in the official FEVER wiki-pages for a given document (where the document is split by "\n")
- label_list # a list, 1 if a sentence is part of any annotated evidence set for a given claim, 0 otherwise
- sentence_IDS # a list, np.arange(len(sentences))
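For illustration, here is a minimal sketch of how one such input line could be written. The claim, page title, and sentences below are invented placeholders, not taken from the actual dataset:

```python
import json

# Hypothetical (claim, Wiki-page) pair illustrating the expected fields.
sentences = [
    "The 2014 San Francisco 49ers season was the franchise's 65th season.",
    "It was their first season at Levi's Stadium.",
    "The team finished the season 8-8.",
]
example = {
    "id": 12345,                                  # claim ID (placeholder)
    "claim": "The 49ers played at Levi's Stadium in 2014.",
    "page": "2014_San_Francisco_49ers_season",
    "sentences": sentences,                       # the "lines" of the Wiki page
    "label_list": [0, 1, 0],                      # 1 = sentence is part of an annotated evidence set
    "sentence_IDS": list(range(len(sentences))),  # np.arange(len(sentences)) as a plain list
}

with open("sample_data.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")           # one JSON object per line
```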
The output is a dataframe where we store, for each sentence predicted by the model, the following columns (a loading sketch follows the list):
- claim_id
- page_sentence # a tuple (Wikipage_Title, sentence_ID), for example ('2014_San_Francisco_49ers_season', 3)
- y # 1 if label_list above was 1, 0 otherwise
- predictions # token-level predictions for this sentence
- score # np.mean(predictions), model is confident that this sentence is evidence if score > 0
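As a rough sketch (assuming the prediction file can be read straight back into pandas and that page_sentence is stored as written), the predicted evidence sentences could be collected per claim like this:

```python
import pandas as pd

# Load the prediction dataframe written by src/main.py
preds = pd.read_csv("predictions_sentence_retrieval.csv")

# The model is confident a sentence is evidence if its mean token score is > 0
evidence = preds[preds["score"] > 0]

# Group the predicted evidence sentences by claim
for claim_id, group in evidence.groupby("claim_id"):
    # page_sentence holds (Wikipage_Title, sentence_ID) pairs,
    # e.g. ('2014_San_Francisco_49ers_season', 3)
    print(claim_id, group["page_sentence"].tolist())
```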
To train a model, point to --train_file and --eval_file, both in the format described above, and add the --do_train flag:
python src/main.py --do_train --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --train_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv
The full pipeline:
- takes a first pass over all (claim, WikiPage) pairs, where the Wikipages are predicted by (Hanselowski et al., 2018) and the FEVER baseline
- extracts all sentences which the model is confident are evidence in that pass; the model input is [CLS] claim [SEP] WikiPage [SEP]
- retrieves conditioned evidence as explained in (Stammbach and Neumann, 2019)
- retrieves hyperlinks from the evidence sentences and takes a second pass over all (claim, hyperlink) pairs, where the model input is [CLS] claim, evidence_sentence [SEP] HyperlinkPage [SEP]
- sorts all predicted evidence sentences for a claim in descending order
- takes the five highest-scoring sentences for each claim and concatenates those (see the sketch after this list)
- predicts a label for each (claim, retrieved_evidence) pair using the RTE model (trained with an outdated huggingface sequence classification demo script)
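A minimal sketch of the sorting and top-five aggregation step, using the prediction columns described above (looking up the actual sentence text again and running the RTE model are left out here):

```python
import pandas as pd

preds = pd.read_csv("predictions_sentence_retrieval.csv")

# Sort all predicted evidence sentences for a claim in descending order of score
preds = preds.sort_values(["claim_id", "score"], ascending=[True, False])

# Keep the five highest-scoring sentences per claim
top5 = preds.groupby("claim_id").head(5)

# Collect the (page, sentence_ID) identifiers per claim; the pipeline then
# concatenates the corresponding sentence texts and classifies each
# (claim, retrieved_evidence) pair with the RTE model
retrieved = top5.groupby("claim_id")["page_sentence"].apply(list)
print(retrieved.head())
```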
To generate the multihop dataset, we need to download fever.db; see how to obtain this here
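Before running the multihop retrieval, it can help to sanity-check the downloaded fever.db. The sketch below only inspects the SQLite schema and makes no assumptions about table or column names:

```python
import sqlite3

# Quick sanity check of the downloaded fever.db: the FEVER Wikipedia dump is
# stored as a SQLite database, so we simply list the tables and their schemas.
conn = sqlite3.connect("fever.db")
cursor = conn.cursor()

cursor.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
for name, sql in cursor.fetchall():
    print(name, sql)

conn.close()
```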
After predicting a first pass, we can retrieve multihop pages by running:
python src/retrieve_multihop_evidence.py --db_file fever.db --predictions predictions_sentence_retrieval.csv --fever_data dev.jsonl --outfile_name multi_evidence_sample_data.jsonl
Afterwards, we can predict these sentences the same way as before:
python src/main.py --do_predict --model_name sentence-selection-bigbird-base --eval_file multi_evidence_sample_data.jsonl --predict_filename predictions_multihop_sentence_retrieval.csv
If anything does not work or is unclear, please don't hesitate to contact the author:
- Dominik Stammbach ([email protected])