UKPLab/emnlp2020-faithful-rationales

Why do you think that? Exploring Faithful Sentence-Level Rationales Without Supervision

This project contains code to reproduce the results of the following paper:


@inproceedings{glockner-etal-2020-think,
	title = "Why do you think that? Exploring Faithful Sentence-Level Rationales Without Supervision",
	author = "Glockner, Max  and Habernal, Ivan and Gurevych, Iryna",
	booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
	month = nov,
	year = "2020",
	address = "Online",
	publisher = "Association for Computational Linguistics",
	url = "https://www.aclweb.org/anthology/2020.findings-emnlp.97",
	doi = "10.18653/v1/2020.findings-emnlp.97",
	pages = "1080--1095"
}

Abstract: Evaluating the trustworthiness of a model's prediction is essential for differentiating between "right for the right reasons" and "right for the wrong reasons". Identifying textual spans that determine the target label, known as faithful rationales, usually relies on pipeline approaches or reinforcement learning. However, such methods either require supervision and thus costly annotation of the rationales or employ non-differentiable models. We propose a differentiable training-framework to create models which output faithful rationales on a sentence level, by solely applying supervision on the target task. To achieve this, our model solves the task based on each rationale individually and learns to assign high scores to those which solved the task best. Our evaluation on three different datasets shows competitive results compared to a standard BERT blackbox while exceeding a pipeline counterpart's performance in two cases. We further exploit the transparent decision-making process of these models to prefer selecting the correct rationales by applying direct supervision, thereby boosting the performance on the rationale-level.

Link: https://www.aclweb.org/anthology/2020.findings-emnlp.97/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

Disclaimer:

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Requirements

All experiments are run with Python 3.6. To reproduce them, install the following packages:

  • allennlp 0.9.0
  • docopt 0.6.2
  • pytorch-transformers 1.1.0
  • numpy 1.17.4
  • pandas 0.25.3
  • matplotlib 3.1.1
  • scipy 1.3.2
  • seaborn 0.9.0
  • spacy 2.1.0
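For convenience, the pinned versions can be collected into a requirements file and installed in one step (a sketch; the repository itself does not document a requirements.txt):

```shell
# Pin the exact versions listed above into a requirements file.
# (Convenience sketch -- the repository may not ship this file.)
cat > requirements.txt <<'EOF'
allennlp==0.9.0
docopt==0.6.2
pytorch-transformers==1.1.0
numpy==1.17.4
pandas==0.25.3
matplotlib==3.1.1
scipy==1.3.2
seaborn==0.9.0
spacy==2.1.0
EOF
# Then install everything at once:
# pip install -r requirements.txt
```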

Data

The datasets used in this work are taken as provided by ERASER (http://www.eraserbenchmark.com/):

  • MultiRC
  • FEVER
  • Movies

Additionally, we used FeverSymmetric as provided by https://github.com/TalSchuster/FeverSymmetric.

Train models

Models are trained via AllenNLP. You can find the config files for each seed and each experiment in the exp_configs directory. To train a model, run:

$ allennlp train exp_configs/<file.json> --include-package reasoning_lib -s <output_destination>

Make sure the config file points to the location of the dataset.
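The config files follow the standard AllenNLP experiment format, where the dataset location is set via the `train_data_path` and `validation_data_path` keys (key names follow AllenNLP 0.9 conventions; the paths below are placeholders, not the repository's actual layout):

```json
{
  "train_data_path": "path/to/dataset/train.jsonl",
  "validation_data_path": "path/to/dataset/val.jsonl"
}
```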

To get the predictions of a trained model, run:

$ allennlp predict --output-file <path/to/output> --include-package reasoning_lib --predictor baseline_blackbox_predictor --use-dataset-reader --silent <path/to/model> <path/to/data>

To evaluate the predictions:

  • Adjust the path variables within reasoning_lib/analyze/models.py to point to the predictions.
  • Execute the Jupyter Notebook reasoning_lib/analyze/Result Metrics.ipynb.

License

Apache License Version 2.0
