# e-CARE: a New Dataset for Exploring Explainable Causal Reasoning

## 1. Brief Introduction

Understanding causality is of vital importance for various Natural Language Processing (NLP) applications. Beyond labeled instances, conceptual explanations of causality can provide a deeper understanding of causal facts and facilitate the causal reasoning process. We present a human-annotated explainable CAusal REasoning dataset (e-CARE), which contains over 20K causal reasoning questions, together with natural-language explanations of the causal questions. The original paper is available at: https://aclanthology.org/2022.acl-long.33

The following provides an instance from the e-CARE dataset:

| Key | Value |
| --- | --- |
| Premise | Tom holds a copper block by hand and heats it on fire. |
| Ask-for | Effect |
| Hypothesis 1 | His fingers feel burnt immediately. (✔) |
| Hypothesis 2 | The copper block keeps the same. (✖) |
| Conceptual Explanation | Copper is a good thermal conductor. |

## 2. Tasks based on the e-CARE Dataset

Based on the e-CARE dataset, we introduce two tasks for evaluating causal reasoning performance: (1) a Causal Reasoning Task and (2) an Explanation Generation Task.

### 2.1 Causal Reasoning Task

This task requires a model to choose the correct hypothesis for a given premise from two candidates, such that the chosen hypothesis forms a valid causal fact with the premise. Each Causal Reasoning instance is a line in ./dataset/Causal_Reasoning/train.jsonl or ./dataset/Causal_Reasoning/dev.jsonl. Each line is in JSON format, which the Python package jsonlines can handle (see the loading sketch after the example below). The keys and values of each line are as follows:

```json
{
  "index": "train-0",
  "premise": "Tom holds a copper block by hand and heats it on fire.",
  "ask-for": "effect",
  "hypothesis1": "His fingers feel burnt immediately.",
  "hypothesis2": "The copper block keeps the same.",
  "label": 0
}
```
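
Below is a minimal loading sketch using the jsonlines package mentioned above; the repo-relative path is taken from the description, and the label-to-hypothesis mapping follows the example (label 0 selects hypothesis1):

```python
# Minimal sketch: iterate over the causal reasoning training split.
# Requires `pip install jsonlines`.
import jsonlines

with jsonlines.open("./dataset/Causal_Reasoning/train.jsonl") as reader:
    for example in reader:
        # label 0 means hypothesis1 is correct, label 1 means hypothesis2.
        correct = example["hypothesis1"] if example["label"] == 0 else example["hypothesis2"]
        print(example["premise"], "->", correct)
```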

### 2.2 Conceptual Explanation Generation Task

This task requires a model to generate an explanation for a given causal fact. For example, given the causal fact <Cause: Tom holds a copper block by hand and heats it on fire. Effect: His fingers feel burnt immediately.>, the task requires the model to generate an explanation such as "Copper is a good thermal conductor." Each Explanation Generation instance is a line in ./dataset/Explanation_Generation/train.jsonl or ./dataset/Explanation_Generation/dev.jsonl. Each line is in JSON format, with keys and values as follows:

```json
{
  "index": "train-0",
  "cause": "Tom holds a copper block by hand and heats it on fire.",
  "effect": "His fingers feel burnt immediately.",
  "conceptual_explanation": "Copper is a good thermal conductor."
}
```
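
As a rough illustration of the generation task (an assumption, not the official baseline code), one could prompt an off-the-shelf GPT-2 with the cause and effect; the prompt format here is invented for the sketch:

```python
# Sketch: prompt GPT-2 to produce a conceptual explanation for a causal fact.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

cause = "Tom holds a copper block by hand and heats it on fire."
effect = "His fingers feel burnt immediately."
prompt = f"Cause: {cause} Effect: {effect} Explanation:"  # prompt format is an assumption

ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```

A model fine-tuned on the training split (see the baselines in Section 5.2) would of course produce far better explanations than this zero-shot prompt.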

## 3. Statistics

### 3.1 The question type distribution

| Ask-for | Train | Test | Dev | Total |
| --- | --- | --- | --- | --- |
| Cause | 7,617 | 2,176 | 1,088 | 10,881 |
| Effect | 7,311 | 2,088 | 1,044 | 10,443 |
| Total | 14,928 | 4,264 | 2,132 | 21,324 |

### 3.2 The label distribution

| Label | Train | Test | Dev |
| --- | --- | --- | --- |
| 0 | 7,463 | 2,132 | 1,066 |
| 1 | 7,465 | 2,132 | 1,066 |

### 3.3 Average lengths of cause, effect, wrong hypothesis, and conceptual explanation

| | Overall | Train | Test | Dev |
| --- | --- | --- | --- | --- |
| Conceptual Explanation | 7.63 | 7.62 | 7.60 | 7.77 |
| Cause | 8.51 | 8.51 | 8.47 | 8.56 |
| Effect | 8.34 | 8.33 | 8.38 | 8.31 |
| Wrong Hypothesis | 8.14 | 8.14 | 8.10 | 8.21 |

Number of conceptual explanations:

| Overall | Train | Test | Dev |
| --- | --- | --- | --- |
| 13,048 | 10,491 | 3,814 | 2,012 |

## 4. Dataset Download & Model Evaluation

### 4.1 Dataset Download

To train and evaluate models, the complete training and dev sets can be downloaded at: e-CARE

### 4.2 Model Evaluation

We provide two official evaluation scripts (causal_reasoning.py & conceptual_explanation_generation.py) for evaluation on the causal reasoning and conceptual explanation generation tasks, respectively. To use them, write your model's predictions to a JSON file:

- Causal Reasoning: each key is the index of the corresponding example, and each value is the predicted label, 0 or 1.

  ```json
  {
    "dev-0": 0,
    "dev-1": 1,
    "dev-2": 0
  }
  ```

  Then run `python causal_reasoning.py prediction.json dev.jsonl` to get the accuracy on the dev set.

- Conceptual Explanation Generation: each key is the index of the corresponding example, and each value is the generated conceptual explanation.

  ```json
  {
    "dev-0": "Copper is a good thermal conductor.",
    "dev-1": "Abalone are one of the first food items taken by otters as they move into new habitat.",
    "dev-2": "Deserts are arid environments."
  }
  ```

  Then run `python conceptual_explanation_generation.py prediction.json dev.jsonl` to get the average BLEU and Rouge-l scores on the dev set.
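
A minimal sketch of producing the prediction file, plus a quick local sanity check of one generated explanation using the nltk and rouge-score packages (the official script's metric implementations may differ):

```python
# Sketch: write predictions in the expected JSON format, then locally score
# one generated explanation against its reference.
import json

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

predictions = {"dev-0": "Copper conducts heat well."}  # replace with real model outputs
with open("prediction.json", "w") as f:
    json.dump(predictions, f, indent=2)

reference = "Copper is a good thermal conductor."
bleu = sentence_bleu([reference.split()], predictions["dev-0"].split(),
                     smoothing_function=SmoothingFunction().method1)
rouge_l = (rouge_scorer.RougeScorer(["rougeL"])
           .score(reference, predictions["dev-0"])["rougeL"].fmeasure)
print(f"BLEU: {bleu:.3f}  ROUGE-L: {rouge_l:.3f}")
```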

### 4.3 Obtaining Results on the Test Set

The test set of e-CARE is a blind set; you should follow this instruction to get the performance on the test set. Submitted models will be added to the leaderboard with the permission of the authors.

## 5. Baseline Results

We report baseline results for the two tasks introduced above:

### 5.1 Causal Reasoning Task

The causal reasoning task is a multiple-choice task: given a premise event, the model must choose the more plausible hypothesis from two candidates, so that the premise and the chosen hypothesis form a valid causal fact. Accuracy (%) on the dev and test sets:

| Model | Dev | Test |
| --- | --- | --- |
| Bart-base | 73.03 | 71.65 |
| Bert-base-cased | 75.47 | 75.38 |
| RoBERTa-base | 70.64 | 70.73 |
| XLNet-base-cased | 75.61 | 74.58 |
| ALBERT | 73.97 | 74.60 |
| GPT | 67.59 | 68.15 |
| GPT-2 | 70.36 | 69.51 |
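
As a hedged end-to-end sketch (not the official baseline code; the fine-tuned checkpoint path is an assumption), the following scores each dev example with a BERT-style multiple-choice model and writes a prediction file in the format expected by causal_reasoning.py:

```python
# Sketch: produce dev-set predictions with a fine-tuned multiple-choice model.
import json

import jsonlines
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMultipleChoice.from_pretrained("./bert-ecare-finetuned")  # assumed path
model.eval()

predictions = {}
with jsonlines.open("./dataset/Causal_Reasoning/dev.jsonl") as reader:
    for ex in reader:
        # Pair the premise with each candidate hypothesis.
        enc = tokenizer([ex["premise"]] * 2,
                        [ex["hypothesis1"], ex["hypothesis2"]],
                        padding=True, truncation=True, return_tensors="pt")
        inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # (1, 2, seq_len)
        with torch.no_grad():
            logits = model(**inputs).logits
        predictions[ex["index"]] = int(logits.argmax(dim=-1))

with open("prediction.json", "w") as f:
    json.dump(predictions, f, indent=2)
```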

### 5.2 Explanation Generation Task

This task requires the model to generate a free-text explanation for a given causal fact (composed of a premise and the corresponding correct hypothesis).

| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Rouge-1 | Rouge-2 | Rouge-l | PPL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-2 | 55.17 | 33.29 | 23.00 | 18.79 | 33.17 | 10.23 | 32.05 | 6.87 |
| RNN | 43.25 | 18.20 | 6.76 | 4.16 | 20.79 | 2.20 | 20.85 | 33.84 |
| Multi-Task | 56.32 | 35.96 | 26.47 | 22.36 | 35.70 | 12.57 | 34.88 | 6.64 |

## 6. Potential Future Directions

### 6.1 Serve as a Causality Knowledge Base

Causal knowledge is critical for various NLP applications. The causality knowledge provided by e-CARE can be used as a resource to boost model performance on other causal-related tasks.

We have explored this by applying transfer learning: first fine-tuning a BERT model on e-CARE, then adapting the e-CARE-enhanced model (denoted as BERT_E) to a causal extraction task, EventStoryLine [1]; two causal reasoning tasks, BECauSE 2.0 [2] and COPA [3]; and a commonsense reasoning dataset, CommonsenseQA [4]. The results are shown in the table below (a hedged fine-tuning sketch follows it). We observe that the additional training on e-CARE consistently increases model performance on all four tasks. This indicates the potential of e-CARE to provide the causality information needed to promote causal-related tasks in multiple domains.

| Dataset | Metric | BERT | BERT_E |
| --- | --- | --- | --- |
| EventStoryLine 0.9 * | F1 (%) | 66.5 | 68.1 |
| BECauSE 2.1 | Accu. (%) | 76.8 | 81.0 |
| COPA | Accu. (%) | 70.4 | 75.4 |
| CommonsenseQA | Accu. (%) | 52.6 | 56.4 |

\* Only intra-sentence event pairs are kept for the experiment, and it is ensured that the cause event precedes the effect event. The train, dev, and test sets are split randomly.
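
The fine-tuning sketch referenced above (not the authors' training code; hyperparameters and paths are assumptions) treats the e-CARE causal reasoning task as a standard two-choice problem with Hugging Face transformers:

```python
# Sketch: fine-tune BERT on e-CARE causal reasoning as a multiple-choice task.
import jsonlines
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMultipleChoice.from_pretrained("bert-base-cased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumption

with jsonlines.open("./dataset/Causal_Reasoning/train.jsonl") as reader:
    examples = list(reader)

model.train()
for ex in examples:  # single-example "batches" for brevity
    # Pair the premise with each candidate hypothesis.
    enc = tokenizer([ex["premise"]] * 2,
                    [ex["hypothesis1"], ex["hypothesis2"]],
                    padding=True, truncation=True, return_tensors="pt")
    # The model expects tensors of shape (batch, num_choices, seq_len).
    inputs = {k: v.unsqueeze(0) for k, v in enc.items()}
    loss = model(**inputs, labels=torch.tensor([ex["label"]])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The resulting checkpoint (BERT_E in the table) can then be fine-tuned again on the downstream task of interest.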

### 6.2 Abductive Reasoning

Previous literature has characterized the explanation generation process as abductive reasoning and highlighted the importance of abductive explanation generation, as it may interact with the causal reasoning process to promote understanding of the causal mechanism and increase the efficiency and reliability of causal reasoning.

For example, one may observe that C1: adding rock into hydrochloric acid caused E1: rock dissolved. Through abductive reasoning, one may come up with a conceptual explanation for the observation: acid is corrosive. After that, one can confirm or rectify the explanation through experiments or by resorting to external references. In this way, new ideas about causality can be brought into the understanding of the observed causal fact. If the explanation is confirmed, it can be further utilized to support the causal reasoning process by helping to explain and validate other related causal facts, such as C2: adding rust into sulphuric acid may lead to E2: rust dissolved. This analysis highlights the pivotal role of conceptual explanations in learning and inferring causality, and the ability of the e-CARE dataset to provide causal explanations and support future research towards stronger human-like causal reasoning systems.

## 7. Citation

If you want to cite our dataset and paper, you can use this BibTeX:

```bibtex
@inproceedings{du-etal-2022-e,
    title = "e-{CARE}: a New Dataset for Exploring Explainable Causal Reasoning",
    author = "Du, Li  and
      Ding, Xiao  and
      Xiong, Kai  and
      Liu, Ting  and
      Qin, Bing",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.33",
    pages = "432--446"
}
```

## References

[1] Caselli T, Vossen P. The event storyline corpus: A new benchmark for causal and temporal relation extraction[C]//Proceedings of the Events and Stories in the News Workshop. 2017: 77-86.
[2] Dunietz J, Levin L, Carbonell J G. The BECauSE corpus 2.0: Annotating causality and overlapping relations[C]//Proceedings of the 11th Linguistic Annotation Workshop. 2017: 95-104.
[3] Roemmele M, Bejan C A, Gordon A S. Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning[C]//AAAI spring symposium: logical formalizations of commonsense reasoning. 2011: 90-95.
[4] Talmor A, Herzig J, Lourie N, et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4149-4158.