Understanding causality is of vital importance for various Natural Language Processing (NLP) applications. Beyond labeled instances, conceptual explanations of causality can provide a deeper understanding of a causal fact and facilitate the causal reasoning process. We present a human-annotated explainable CAusal REasoning dataset (e-CARE), which contains over 20K causal reasoning questions, together with natural-language explanations of the causal questions. The original paper is available at: https://aclanthology.org/2022.acl-long.33
The following provides an instance from the e-CARE dataset:
Key | Value |
---|---|
Premise | Tom holds a copper block by hand and heats it on fire. |
Ask-for | Effect |
Hypothesis 1 | His fingers feel burnt immediately. (✔) |
Hypothesis 2 | The copper block keeps the same. (✖) |
Conceptual Explanation | Copper is a good thermal conductor. |
Based on the e-CARE dataset, we introduce two tasks for evaluating causal reasoning performance: (1) a Causal Reasoning task and (2) an Explanation Generation task.
This task requires the model to choose the correct hypothesis for a given premise from two candidates, such that the chosen hypothesis forms a valid causal fact with the premise. Each instance of Causal Reasoning (`./dataset/Causal_Reasoning`) is a line in `./dataset/Causal_Reasoning/train.jsonl` or `./dataset/Causal_Reasoning/dev.jsonl`. Each line is in JSON format, which the Python package `jsonlines` can handle. The keys and values of a line's dict are listed as follows:
```json
{
  "index": "train-0",
  "premise": "Tom holds a copper block by hand and heats it on fire.",
  "ask-for": "effect",
  "hypothesis1": "His fingers feel burnt immediately.",
  "hypothesis2": "The copper block keeps the same.",
  "label": 0
}
```
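For example, the file can be read with the `jsonlines` package mentioned above; this is a minimal sketch, and the variable names are illustrative:

```python
import jsonlines

# Iterate over the training split; each line parses into a dict.
with jsonlines.open("./dataset/Causal_Reasoning/train.jsonl") as reader:
    for example in reader:
        candidates = [example["hypothesis1"], example["hypothesis2"]]
        # "label" indexes the correct hypothesis: 0 -> hypothesis1, 1 -> hypothesis2.
        correct = candidates[example["label"]]
```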
This task requires the model to generate an explanation for a provided causal fact. For example, given the causal fact from the instance above, <Cause: Tom holds a copper block by hand and heats it on fire. Effect: His fingers feel burnt immediately.>, the causal explanation generation task requires the model to generate an explanation such as "Copper is a good thermal conductor." Each instance of Explanation Generation (`./dataset/Explanation_Generation`) is a line in `./dataset/Explanation_Generation/train.jsonl` or `./dataset/Explanation_Generation/dev.jsonl`. Each line is in JSON format; the keys and values of a line are listed as follows:
```json
{
  "index": "train-0",
  "cause": "Tom holds a copper block by hand and heats it on fire.",
  "effect": "His fingers feel burnt immediately.",
  "conceptual_explanation": "Copper is a good thermal conductor."
}
```
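For example, a minimal sketch that turns each instance into an (input, target) text pair for a sequence-to-sequence explanation generator; the input template below is an assumption for illustration, not a prescribed format:

```python
import jsonlines

# Build (input, target) pairs for training an explanation generator.
pairs = []
with jsonlines.open("./dataset/Explanation_Generation/train.jsonl") as reader:
    for example in reader:
        source = f"cause: {example['cause']} effect: {example['effect']}"
        target = example["conceptual_explanation"]
        pairs.append((source, target))
```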
- Number of causal questions

Ask-for | Train | Test | Dev | Total |
---|---|---|---|---|
Cause | 7,617 | 2,176 | 1,088 | 10,881 |
Effect | 7,311 | 2,088 | 1,044 | 10,443 |
Total | 14,928 | 4,264 | 2,132 | 21,324 |
- Label distribution

Label | Train | Test | Dev |
---|---|---|---|
0 | 7,463 | 2,132 | 1,066 |
1 | 7,465 | 2,132 | 1,066 |
- Average length of each component

Component | Overall | Train | Test | Dev |
---|---|---|---|---|
Conceptual Explanation | 7.63 | 7.62 | 7.60 | 7.77 |
Cause | 8.51 | 8.51 | 8.47 | 8.56 |
Effect | 8.34 | 8.33 | 8.38 | 8.31 |
Wrong Hypothesis | 8.14 | 8.14 | 8.10 | 8.21 |
- Number of conceptual explanations
Overall | Train | Test | Dev |
---|---|---|---|
13048 | 10491 | 3814 | 2012 |
To train and evaluate models, the complete training and dev sets can be downloaded at: e-CARE
We provide two official evaluation scripts (`causal_reasoning.py` & `conceptual_explanation_generation.py`) for evaluation on the causal reasoning and conceptual explanation generation tasks, respectively. To use the official evaluation scripts, you should output the predictions of your model into a JSON-format file:
- Causal Reasoning: each key is the `index` of the corresponding example, and each value is the predicted label, `0` or `1`, e.g. `{ "dev-0": 0, "dev-1": 1, "dev-2": 0 }`. Then run `python causal_reasoning.py prediction.json dev.jsonl` to get the accuracy on the dev set (a file-writing sketch follows this list).
- Conceptual Explanation Generation: each key is the `index` of the corresponding example, and each value is the generated conceptual explanation, e.g. `{ "dev-0": "Copper is a good thermal conductor.", "dev-1": "Abalone are one of the first food items taken by otters as they move into new habitat.", "dev-2": "Deserts are arid environments." }`. Then run `python conceptual_explanation_generation.py prediction.json dev.jsonl` to get the average BLEU and Rouge-l scores on the dev set (a local scoring sanity check is sketched below).
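For example, a minimal sketch for writing a prediction file for either task with the standard `json` module; the `predictions` dict here is illustrative:

```python
import json

# Map each example index to a prediction: a label (0/1) for causal reasoning,
# or a generated explanation string for explanation generation.
predictions = {"dev-0": 0, "dev-1": 1, "dev-2": 0}

with open("prediction.json", "w") as f:
    json.dump(predictions, f)
```

As a rough local sanity check before running the official generation script, a single prediction can be scored with `nltk` and the `rouge-score` package; note that the official script may tokenize and average differently:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_score import rouge_scorer

reference = "Copper is a good thermal conductor."
prediction = "Copper conducts heat well."

# BLEU over whitespace tokens, smoothed to avoid zero n-gram counts.
bleu = sentence_bleu([reference.split()], prediction.split(),
                     smoothing_function=SmoothingFunction().method1)

# Rouge-l F-measure between the reference and the prediction.
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, prediction)["rougeL"].fmeasure
print(f"BLEU: {bleu:.3f}, Rouge-l: {rouge_l:.3f}")
```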
The test set of e-CARE is a blind set; you should follow this instruction to get the performance on the test set. Submitted models will be added to the leaderboard with the permission of the authors.
On this basis, we introduce two tasks:
The causal reasoning task is a multiple-choice task: given a premise event, one needs to choose the more plausible hypothesis from two candidates, such that the premise and the correct hypothesis form a valid causal fact.
Model | Dev | Test |
---|---|---|
BART-base | 73.03 | 71.65 |
BERT-base-cased | 75.47 | 75.38 |
RoBERTa-base | 70.64 | 70.73 |
XLNet-base-cased | 75.61 | 74.58 |
ALBERT | 73.97 | 74.60 |
GPT | 67.59 | 68.15 |
GPT-2 | 70.36 | 69.51 |
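For reference, one plausible way to frame this task for an encoder baseline such as BERT-base-cased is HuggingFace's multiple-choice head. This is a minimal sketch under that assumption, not the exact setup behind the numbers above; the classification head below is randomly initialized and would need fine-tuning on the training split first:

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMultipleChoice.from_pretrained("bert-base-cased")

premise = "Tom holds a copper block by hand and heats it on fire."
hypotheses = ["His fingers feel burnt immediately.", "The copper block keeps the same."]

# Pair the premise with each hypothesis; the batch shape is (1, num_choices, seq_len).
encoded = tokenizer([premise] * len(hypotheses), hypotheses, return_tensors="pt", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in encoded.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): one score per hypothesis
print("predicted label:", logits.argmax(dim=-1).item())
```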
The explanation generation task requires the model to generate a free-text explanation for a given causal fact (composed of a premise and the corresponding correct hypothesis).
Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Rouge-1 | Rouge-2 | Rouge-l | PPL |
---|---|---|---|---|---|---|---|---|
GPT-2 | 55.17 | 33.29 | 23.00 | 18.79 | 33.17 | 10.23 | 32.05 | 6.87 |
RNN | 43.25 | 18.20 | 6.76 | 4.16 | 20.79 | 2.20 | 20.85 | 33.84 |
Multi-Task | 56.32 | 35.96 | 26.47 | 22.36 | 35.70 | 12.57 | 34.88 | 6.64 |
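For illustration, a minimal sketch of prompting GPT-2 for an explanation; the prompt template is an assumption, and the baseline reported above was trained on the e-CARE training split rather than prompted zero-shot:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("cause: Tom holds a copper block by hand and heats it on fire. "
          "effect: His fingers feel burnt immediately. explanation:")
ids = tokenizer(prompt, return_tensors="pt").input_ids

output = model.generate(ids, max_new_tokens=30, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated continuation after the prompt.
print(tokenizer.decode(output[0][ids.shape[1]:], skip_special_tokens=True))
```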
Causal knowledge is critical for various NLP applications. The causality knowledge provided by e-CARE can serve as a resource for boosting model performance on other causality-related tasks.
We explored this through transfer learning: a BERT model is first fine-tuned on e-CARE, and the e-CARE-enhanced model (denoted as BERTE) is then adapted to a causality extraction task, EventStoryLine [1]; two causal reasoning tasks, BECauSE 2.0 [2] and COPA [3]; and a commonsense reasoning dataset, CommonsenseQA [4]. The results are shown in the table below. We observe that the additional training on e-CARE consistently increases model performance on all four tasks, indicating the potential of e-CARE to provide the causality information needed to promote causality-related tasks across multiple domains.
Dataset | Metric | BERT | BERTE |
---|---|---|---|
EventStoryLine 0.9 * | F1 (%) | 66.5 | 68.1 |
BECauSE 2.1 | Accu. (%) | 76.8 | 81.0 |
COPA | Accu. (%) | 70.4 | 75.4 |
CommonsenseQA | Accu. (%) | 52.6 | 56.4 |
* Only the intra-sentence event pairs are kept for the experiment, and it is ensured that the cause event precedes the effect event. The train, dev, and test sets are split randomly.
Previous literature has characterized explanation generation as an abductive reasoning process and highlighted its importance: abductive explanation generation may interact with the causal reasoning process to promote understanding of the causal mechanism and increase the efficiency and reliability of causal reasoning.
For example, one may observe that C1: adding rock into hydrochloric acid caused E1: rock dissolved. Through abductive reasoning, one may come up with a conceptual explanation for the observation: acid is corrosive. After that, one can confirm or rectify the explanation through experiments or by resorting to external references. In this way, new ideas about causality can be brought into understanding the observed causal fact. If the explanation is confirmed, it can be further utilized to support the causal reasoning process by helping to explain and validate other related causal facts, such as C2: adding rust into sulphuric acid may lead to E2: rust dissolved. This analysis highlights the pivotal role of conceptual explanations in learning and inferring causality, and the ability of e-CARE to provide causal explanations and support future research toward stronger human-like causal reasoning systems.
If you want to cite our dataset and paper, you can use this BibTeX:
```bibtex
@inproceedings{du-etal-2022-e,
    title = "e-{CARE}: a New Dataset for Exploring Explainable Causal Reasoning",
    author = "Du, Li and
      Ding, Xiao and
      Xiong, Kai and
      Liu, Ting and
      Qin, Bing",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.33",
    pages = "432--446"
}
```