This repo contains the code, dataset, and models for our ACL short paper (Stammbach et al., 2023).
Assuming Anaconda and Linux, the environment can be installed with the following commands:

```shell
conda create -n environmental_claims python=3.6
conda activate environmental_claims
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
```
- Our dataset is stored in the folder `data`.
- A ClimateBERT PyTorch model fine-tuned on our dataset can be downloaded here: climatebert-environmental-claims
We also host the dataset and model on Hugging Face.
In our paper and dataset, we discard sentences where two annotators label a sentence as an environmental claim but the other two disagree, resulting in a tie. We host the full dataset here, including all 3000 sentences and the annotator agreement for each sentence (either 0.5, 0.75, or 1.0). Labels are:
- "yes" for environmental claims
- "no" for others
- "tie" if a datapoint has an agreement of 0.5
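The label logic above can be sketched as follows, assuming four annotators per sentence; the function name is ours and not part of the repo:

```python
def label_from_votes(yes_votes, total_votes=4):
    """Map annotator votes to the dataset labels described above.

    Agreement is the fraction of annotators in the majority: 1.0 if all
    agree, 0.75 if three of four agree, and 0.5 for a 2-vs-2 tie.
    """
    no_votes = total_votes - yes_votes
    agreement = max(yes_votes, no_votes) / total_votes
    if agreement == 0.5:
        return "tie"  # 2-vs-2 split, discarded in the paper
    return "yes" if yes_votes > no_votes else "no"
```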
To predict environmental claims in custom data, we provide an inference script. To run the script on your own data (either a ".jsonl" file with a column "sentences" or "text", or a ".txt" file with one sentence per line), simply run the following Python command.
```shell
python src/inference_script.py --filename data/test.jsonl --model_name climatebert-environmental-claims --outfile_name environmental_claim_predictions.csv
```
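The two accepted input formats can be sketched with a small loader like the one below; this is a minimal illustration of the format handling, not the repo's actual implementation, and the function name is ours:

```python
import json
from pathlib import Path

def read_sentences(filename):
    """Load sentences from a .jsonl file (one JSON object per line, with a
    "sentences" or "text" key) or from a plain .txt file with one sentence
    per line."""
    path = Path(filename)
    lines = [line for line in path.read_text().splitlines() if line.strip()]
    if path.suffix == ".jsonl":
        rows = [json.loads(line) for line in lines]
        key = "sentences" if "sentences" in rows[0] else "text"
        return [row[key] for row in rows]
    return [line.strip() for line in lines]
```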
To replicate the baseline experiments, run the following Python script.

```shell
python src/baselines.py
```
(this prints the rows in our Table 2 for these experiments)
To fine-tune a ClimateBERT model on our dataset, run the following Python script.

```shell
python transformer_models.py --do_save --save_path climatebert-environmental-claims --model_name climatebert/distilroberta-base-climate-f
```
(this also saves the resulting fine-tuned model in the directory given by --save_path)
If anything does not work or is unclear, please don't hesitate to contact the authors:
- Dominik Stammbach ([email protected])