Using Persuasive Writing Strategies to Explain and Detect Health Misinformation

This is the official implementation of "Using Persuasive Writing Strategies to Explain and Detect Health Misinformation."

Table of Contents

  • Dependencies
  • Setup
  • Dataset
  • Data Processing
  • Persuasive Writing Strategy Detection with RoBERTa
  • Misinformation Detection Using RoBERTa
  • GPT Zero-shot Experiments
  • MultiFC Baseline
  • RAWFC Dataset

Dependencies

  • Compatible with Python 3.8.15
  • Dependencies are installed with conda from env.yml (see Setup below)

Setup

$ conda env create --name misinformation_detection --file=env.yml
$ conda activate misinformation_detection

Dataset

Our dataset is located in the data folder (a loading sketch follows this list):

  • data/all.xlsx
    • This file contains the sentences for task 2 and their corresponding persuasive writing strategy labels.
  • data/all_article.xlsx
    • This file contains claims and articles for tasks 1 and 3, with their labels from MultiFC.
  • data/train.xlsx
    • Training split for persuasive writing strategy detection.
  • data/train_article.xlsx
    • Training split for misinformation detection task.
  • data/test.xlsx
    • Testing split for persuasive writing strategy detection.
  • data/test_article.xlsx
    • Testing split for misinformation detection task.

Data Processing

The raw, unprocessed data (the export from the WebAnno annotation tool) is available in the ./data/annotation folder. To run the preprocessing from scratch, you also need the MultiFC dataset in the ./data folder, with all of the MultiFC files placed in the following structure:

data/multi-fc
├── dev.tsv
├── README.txt
├── snippets
├── test.tsv
└── train.tsv

Then run src/data_preprocess.py to generate the clean data from the raw data; a sanity-check sketch follows. Note that the current data excludes the low-frequency labels. To keep those labels instead, modify the get_low_freq function in annotation.py.
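
Before running the preprocessor, it may help to verify that the MultiFC files are where the script expects them; a minimal sketch (the file list simply mirrors the tree above):

from pathlib import Path

# Check that the MultiFC layout matches the expected structure
root = Path("data/multi-fc")
expected = ["train.tsv", "dev.tsv", "test.tsv", "snippets", "README.txt"]
missing = [name for name in expected if not (root / name).exists()]
if missing:
    raise FileNotFoundError(f"missing MultiFC files: {missing}")

Then run:

$ python src/data_preprocess.py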

Persuasive Writing Strategy Detection with RoBERTa

To train and test RoBERTa on persuasive strategy labeling, run the script for the desired label layer:

  • sh scripts/layer_1.sh
  • sh scripts/layer_2.sh
  • sh scripts/layer_3.sh
  • sh scripts/layer_4.sh

You can run python src/persuasive_strategy_test.py to evaluate the performance of the trained models.
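
For orientation, here is a rough sketch of the kind of fine-tuning these scripts drive, using Hugging Face transformers. The label count and hyperparameters below are placeholders, not the settings from the scripts:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=10)  # num_labels depends on the chosen layer

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
# train_dataset / eval_dataset would be tokenized versions of
# data/train.xlsx and data/test.xlsx:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()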

Misinformation Detection Using RoBERTa

To run the misinformation detection experiments with RoBERTa, run the script for the desired input source:

  • sh scripts/md_claim.sh, input source: claim
  • sh scripts/md_article.sh, input source: article
  • sh scripts/md_gt_strategy.sh, input source: gt (ground truth persuasive strategy labels)
  • sh scripts/md_pred_strategy.sh, input source: pred (predicted persuasive strategy labels)
  • sh scripts/md_claim_article.sh, input source: claim + article
  • sh scripts/md_claim_gt.sh, input source: claim + gt
  • sh scripts/md_claim_pred.sh, input source: claim + pred
  • sh scripts/md_claim_article_gt.sh, input source: claim + article + gt
  • sh scripts/md_claim_article_pred.sh, input source: claim + article + pred

Notice: Run one of the scripts with a pred input source before any of the other experiments; it generates the CSV files the other experiments require (see the sketch below).
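
To make the input sources above concrete, here is an illustrative sketch of concatenating them into a single classifier input. The column names and the separator are assumptions for illustration, not the scripts' actual implementation:

import pandas as pd

df = pd.read_excel("data/train_article.xlsx")

def build_input(row, sources=("claim", "article")):
    # e.g. ("claim",), ("claim", "article"), ("claim", "article", "gt")
    parts = [str(row[s]) for s in sources]
    return " </s> ".join(parts)  # RoBERTa's separator token

texts = df.apply(build_input, axis=1)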

GPT Zero-shot Experiments

To run the GPT experiments with all of the input variations, run sh scripts/md_gpt.sh.
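
As a rough illustration of a zero-shot call (written against the current OpenAI Python client; the repository's script may target an older API version, a different model, and different prompts):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
claim = "Vitamin C cures the common cold."  # placeholder claim
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Is the following health claim true or false? "
                   f"Answer with one word.\n\nClaim: {claim}",
    }],
)
print(resp.choices[0].message.content)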

MultiFC Baseline

First, make sure the data is laid out as described in the Data Processing section. Then run src/multifc.py to train RoBERTa-based models on the MultiFC prompt subset, and src/multifc_test.py to evaluate the trained models and compute the average performance.
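
In command form (both invocations assume no extra flags are needed; check the scripts for options):

$ python src/multifc.py
$ python src/multifc_test.py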

RAWFC Dataset

To perform the RAWFC in-context learning experiments, first download the files from https://www.dropbox.com/sh/1w7crp3hauoec5m/AABJpG6YWbqrumypBpHJEDnSa?dl=0 and place them in the data/RAWFC folder with the following structure. Then run pre-process.py to convert the data into the proper format, and finally run gpt-3-raw-fc-in-context.py to evaluate performance.

data/RAWFC
├── pre-process.py
├── README.MD
├── test
├── test.csv (generated by pre-process.py)
├── train
├── train.csv (generated by pre-process.py)
├── val
└── val.csv (generated by pre-process.py)
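
In command form (run from inside data/RAWFC, where pre-process.py lives per the tree above; the location of gpt-3-raw-fc-in-context.py is an assumption):

$ cd data/RAWFC
$ python pre-process.py
$ python gpt-3-raw-fc-in-context.py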
