Using Persuasive Writing Strategies to Explain and Detect Health Misinformation

This is the official implementation of "Using Persuasive Writing Strategies to Explain and Detect Health Misinformation."

Table of Contents

  • Dependencies
  • Setup
  • Dataset
  • Data Processing
  • Persuasive Writing Strategy Detection with RoBERTa
  • Misinformation Detection Using RoBERTa
  • GPT Zero-shot Experiments
  • MultiFC Baseline
  • RAWFC Dataset

Dependencies

  • Compatible with Python 3.8.15
  • Dependencies are installed with conda from env.yml (see Setup below)

Setup

$ conda env create --name misinformation_detection --file=env.yml
$ conda activate misinformation_detection

Dataset

Our dataset is located in the data folder (a loading sketch follows this list):

  • data/all.xlsx
    • This file contains the sentences for task 2 and their corresponding persuasive writing strategy labels.
  • data/all_article.xlsx
    • This file contains claims and articles for tasks 1 and 3, with their labels from MultiFC.
  • data/train.xlsx
    • Training split for persuasive writing strategy detection.
  • data/train_article.xlsx
    • Training split for misinformation detection task.
  • data/test.xlsx
    • Testing split for persuasive writing strategy detection.
  • data/test_article.xlsx
    • Testing split for misinformation detection task.

Data Processing

The raw, unprocessed data (the export from the WebAnno annotation tool) is available in the ./data/annotation folder. To run the preprocessing from scratch, you also need the MultiFC dataset in the ./data folder, with all of the MultiFC files placed in the following structure:

data/multi-fc
├── dev.tsv
├── README.txt
├── snippets
├── test.tsv
└── train.tsv

Then run src/data_preprocess.py to generate the clean data from the raw data; a sanity-check sketch follows. Note that the current data excludes the low-frequency labels. To keep those labels instead, modify the get_low_freq function in annotation.py.
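
Before running the preprocessor, it may help to verify that the MultiFC files are where the script expects them; a minimal sketch (the file list simply mirrors the tree above):

from pathlib import Path

# Check that the MultiFC layout matches the expected structure
root = Path("data/multi-fc")
expected = ["train.tsv", "dev.tsv", "test.tsv", "snippets", "README.txt"]
missing = [name for name in expected if not (root / name).exists()]
if missing:
    raise FileNotFoundError(f"missing MultiFC files: {missing}")

Then run:

$ python src/data_preprocess.py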

Persuasive Writing Strategy Detection with RoBERTa

To train and test RoBERTa on persuasive strategy labeling, run the script for the desired label layer:

  • sh scripts/layer_1.sh
  • sh scripts/layer_2.sh
  • sh scripts/layer_3.sh
  • sh scripts/layer_4.sh

You can run python src/persuasive_strategy_test.py to evaluate the performance of the trained models.
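
For orientation, here is a rough sketch of the kind of fine-tuning these scripts drive, using Hugging Face transformers. The label count and hyperparameters below are placeholders, not the settings from the scripts:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=10)  # num_labels depends on the chosen layer

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
# train_dataset / eval_dataset would be tokenized versions of
# data/train.xlsx and data/test.xlsx:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()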

Misinformation Detection Using RoBERTa

To run the misinformation detection experiments with RoBERTa, run the script for the desired input source:

  • sh scripts/md_claim.sh, input source: claim
  • sh scripts/md_article.sh, input source: article
  • sh scripts/md_gt_strategy.sh, input source: gt (ground truth persuasive strategy labels)
  • sh scripts/md_pred_strategy.sh, input source: pred (predicted persuasive strategy labels)
  • sh scripts/md_claim_article.sh, input source: claim + article
  • sh scripts/md_claim_gt.sh, input source: claim + gt
  • sh scripts/md_claim_pred.sh, input source: claim + pred
  • sh scripts/md_claim_article_gt.sh, input source: claim + article + gt
  • sh scripts/md_claim_article_pred.sh, input source: claim + article + pred

Notice: Run one of the scripts with a pred input source before any of the other experiments; it generates the CSV files the other experiments require (see the sketch below).
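
To make the input sources above concrete, here is an illustrative sketch of concatenating them into a single classifier input. The column names and the separator are assumptions for illustration, not the scripts' actual implementation:

import pandas as pd

df = pd.read_excel("data/train_article.xlsx")

def build_input(row, sources=("claim", "article")):
    # e.g. ("claim",), ("claim", "article"), ("claim", "article", "gt")
    parts = [str(row[s]) for s in sources]
    return " </s> ".join(parts)  # RoBERTa's separator token

texts = df.apply(build_input, axis=1)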

GPT Zero-shot Experiments

To run the GPT experiments with all of the input variations, run sh scripts/md_gpt.sh.
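
As a rough illustration of a zero-shot call (written against the current OpenAI Python client; the repository's script may target an older API version, a different model, and different prompts):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
claim = "Vitamin C cures the common cold."  # placeholder claim
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Is the following health claim true or false? "
                   f"Answer with one word.\n\nClaim: {claim}",
    }],
)
print(resp.choices[0].message.content)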

MultiFC Baseline

First, make sure the data is laid out as described in the Data Processing section. Then run src/multifc.py to train RoBERTa-based models on the MultiFC prompt subset, and src/multifc_test.py to evaluate the trained models and compute the average performance.
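
In command form (both invocations assume no extra flags are needed; check the scripts for options):

$ python src/multifc.py
$ python src/multifc_test.py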

RAWFC Dataset

To perform the RAWFC in-context learning experiments, first download the files from https://www.dropbox.com/sh/1w7crp3hauoec5m/AABJpG6YWbqrumypBpHJEDnSa?dl=0 and place them in the data/RAWFC folder with the following structure. Then run pre-process.py to convert the data into the proper format, and finally run gpt-3-raw-fc-in-context.py to evaluate performance.

data/RAWFC
├── pre-process.py
├── README.MD
├── test
├── test.csv (generated by pre-process.py)
├── train
├── train.csv (generated by pre-process.py)
├── val
└── val.csv (generated by pre-process.py)
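
In command form (run from inside data/RAWFC, where pre-process.py lives per the tree above; the location of gpt-3-raw-fc-in-context.py is an assumption):

$ cd data/RAWFC
$ python pre-process.py
$ python gpt-3-raw-fc-in-context.py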
