NLP in Python to build a misinformation classifier of Covid-19 tweets

Page available at https://faisaljina.github.io/NLP-Twitter-Covid/

This project uses a research dataset from the Center for Machine Learning and Health at Carnegie Mellon University. The data comprises public Twitter posts on the topic of Covid-19. Although derived from public postings, the dataset itself is privately held, so it cannot be shared here in full. It contains ~3.5k tweets that have been labelled into 16 categories according to their content. These categories are:

Irrelevant
Politics
True public health response
News
Calling out or correction
Sarcasm or satire
Fake cure
Conspiracy
True prevention
Ambiguous or hard to classify
False fact or prevention
Panic buying
Commercial activity or promotion
Fake treatment
Emergency
False public health response

The goal was to build a classifier that identifies Covid-19 misinformation in tweets using NLP techniques. The project uses the re, nltk, and nrclex packages, with sklearn and imblearn for modelling.

Features

EDA and Visualisation
Tokenizing, Stopwords, Normalisation
Feature Engineering
Parts-of-Speech tagging
Sentiment Analysis
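
The cleaning and simple-metric steps listed above might look roughly like the sketch below. It uses only the standard-library re module so that it runs without downloads. The stopword set, the regular expressions, the function names, and the example tweet are all assumptions made for illustration; the notebook itself relies on nltk for tokenizing and stopwords.

```python
import re

# Hypothetical, deliberately tiny stopword set; the project uses nltk's resources.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "it", "this"}

# Illustrative patterns, not the ones used in the notebook.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def simple_metrics(tweet):
    """Count surface features on the raw tweet, before any cleaning."""
    return {
        "n_urls": len(URL_RE.findall(tweet)),
        "n_mentions": len(MENTION_RE.findall(tweet)),
        "n_hashtags": len(HASHTAG_RE.findall(tweet)),
        "n_exclamations": tweet.count("!"),
    }


def normalise(tweet):
    """Strip URLs and mentions, lowercase, tokenize, and drop stopwords."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    # Keep the hashtag word but drop the '#' symbol.
    text = text.replace("#", " ").lower()
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


# Invented example tweet, not taken from the dataset.
example = "BREAKING: garlic is a CURE for #covid19!!! @WHO https://example.com/x"
metrics = simple_metrics(example)
tokens = normalise(example)
print(metrics)
print(tokens)
```

Computing the metrics on the raw text before normalising matters here, because the cleaning step removes the URLs, mentions, and punctuation that the metrics count.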

Techniques

RegEx
POS analysis
VADER Affect analysis
NRCLex Emotional analysis
CountVectorizer
TfidfVectorizer
SMOTE-NC
Naive Bayes & Logistic Regression Modelling

Summary

The data was explored, cleaned, and tokenized, removing stopwords and normalising the tweets.
Simple metrics were derived using RegEx, before POS tagging and analysis.
Sentiment analysis was conducted using VADER for affect and NRCLex for emotion.
Features were engineered and the data split and oversampled to address class imbalance.
The classification modelling used combinations of Count/Tfidf Vectorizer with Naive Bayes/Logistic Regression.
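
The vectorizer and model combinations described in the last step could be set up along these lines. This is a minimal sketch, not the notebook's code: the toy tweets and labels are invented, MultinomialNB is assumed as the Naive Bayes variant, and all hyperparameters are left at defaults apart from max_iter.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy tweets and labels for illustration only (NOT from the dataset).
texts = [
    "garlic cures covid drink it daily",
    "bleach is a miracle cure for the virus",
    "the virus was made in a lab to control us",
    "wash your hands and wear a mask",
    "health officials announce new testing sites",
    "vaccines went through clinical trials",
]
labels = ["misinfo", "misinfo", "misinfo", "reliable", "reliable", "reliable"]

# Factories rather than instances, so every pipeline gets fresh estimators.
vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
models = {
    "nb": MultinomialNB,  # assumed Naive Bayes variant
    "logreg": lambda: LogisticRegression(max_iter=1000),
}

pipelines = {}
for (v_name, make_vec), (m_name, make_model) in product(
    vectorizers.items(), models.items()
):
    pipe = Pipeline([("vec", make_vec()), ("clf", make_model())])
    pipe.fit(texts, labels)
    pipelines[f"{v_name}+{m_name}"] = pipe

# Predict on one unseen toy tweet with each combination.
predictions = {
    name: pipe.predict(["garlic cures covid"])[0]
    for name, pipe in pipelines.items()
}
for name, pred in predictions.items():
    print(name, "->", pred)
```

Wrapping each vectorizer and classifier in a Pipeline keeps the vocabulary fitted on training text only, which avoids leaking test-set tokens when the combinations are later compared on held-out data.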
