A formal report written in R that uses Natural Language Processing and Machine Learning to classify news article claims as either true or false.
This project used a dataset of 1,911 unique PolitiFact claims and their associated truth ratings. Features were extracted from each claim using a bag-of-n-grams model with tf-idf as the scoring metric. The vocabulary size was first reduced using lemmatization and stop word removal, among other text cleaning procedures.
Seven machine learning classification models (including a random forest, a multilayer perceptron, and a recurrent neural network) were fit, and the best-performing model achieved a classification accuracy of 71%.
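
As a rough illustration of the feature extraction step described above, the sketch below builds bag-of-n-grams tf-idf weights for a few made-up claims. It is written in plain Python for brevity and is not the project's R code. The stop word list, the example claims, and the function names `ngrams` and `tfidf` are all hypothetical, and the weighting shown (term frequency times `log(N / df)`) is only one common tf-idf variant, assumed here because the exact formula used in the report is not stated. Lemmatization and punctuation handling are left out.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "to", "and"}


def ngrams(text, n_max=2):
    """Lowercase, drop stop words, and return all 1..n_max word n-grams."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_max=2):
    """Return one {n-gram: tf-idf weight} dict per document."""
    bags = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(bags)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bag.items()}
        )
    return weights


# Made-up claims, for illustration only.
claims = [
    "Taxes rose in the state",
    "Taxes fell in the state",
    "Crime rose nationwide",
]
features = tfidf(claims)
```

In this sketch, n-grams that appear in only one claim (such as "taxes rose") receive a higher weight than terms shared across claims (such as "taxes"), which is the property that makes tf-idf useful as input to a classifier. Because stop words are removed before n-grams are formed, a bigram can span a removed word.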
The project can be viewed either as a formal PDF here or as a Bookdown website here.
- MIT License
- Newspaper icon by Icons8