# Wikipedia Edit Wars

This project uses a Naive Bayes model to predict which Wikipedia articles are likely to become the subject of an edit war.

## What is an edit war?

An edit war happens when editors repeatedly revert or override each other's changes to an article instead of resolving the disagreement through discussion.

This project uses the Python Natural Language Toolkit (NLTK), scikit-learn, and the Requests library. The Naive Bayes model predicts edit wars from word frequencies, word counts, and the density of references in each article.
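For a rough sense of how such a model can be wired up, here is a minimal scikit-learn sketch that turns per-article feature dictionaries (word counts plus a reference-density value) into a Naive Bayes prediction. The feature names and example data are purely illustrative; the actual implementation lives in `src/naive_bayes.py` and may differ.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each article is represented as a dict of word counts plus summary features.
# These two articles and their labels are made up for illustration.
articles = [
    {"revert": 12, "dispute": 7, "word_count": 3400, "reference_density": 0.8},
    {"treaty": 9, "history": 5, "word_count": 5200, "reference_density": 2.1},
]
labels = [1, 0]  # 1 = article had an edit war, 0 = it did not

# DictVectorizer turns the dicts into a feature matrix; MultinomialNB works
# well with non-negative, count-style features like these.
model = make_pipeline(DictVectorizer(), MultinomialNB())
model.fit(articles, labels)

new_article = {"revert": 4, "consensus": 2, "word_count": 2100, "reference_density": 1.3}
print(model.predict_proba([new_article]))  # class probabilities for the new article
```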

This was my final project for my college machine learning class. `final_submission.pdf` explains the project in depth. If you'd rather look directly at the code, see the `src/` directory, which contains the following files:

- `scraper.py`: Downloads articles from Wikipedia and extracts the article text from the HTML (a sketch of this step follows the list).
- `analyze_text.py`: Parses the raw content of a Wikipedia page. It filters out stopwords, lemmatizes the remaining words, and converts the result into a bag-of-words (also sketched below).
- `calculate_controversy.py`: Calculates each article's controversy score using the method described in this paper.
- `naive_bayes.py`: Runs the machine learning algorithm that predicts how likely it is that an article will have an edit war.
- `final.py`: Miscellaneous utility functions used throughout the other Python files.
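For a sense of what the scraping step looks like, here is a minimal sketch of downloading an article with Requests and pulling the paragraph text out of the HTML. I'm assuming BeautifulSoup for the HTML parsing; `src/scraper.py` may extract the content differently.

```python
import requests
from bs4 import BeautifulSoup  # assumption: the repo may parse the HTML another way

def fetch_article_text(title: str) -> str:
    """Download a Wikipedia article and return its visible paragraph text."""
    url = f"https://en.wikipedia.org/wiki/{title}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    # Wikipedia article bodies live inside the element with id "mw-content-text".
    body = soup.find(id="mw-content-text")
    paragraphs = body.find_all("p") if body else []
    return "\n".join(p.get_text() for p in paragraphs)

print(fetch_article_text("Edit_war")[:300])
```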
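And here is a sketch of the preprocessing described for `analyze_text.py`: filtering stopwords, lemmatizing with NLTK, and counting the result into a bag-of-words. The function name and the simple regex tokenizer are my own choices, not necessarily what the repo does.

```python
import re
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads of the NLTK data this sketch relies on.
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def bag_of_words(raw_text: str) -> Counter:
    """Lowercase, drop stopwords, lemmatize, and count the remaining words."""
    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return Counter(
        lemmatizer.lemmatize(tok) for tok in tokens if tok not in stop_words
    )

print(bag_of_words("Editors kept reverting each other's edits over the article's title."))
```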