Predicting which Wikipedia articles are most likely to be the subject of an edit war.

dan3944/wiki-edit-wars

Wikipedia Edit Wars

This project uses a Naive Bayes model to predict which Wikipedia articles are likely to be the subject of an edit war.

What is an edit war?

This project uses the Python Natural Language Toolkit (NLTK), scikit-learn, and the Requests library. I used a Naive Bayes model to predict edit wars from three kinds of features: word frequencies, word counts, and the density of references.
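To make the idea concrete, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier over word frequencies. It is not the code in this repository (which relies on scikit-learn), and it covers only the word-frequency features, not word counts or reference density. The function names, the `alpha` smoothing parameter, the class labels, and the toy documents are all assumptions made up for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on bag-of-words Counters.

    alpha is the Laplace smoothing constant (hypothetical default of 1.0).
    """
    vocab = {word for doc in docs for word in doc}
    docs_per_class = Counter(labels)
    word_counts = {label: Counter() for label in docs_per_class}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    model = {}
    for label, n_docs in docs_per_class.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        model[label] = {
            # log P(class)
            "log_prior": math.log(n_docs / len(labels)),
            # log P(word | class), smoothed so unseen pairs are never zero
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior for one document."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for word, count in doc.items():
            # Words that never appeared in training are skipped.
            if word in params["log_likelihood"]:
                score += count * params["log_likelihood"][word]
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up training data: two "edit_war" articles and two "calm" ones.
toy_docs = [
    Counter({"disputed": 3, "revert": 2}),
    Counter({"revert": 4, "vandalism": 1}),
    Counter({"species": 3, "habitat": 2}),
    Counter({"habitat": 1, "river": 3}),
]
toy_labels = ["edit_war", "edit_war", "calm", "calm"]
toy_model = train_naive_bayes(toy_docs, toy_labels)
print(predict(toy_model, Counter({"revert": 2, "disputed": 1})))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which matters for long articles.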

This was my final project for my college machine learning class. final_submission.pdf explains the project in depth. If you would rather look directly at the code, see the src/ directory, which contains the following files:

  • scraper.py: Downloads articles from Wikipedia and extracts their content from the HTML.
  • analyze_text.py: Parses the raw content of a Wikipedia page. It filters out stopwords, lemmatizes the remaining words, and converts the result into a bag-of-words.
  • calculate_controversy.py: Calculates articles' controversy scores using the method described in this paper.
  • naive_bayes.py: Runs the machine learning algorithm that predicts how likely an article is to have an edit war.
  • final.py: Contains miscellaneous utility functions used throughout the other Python files.
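The text-processing steps described for analyze_text.py (drop stopwords, lemmatize, build a bag-of-words) can be sketched roughly as follows. This is an illustration under stated assumptions, not the file's actual contents: the project uses NLTK, whereas this sketch uses only the standard library, with a tiny hand-written stopword list and a crude plural-stripping function standing in for a real lemmatizer. The names `STOPWORDS`, `crude_lemma`, and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a
# full list such as the one shipped with NLTK.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "was", "were"}


def crude_lemma(word):
    """Stand-in for a real lemmatizer: strips a trailing plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stopwords, lemmatize, and count words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [crude_lemma(t) for t in tokens if t not in STOPWORDS]
    return Counter(kept)


sample = bag_of_words("The editors reverted the edits of the editors.")
print(sample)
```

A `Counter` is a convenient bag-of-words representation because it maps each word to its frequency and ignores word order, which is all a Naive Bayes model over word frequencies needs.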
