Wikipedia Edit Wars

This project uses a Naive Bayes model to predict which Wikipedia articles are likely to become the subject of an edit war.

It is built with the Python Natural Language Toolkit (NLTK), scikit-learn, and the Requests library. The features I used for prediction are word frequencies, word counts, and the density of references.
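To give a rough idea of how Naive Bayes works on word frequencies, here is a minimal standard-library sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code in src/ (which uses scikit-learn), and the labels, function names, and toy word counts below are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect per-label document and word counts.

    docs is a list of (bag, label) pairs, where bag is a Counter of word counts.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for bag, label in docs:
        label_counts[label] += 1
        word_counts[label].update(bag)
        vocab.update(bag)
    return label_counts, word_counts, vocab


def predict(model, bag):
    """Return the label with the highest log-probability for this bag of words."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the label.
        score = math.log(n_docs / total_docs)
        for word, count in bag.items():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the project.
training = [
    (Counter({"election": 3, "party": 2}), "edit_war"),
    (Counter({"religion": 2, "conflict": 2, "party": 1}), "edit_war"),
    (Counter({"species": 3, "habitat": 2}), "calm"),
    (Counter({"mineral": 2, "habitat": 1, "crystal": 2}), "calm"),
]
model = train(training)
prediction = predict(model, Counter({"party": 1, "conflict": 1}))
print(prediction)
```

The real project also uses word counts and reference density as features; this sketch only covers the word-frequency part.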

This was my final project for my college machine learning class. final_submission.pdf explains the project in depth. If you'd rather look directly at the code, see the src/ directory, which contains the following files:

scraper.py: This downloads articles from Wikipedia and extracts their content from the HTML.
analyze_text.py: This parses the raw content of a Wikipedia page. It filters out stopwords, lemmatizes the remaining words, and then converts the result into a "bag of words".
calculate_controversy.py: This calculates articles' controversy scores using the method described in this paper.
naive_bayes.py: This runs the actual machine learning algorithm, which predicts how likely an article is to have an edit war.
final.py: This file contains miscellaneous utility functions used throughout the other Python files.
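
The text-processing step described for analyze_text.py (filter stopwords, normalize words, count what is left) can be sketched roughly as follows. This is a standard-library illustration only: the project itself uses NLTK, while the tiny stopword set and the crude plural-stripping `normalize` function here are assumed stand-ins for NLTK's stopword list and lemmatizer, not real lemmatization.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption; NLTK ships a much larger list).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "was", "were"}


def normalize(word):
    """Crude stand-in for lemmatization: strip a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stopwords, normalize, and count the words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(normalize(t) for t in tokens if t not in STOPWORDS)


bag = bag_of_words("The editors reverted the edits, and the edit was reverted again.")
print(bag)
```

A bag of words discards word order and keeps only the counts, which is the form of input a Naive Bayes model over word frequencies needs.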

dan3944/wiki-edit-wars

Folders and files

Latest commit

History

Repository files navigation

Wikipedia Edit Wars

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages