RedditNFLClassifier

Code repository for the 2019 senior thesis entitled "The Wisdom of Crowds: A Natural Language Processing Approach to Forecasting Sports Betting Markets Using Social Media Fan Sentiment".

Abstract

The wisdom of crowds, or the idea that the collective knowledge of a group of people can be regarded as an alternative to expert opinion, has been repeatedly shown to be an effective indicator of sporting outcomes. Because NFL betting is the largest sports betting market in the United States, and fan sentiment has become abundant and readily available with the rise of social media platforms such as Reddit, we study the predictive relationship between social media output and NFL outcomes. In particular, we focus on the two most popular forms of sports betting at the per-game level: wagering on which team will win the point spread (WTS), a handicap applied to the team that bookmakers expect to win the game, and wagering on whether the combined score will be above or below the over-under line, a prediction of the total score set by bookmakers. Popular natural language processing representations of Reddit text, including bag-of-words, term frequency-inverse document frequency (TF-IDF), and out-of-the-box sentiment scoring models used as a proxy for public sentiment, were shown to be successful predictors in several common machine learning models. Trained on games from the 2012-2018 seasons, discriminative models (logistic regression and linear support vector machines) using bag-of-words and TF-IDF representations, and nearest neighbor models using sentiment scoring algorithms (Vader and Afinn), were found to be most successful at this classification task, achieving out-of-sample testing accuracies of up to 54%, well above the 52.4% required to generate a profitable betting strategy. Further attempts at implementing an LSTM neural network have also shown similar success.
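
The 52.4% threshold quoted in the abstract can be reproduced with a short calculation. The sketch below assumes the conventional -110 American odds on spread and over-under bets (risk 110 to win 100). The abstract does not state the odds, so treat that figure as an assumption. The function names are illustrative and do not come from the thesis code.

```python
def _risk_and_win(american_odds: int) -> tuple[float, float]:
    """Return (amount risked, amount won) for a bet at the given American odds."""
    if american_odds == 0:
        raise ValueError("American odds of 0 are not defined")
    if american_odds < 0:
        # Negative odds: risk |odds| to win 100.
        return float(-american_odds), 100.0
    # Positive odds: risk 100 to win the quoted amount.
    return 100.0, float(american_odds)


def breakeven_win_rate(american_odds: int) -> float:
    """Win probability at which a flat bet has zero expected profit."""
    risk, win = _risk_and_win(american_odds)
    return risk / (risk + win)


def expected_profit_per_unit(accuracy: float, american_odds: int = -110) -> float:
    """Expected profit per unit staked, given the fraction of bets won.

    Assumes -110 odds by default (an assumption, not stated in the abstract).
    """
    risk, win = _risk_and_win(american_odds)
    payout = win / risk  # profit per unit staked on a winning bet
    return accuracy * payout - (1.0 - accuracy)


if __name__ == "__main__":
    print(f"break-even win rate at -110: {breakeven_win_rate(-110):.4f}")
    print(f"expected profit per unit at 54% accuracy: {expected_profit_per_unit(0.54):.4f}")
```

Under this assumption the break-even rate is 110 / 210, which rounds to 52.4%, and a classifier that is correct 54% of the time has a small positive expected return per unit staked. The calculation ignores variance and any change in the odds offered.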