Predicting vaccine probabilities
This project is a part of the ADS-502 course in the Applied Data Science Program at the University of San Diego.
-- Project Status: [Completed]
Project Intro/Objective
In 2009, public health professionals battled the global pandemic caused by influenza A virus subtype H1N1 (H1N1). Although H1N1 was a new influenza virus, health care experts had an advantage in producing and distributing an H1N1 vaccine because they had years of experience with influenza vaccination. However, the public health effort to develop and run a safe and effective H1N1 vaccination campaign was plagued with challenges, including communication of vaccine availability and recommended participation, vaccine supply chain issues, and public concern regarding the safety and efficacy of the H1N1 vaccine. "These challenges eroded public trust in the H1N1 vaccination program: a November 2009 survey found that 54% of adults believed the federal government was doing a 'poor' or 'very poor' job at providing the country with an adequate supply of H1N1 vaccine" (Newport, 2021).
The objective of this study was to apply data mining methods to determine how well the likelihood of a person adopting the H1N1 vaccine can be predicted from behavioral and socio-demographic data. The primary goal was to determine which method most accurately predicted H1N1 vaccine adoption. The secondary goal was to identify the features most influential in H1N1 vaccine adoption, to help inform current COVID-19 public health communication and future vaccination efforts.
Partner(s)/Contributor(s)
• Roberto Cancel
• Kevin Stewart
Methods Used
• Data Mining
• Machine Learning
• Data Visualization
• Data Engineering
• Programming
• Data Manipulation
Technologies
• R
Project Description
The dataset was obtained from DrivenData's Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines competition, which uses data from the National 2009 H1N1 Flu Survey. The dataset has 35 variables and 26,707 observations. The objective was to predict the likelihood of vaccination from the respondents' behavioral and demographic characteristics. All work was done in the R programming language: the data were cleaned and analyzed, and three models were fitted: random forest, naïve Bayes, and logistic regression.
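As an illustration of the modeling step, the following is a minimal sketch of logistic regression, one of the three models named above, in base R. It is not the project's actual code. The data are simulated, and the column names (h1n1_concern, doctor_recc_h1n1, age_group, h1n1_vaccine), the 70/30 split, and the 0.5 threshold are assumptions chosen for illustration only. Random forest and naïve Bayes models require add-on packages and are not shown.

```r
# Simulated stand-in for the survey data. Column names and effect sizes
# are assumptions for illustration, not taken from the project.
set.seed(502)
n <- 1000
survey <- data.frame(
  h1n1_concern     = sample(0:3, n, replace = TRUE),  # 0 = none, 3 = high
  doctor_recc_h1n1 = rbinom(n, 1, 0.25),              # 1 = doctor recommended it
  age_group        = factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE))
)

# Simulate the outcome so that concern and a doctor's recommendation
# both raise the probability of vaccination.
log_odds <- -2 + 0.5 * survey$h1n1_concern + 1.5 * survey$doctor_recc_h1n1
survey$h1n1_vaccine <- rbinom(n, 1, plogis(log_odds))

# Hold out 30% of the rows for evaluation.
test_idx <- sample(n, size = 0.3 * n)
train <- survey[-test_idx, ]
test  <- survey[test_idx, ]

# Logistic regression: the log-odds of vaccination as a linear
# function of the predictors.
fit <- glm(h1n1_vaccine ~ h1n1_concern + doctor_recc_h1n1 + age_group,
           data = train, family = binomial)

# Predicted probability of vaccination for each held-out respondent.
test_prob <- predict(fit, newdata = test, type = "response")

# Classify at a 0.5 threshold and compare with the simulated outcome.
test_pred <- as.integer(test_prob > 0.5)
accuracy  <- mean(test_pred == test$h1n1_vaccine)

# Rank coefficients by absolute z value as a rough guide to influence.
coefs  <- summary(fit)$coefficients
ranked <- coefs[order(-abs(coefs[, "z value"])), , drop = FALSE]

print(round(accuracy, 3))
print(ranked)
```

The predicted probabilities correspond to the project's first goal (predicting vaccine adoption), and the ranked coefficients are one simple way to approach the second goal (identifying influential features). Accuracy on simulated data says nothing about performance on the real survey.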