Repository for our DS3000 final project using machine learning to predict COVID-19 case count.

lucademian/ds3000_covid_prediction

Predicting COVID-19 Case Count Based on Prior Factors

DS 3000 Summer 1 2020 Final Project

By Luca Demian and Max Breslauer-Friedman

This is our repository for all our final project stuff.

Useful Links

Description

For our project, we wanted to work with COVID-19 data. At first, we considered unsupervised approaches to analyzing the factors that contributed to the spread of the pandemic in the United States. After review, we switched to a supervised learning problem: developing a model to predict the COVID-19 case count in a particular county from several prior factors we identified. We look at the case count N days after the first recorded case in each county, to control for the infection appearing at different times across the states. We are going to examine several different timelines, or N-day values.
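As a rough illustration of the "N days after the first case" idea, here is a minimal sketch in plain Python. It assumes the input is a list of `(fips, date, cumulative_cases)` records, one per county per day. The function name `cases_n_days_after_first` and the toy rows are hypothetical and are not taken from the project code.

```python
from datetime import date, timedelta


def cases_n_days_after_first(rows, n_days):
    """Return {fips: cumulative cases N days after that county's first case}.

    rows: iterable of (fips, day, cumulative_cases) tuples (assumed layout).
    Counties with no record on the target date are left out.
    """
    first_case_day = {}
    cases_on = {}
    for fips, day, cases in rows:
        cases_on[(fips, day)] = cases
        # Track the earliest date on which the county reports any case.
        if cases > 0 and (fips not in first_case_day or day < first_case_day[fips]):
            first_case_day[fips] = day

    result = {}
    for fips, start in first_case_day.items():
        target = start + timedelta(days=n_days)
        if (fips, target) in cases_on:
            result[fips] = cases_on[(fips, target)]
    return result


# Toy data, made up for illustration only.
rows = [
    ("01001", date(2020, 3, 1), 1),
    ("01001", date(2020, 3, 2), 3),
    ("01001", date(2020, 3, 3), 7),
    ("01003", date(2020, 3, 2), 2),
    ("01003", date(2020, 3, 3), 5),
]
example = cases_n_days_after_first(rows, 2)
```

Aligning every county on its own day 0 means counties are compared at the same point in their outbreak rather than on the same calendar date.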

Data Sources

| Data Source | Link(s) | Date | Features |
| --- | --- | --- | --- |
| NY Times | Link | May 22, 2020 | date_0days, cases_0days, deaths_0days, date_90days, cases_90days, deaths_90days, fips |
| Kaiser Health | Link | March 30, 2020 | icu_beds, kaiser_total_population, kaiser_60plus_population |
| US Census Commuting Data | Link | 2015 | commuting_within, commuting_out, commuting_in |
| US Census Population Density | Link | 2019 | density, pop_2019 |
| Political Majority | Link | 2016 | percent_democrat, percent_gop |
| Poverty Data | Link | Feb. 5, 2020 | percent_in_poverty |
| Age Data | Link | 2018 | pop_under18, pop_over65, pop_2018, median_age |
| Education Data | Link | 2018 | percent_with_bachelors |
| Income Data | Link | 2018 | median_income |

Selected Features

These are the features we selected to train our models on, out of all the features we gathered data for.

| Name | Type | Description |
| --- | --- | --- |
| cases_60days | Target | Count of cases in each county 60 days after the first recorded case. |
| cases_70days | Target | Count of cases in each county 70 days after the first recorded case. |
| icu_beds | Feature | Count of ICU beds per county. |
| commuting_in | Feature | Population commuting into this county every day. |
| commuting_out | Feature | Population commuting out of this county every day. |
| commuting_within | Feature | Population commuting within this county every day. |
| density | Feature | A measure of population density in this county. |
| percent_democrat | Feature | Percentage of 2016 voters voting Democratic. |
| percent_in_poverty | Feature | Percentage of county population (all ages) in poverty. |
| pop_under18 | Feature | Population in county aged under 18. |
| pop_over65 | Feature | Population in county aged 65+. |
| median_age | Feature | Median age for the county. |
| percent_with_bachelors | Feature | Percentage of people in county with a bachelor's degree or higher. |
| median_income | Feature | A measure of the median income in this county. |
| pop_2019 | Feature | County population as of 2019. |
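The split between targets and features listed above could be expressed as below. This is only a sketch: it assumes each county is available as a dict keyed by the column names (for example, a row from `csv.DictReader`), and the helper `split_row` and the sample values are hypothetical.

```python
# Column names as listed in the Selected Features section.
TARGET_COLUMNS = ["cases_60days", "cases_70days"]
FEATURE_COLUMNS = [
    "icu_beds", "commuting_in", "commuting_out", "commuting_within",
    "density", "percent_democrat", "percent_in_poverty", "pop_under18",
    "pop_over65", "median_age", "percent_with_bachelors", "median_income",
    "pop_2019",
]


def split_row(row):
    """Split one county record into (feature vector, targets dict)."""
    features = [float(row[name]) for name in FEATURE_COLUMNS]
    targets = {name: float(row[name]) for name in TARGET_COLUMNS}
    return features, targets


# Made-up values, for illustration only.
row = {name: "1.5" for name in FEATURE_COLUMNS}
row.update(cases_60days="120", cases_70days="150")
features, targets = split_row(row)
```

Keeping the two targets separate makes it straightforward to fit one model per N-day value.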

Important Notes

Because the NY Times data is our base dataset, and it reports all counties in New York City as a single geography, the New York City counties are missing from our final dataset. This affects New York, Kings, Queens, Bronx, and Richmond Counties.
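A quick way to confirm which of these counties are absent from a merged dataset is to check for their FIPS codes. The sketch below assumes FIPS codes are stored as five-character strings. The helper `missing_nyc_counties` is hypothetical and is not part of the project code.

```python
# County FIPS codes for the five New York City counties.
NYC_COUNTY_FIPS = {
    "36061": "New York",
    "36047": "Kings",
    "36081": "Queens",
    "36005": "Bronx",
    "36085": "Richmond",
}


def missing_nyc_counties(dataset_fips):
    """Return the names of NYC counties whose FIPS code is not in the dataset."""
    present = set(dataset_fips)
    return sorted(
        name for fips, name in NYC_COUNTY_FIPS.items() if fips not in present
    )


# Example with made-up dataset contents (Albany and Nassau counties only).
missing = missing_nyc_counties(["36001", "36059"])
```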
