Group 6 Project Work
Team Members Details: Naragam Yamini Chandu Tadikonda Chittaranjan Ganni Sri Venkata Manikanta Medisetti Sudhindhra
INTRODUCTION
Coronaviruses are a family of related RNA viruses that infect mammals and birds. In humans and birds they cause respiratory tract infections that range from mild to fatal. Milder forms account for some cases of the common cold in humans (which is also caused by other viruses, primarily rhinoviruses), while more lethal types cause SARS, MERS, and COVID-19, the disease driving the ongoing pandemic. In other animals, coronaviruses cause diarrhea in pigs and cows, and hepatitis and encephalomyelitis in mice.
The World Health Organization (WHO) formally declared the SARS-CoV-2 outbreak a Public Health Emergency of International Concern on January 30, 2020, and a pandemic on March 11, 2020. WHO recommended that nations adopt stringent social distancing and quarantine measures to protect the public from the spread of the virus. COVID-19 has severely affected the lives of people around the world; with lockdowns and rising case counts in many places, it was as if the world had stopped for a short period of time.
In this project, we examine how many people were affected by COVID-19 and analyze how males and females were affected. We plot charts of the recorded cases to understand which groups of people were affected most severely and what patterns the spread of COVID-19 followed. We also look at the number of people reported to have been admitted to hospital, divide them into categories, and try to understand how hospitalization relates to people's health conditions. Based on the counts of tested, confirmed, and deceased cases, we developed a model to predict the yearly impact of the coronavirus. With these results, we can predict and analyze how a pandemic affects the world's health and population.
LITERATURE REVIEW
COVID-19 Open-Data: a global-scale spatially granular meta-dataset for coronavirus disease. From this paper we learned how a large quantity of epidemiological data and related metadata can be obtained from many distinct locations. Most of the data is acquired from the respective authorities using open-source software, and the paper describes how data originating from government bodies is aggregated and made available to the public.
Survival of Hospitalized COVID-19 Patients in Northern Italy: A Population-Based Cohort Study by the ITA-COVID-19 Network. This paper reports a cohort study conducted in 3 areas of Northern Italy that were heavily affected by COVID-19, using loco-regional COVID-19 surveillance linked to the hospital discharge database. The survival rate of younger women was higher than that of other groups, and the paper also reports deaths within 14 days among male patients with a median age of 61.
Weather Conditions and COVID-19 Incidence in a Cold Climate: A Time-Series Study in Finland. This paper examines whether climate plays any role in the rate at which COVID-19 infections spread. The authors collected daily COVID-19 counts from Dec 31 to May 31, 2020 across all districts of the country, together with air quality measurements taken by the meteorological department at nearby stations. The results show no strong evidence that climate increases the spread of COVID-19. However, because the data is limited and there are a few exceptions, the authors could not reach a firm conclusion on whether weather affects the spread of COVID-19.
Data source and collection
The data used for this project comes from the Google Health COVID-19 Open Data Repository ("Google Covid-19 Open Data"), one of the most comprehensive collections of up-to-date COVID-19-related information. Comprising data from more than 20,000 locations worldwide, it contains a rich variety of data types to help public health professionals, researchers, policymakers, and others understand and manage the virus. The datasets provide current information on COVID-19 cases, deaths, vaccination rates, hospitalizations, and more. The dataset we considered has around 30 variables and 1,048,576 rows.
For this project we chose 3 datasets from the Google Covid-19 Open Raw Data: Hospitalizations, Epidemiology, and By sex (Gender). The Epidemiology and Hospitalizations values are stratified using the by_sex dataset, based on the columns Name, Type, and Description, so that we can find relations between columns and plot the values to identify specific outcomes.
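The step of combining the three datasets can be illustrated with a minimal sketch. It assumes each table shares a `date` and a `location_key` column and joins on that pair. The column names and the tiny inline samples are assumptions made up for illustration, not the project's actual files or code.

```python
import csv
import io

# Tiny made-up samples standing in for the three CSV files; the column
# names and values are assumptions for illustration only.
EPIDEMIOLOGY = """date,location_key,new_confirmed,new_deceased
2021-01-01,XX,100,4
2021-01-02,XX,120,5
"""
HOSPITALIZATIONS = """date,location_key,new_hospitalized_patients
2021-01-01,XX,10
2021-01-02,XX,12
"""
BY_SEX = """date,location_key,new_confirmed_male,new_confirmed_female
2021-01-01,XX,48,52
2021-01-02,XX,55,65
"""


def index_rows(text):
    """Read CSV text and index each row by its (date, location_key) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["date"], row["location_key"]): row for row in reader}


def merge_tables(*tables):
    """Inner-join the tables on (date, location_key), keeping shared keys only."""
    indexed = [index_rows(table) for table in tables]
    shared_keys = set(indexed[0]).intersection(*indexed[1:])
    merged_rows = []
    for key in sorted(shared_keys):
        combined = {}
        for table in indexed:
            combined.update(table[key])
        merged_rows.append(combined)
    return merged_rows


merged = merge_tables(EPIDEMIOLOGY, HOSPITALIZATIONS, BY_SEX)
for row in merged:
    print(row)
```

An inner join is used here so that every merged row has values from all three tables; rows present in only one table are dropped, which may or may not match the choice made in the project.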
Summary
The analysis shows more COVID-19 cases in 2021 than in 2020 or 2022.
Newly confirmed cases were higher among females than males.
Newly recovered cases were also higher among females than males.
Newly deceased cases were higher among males than females.
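Comparisons like the ones above come from totalling the per-sex counts by year. The following is a minimal sketch of that aggregation. The field names and the numbers are made up for illustration and are not the project's data or results.

```python
from collections import defaultdict

# Made-up daily records; field names and values are illustrative assumptions.
records = [
    {"date": "2020-06-01", "new_confirmed_male": 40, "new_confirmed_female": 45,
     "new_deceased_male": 3, "new_deceased_female": 2},
    {"date": "2021-06-01", "new_confirmed_male": 90, "new_confirmed_female": 110,
     "new_deceased_male": 7, "new_deceased_female": 5},
    {"date": "2022-06-01", "new_confirmed_male": 30, "new_confirmed_female": 35,
     "new_deceased_male": 2, "new_deceased_female": 1},
]


def yearly_totals(rows):
    """Sum every numeric field per calendar year (first 4 characters of the date)."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        year = row["date"][:4]
        for field, value in row.items():
            if field != "date":
                totals[year][field] += value
    return totals


def confirmed_total(year_totals):
    """Total confirmed cases for one year across both sexes."""
    return year_totals["new_confirmed_male"] + year_totals["new_confirmed_female"]


totals = yearly_totals(records)
peak_year = max(totals, key=lambda year: confirmed_total(totals[year]))

for year in sorted(totals):
    t = totals[year]
    print(year,
          "confirmed:", confirmed_total(t),
          "female > male (confirmed):", t["new_confirmed_female"] > t["new_confirmed_male"],
          "male > female (deceased):", t["new_deceased_male"] > t["new_deceased_female"])
print("year with most confirmed cases:", peak_year)
```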
In addition, we measured the prediction accuracy for the respective columns and concluded that Random Forest is the most suitable model for prediction. The prediction outcomes can show key stakeholders where COVID-19 spread rapidly. For example, the government could build new hospitals in those areas, if required, in preparation for any future outbreak. The public is another stakeholder: people can get an overview of how COVID-19 spread by gender and age and see which groups were heavily affected, so that people in those groups can take extra precautions in any future outbreak.
Our project shows stakeholders what happened during the COVID-19 outbreak. They can use it to think about how to restrict the spread, which areas were heavily affected, and which age groups were affected more than others.
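The model comparison mentioned in this summary could look roughly like the sketch below, which fits two scikit-learn classifiers on the same split and compares their test accuracy. Everything about the data is an assumption: the features (`new_tested`, `new_confirmed`), the binary target (`high_burden`), and the threshold of 500 are hypothetical and synthetic. The report does not state which models were compared against Random Forest or how the target was defined, so Logistic Regression is only a stand-in baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): daily tests and a random positivity rate.
rng = np.random.default_rng(0)
n_days = 600
new_tested = rng.integers(100, 10_000, size=n_days)
positivity = rng.uniform(0.01, 0.30, size=n_days)
new_confirmed = (new_tested * positivity).astype(int)

# Hypothetical binary target: 1 when a day has more than 500 confirmed cases.
high_burden = (new_confirmed > 500).astype(int)

X = np.column_stack([new_tested, new_confirmed])
X_train, X_test, y_train, y_test = train_test_split(
    X, high_burden, test_size=0.25, random_state=0, stratify=high_burden
)

# Candidate models; the baseline is scaled because its solver is scale-sensitive.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.3f}")
```

The ranking on this synthetic data says nothing about the project's own results; the sketch only shows the mechanics of scoring several models on one held-out split.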
Conclusion
In this work, we applied machine learning analysis to the COVID-19 open raw data. By measuring the accuracy of different algorithms, we found that Random Forest is the most suitable algorithm for predicting COVID-19 case counts. We believe that employing more sophisticated features and applying more powerful machine learning models, such as deep learning approaches, can help enhance the performance of the system.