forked from ajklinker/abbieklinker-MADA-portfolio
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcoding_exercise.qmd
177 lines (135 loc) · 5.57 KB
/
coding_exercise.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
---
title: "R Coding Exercise"
output:
html_document:
toc: FALSE
---
## Setting Up the Exercise
```{r}
library(tidyverse)
library(dslabs)
#look at help file for gapminder data
#help(gapminder)
#get an overview of data structure
str(gapminder)
#get a summary of data
summary(gapminder)
#determine the type of object gapminder is
class(gapminder)
#get better view of the data
glimpse(gapminder)
```
## Examining Africa Data
```{r}
africadata<- gapminder %>%
filter(continent == "Africa")
str(africadata)
summary(africadata)
```
### Infant Mortality and Life Expectancy
``` {r}
mort_expect <- africadata %>%
select(infant_mortality, life_expectancy, region) #kept region for sake of visualization
#examine data
str(mort_expect)
summary(mort_expect)
#Plot
ggplot()+
geom_point(aes(x=infant_mortality, y=life_expectancy, color=region), data=mort_expect)+ #kept region to more easily see trends since there's a lot of countries.
xlab("Infant Mortality Rate")+ ylab("Life Expectancy (Yrs)")+
theme_bw()
```
### Population and Life Expectancy
``` {r}
pop_expect <- africadata %>%
select(population, life_expectancy, region) #kept region for same reason as prior plot
#examine data
str(pop_expect)
summary(pop_expect)
#Plot
ggplot()+
geom_point(aes(x=log(population), y=life_expectancy, color=region), data=pop_expect)+
theme_bw()+ xlab("Log Population")+ ylab("Life Expectancy (Yrs)")
```
## Select Years
**Which years have the data missing for infant mortality?**
```{r}
mort_expect_years <- africadata %>%
select(infant_mortality, life_expectancy, region, year)%>% #include years
filter(is.na(infant_mortality)) #look for missing values
head(mort_expect_years) #print
```
### Select Data for 2000
Filter all Africa data for 2000
```{r}
africadata2000<-africadata%>%
filter(year == 2000)
#Double check code
str(africadata2000)
summary(africadata2000)
```
# Visualize the data for 2000
```{r}
#Plots
##Infant Mortality
ggplot()+
geom_smooth(aes(x=infant_mortality, y=life_expectancy), color = "gray80", alpha = 0.1, data=africadata2000)+
geom_point(aes(x=infant_mortality, y=life_expectancy, color=region), data=africadata2000)+
labs(title = "Infant Mortality vs Life Expectancy", subtitle = "Year 2000") +
xlab("Infant Mortality")+ ylab("Life Expectancy (Yrs)") +
theme_bw()
##Population
ggplot()+
geom_smooth(aes(x=log(population), y=life_expectancy), color = "gray80", alpha = 0.1, data=africadata2000)+
geom_point(aes(x=log(population), y=life_expectancy, color=region), data=africadata2000)+
labs(title="Population vs. Life Expectancy", subtitle="Year 2000") +
xlab("Log Population")+ ylab("Life Expectancy (Yrs)") +
theme_bw()
```
## Quantify the Data
**Fitting infant mortality as a predictor of life expectancy**
```{r}
fit1<- lm(life_expectancy ~ infant_mortality, data=africadata2000)
summary(fit1)
```
We have evidence to support that infant mortality is a predictor of life expectancy. Based on the regression model, we can predict that for every unit increase in infant mortality, the average life expectancy decreases by 0.19 years (p=2.83e-8). The average predicted life expectancy with an infant mortality rate of 0 is 71.3 years.
**Fitting population as a predictor of life expectancy**
```{r}
fit2<- lm(population ~ infant_mortality, data=africadata2000)
summary(fit2)
```
Similar to what we saw in the plots, we do not have evidence to say that population is associated with infant mortality (p=0.66).
**This section added by Annabella Hines**
First, I wanted to look at the gapminder dataset as a whole specifically in the year 2000. I decided to create a boxplot to see the distributions of life expectancy for each continent.
```{r}
##create an object of the gapminder data for the year 2000
continent<- gapminder %>% filter(year==2000)
#A boxplot of the continent data viewing life expectancy by continent
ggplot(data=continent, aes(x=continent, y=life_expectancy, color=continent))+geom_boxplot()+xlab("Continent")+ylab("Life Expectancy")
```
The life expectancy distributions of each continent look fairly comparable except for Africa which has a lower overall distribution, lower median, and more outliers.
Next, I compared fertility to infant mortality grouped by continent for the year 2000 to see if there were any noticeable trends.
```{r}
##created a scatterplot of fertility and infant mortality color coded by continent
ggplot(data=continent, aes(x=fertility, y=infant_mortality, color=continent))+geom_point()+ylab("Infant Mortality")+ xlab("Fertility")
```
There seems to be a positive correlation between fertility and infant mortality, with Europe having low values in each and Africa having the highest.
In the next section I wanted to explore how the life expectancy changed across the years for the different regions in Africa.
```{r}
#create an object out of africadata with year, life expectancy, and region
africaregions<- africadata %>% select(year, life_expectancy, region)
#create plot showing year vs. life expectancy color coded by region
ggplot(data=africaregions, aes(x=year, y=life_expectancy, color=region))+geom_smooth()+ylab("Life Expectancy")+xlab("Year")
```
The plot shows a relatively positive correlation between year and life expectancy, so I wanted to run a fit to verify this observation.
```{r}
#Fit life expectanct against year for the africadata
fit3<-lm(life_expectancy~year, data=africadata)
summary(fit3)
```
According to the above data, year and life expectancy for the African countries are positively correlated at the 0.05 significance level.
```{r}
#Load broom and present lm output in a table
library(broom)
map_df(list(fit3), tidy)
```