# Multivariate Graphs
**Learning objectives:**
- Incorporate variables beyond those on the x and y axes into a visualization by grouping (mapping them to color, shape, size, and other aesthetics)
- Explore some of `ggplot2`'s customization options in the context of grouping and faceting
- Use facet wrapping & facet grids to create separate graphs based on the values of different variables
## Introduction {-}
- In previous chapters, we learned to plot one or two variables on a single graph
- Here, we are looking at the `Salaries` data from the `carData` package
```{r}
library(pacman)
p_load(ggplot2, dplyr)
data(Salaries, package="carData")
ggplot(Salaries, aes(x=yrs.since.phd,
                     y=salary)) +
  geom_point() +
  labs(title="Academic salary by years since degree")
```
- What if we want to visualize more than two variables?
- The `Salaries` data set has multiple columns we can plot
```{r}
Salaries %>% head()
```
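Before choosing aesthetics, it can help to check which columns are categorical and which are numeric, since factors tend to suit color or shape while numeric columns suit size or position. A minimal sketch, assuming the `carData` package is installed; `col_types` is only an illustrative name.

```{r}
data(Salaries, package = "carData")

# first class of every column, as a named character vector
col_types <- sapply(Salaries, function(col) class(col)[1])
col_types
```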
## Grouping
- **Grouping** means showing the data for several groups on a single graph, so that more than two variables appear in one plot
- You map additional variables to visual attributes of the data points
- Ex. color, size, shape, etc. (set inside `aes()`, which supplies the `mapping` argument of `ggplot()` and the `geom_*` functions)
```{r, echo=F}
ggplot(Salaries, aes(x=yrs.since.phd,
                     y=salary,
                     color=rank)) +
  geom_point() +
  labs(title="Academic salary by years since degree")
```
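The distinction between mapping an aesthetic to a variable (inside `aes()`) and setting it to a constant (outside `aes()`) can be sketched as follows. This is an illustration rather than part of the original chapter; the object names and the color `"steelblue"` are arbitrary choices.

```{r}
library(ggplot2)
data(Salaries, package = "carData")

# mapped: color varies with the value of rank (inside aes())
p_mapped <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary, color = rank)) +
  geom_point()

# set: every point gets the same constant color (outside aes())
p_fixed <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary)) +
  geom_point(color = "steelblue")

# number of distinct colors actually drawn in the point layer of each plot
n_colors_mapped <- length(unique(layer_data(p_mapped, 1)$colour))
n_colors_fixed  <- length(unique(layer_data(p_fixed, 1)$colour))
c(mapped = n_colors_mapped, fixed = n_colors_fixed)
```

Only the mapped version produces a legend, because only there does color carry information about the data.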
\newpage
```{r, echo=F}
ggplot(Salaries, aes(x=yrs.since.phd,
                     y=salary,
                     color=rank, shape=sex)) +
  geom_point() +
  labs(title="Academic salary by years since degree")
```
- NOTE: Be mindful of how busy your graph looks.
- You can also modify the size and transparency of points.
```{r}
ggplot(Salaries, aes(x=yrs.since.phd,
                     y=salary,
                     size=yrs.service)) +
  # alpha allows you to change transparency
  # of points
  geom_point(alpha=0.6) +
  labs(title="Academic salary by years since degree")
```
- Here's another example of grouping, this time with regression lines added to the graph.
- A benefit of grouping in the base `ggplot()` call is that the mapping is inherited by the layers added on top, so `geom_smooth()` fits a separate line for each group.
```{r}
# plot experience vs. salary with
# fit lines (color represents sex)
ggplot(Salaries,
       aes(x = yrs.since.phd,
           y = salary,
           color = sex)) +
  geom_point(alpha = .4,
             size = 3) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_smooth(se = FALSE,
              method = "lm",
              formula = y ~ poly(x, 2),
              linewidth = 1.5) +
  labs(x = "Years Since Ph.D.",
       title = "Academic Salary by Sex and Years Experience",
       subtitle = "9-month salary for 2008-2009",
       y = "",
       color = "Sex") +
  scale_y_continuous(labels = scales::dollar) +
  scale_color_brewer(palette = "Set1") +
  theme_minimal()
```
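To see how a grouping aesthetic carries over to later layers, compare mapping `color` in `ggplot()` with mapping it only in `geom_point()`. This is a minimal sketch with illustrative object names; it uses a straight-line fit (`y ~ x`) instead of the quadratic above to keep the example short.

```{r}
library(ggplot2)
data(Salaries, package = "carData")

# color mapped in ggplot(): inherited by both layers, one fit line per sex
p_inherited <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary, color = sex)) +
  geom_point(alpha = .4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# color mapped only in geom_point(): the smoother sees no groups, one overall line
p_local <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary)) +
  geom_point(aes(color = sex), alpha = .4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# count the fitted lines in the smoothing layer (layer 2) of each plot
n_lines_inherited <- length(unique(layer_data(p_inherited, 2)$group))
n_lines_local     <- length(unique(layer_data(p_local, 2)$group))
c(inherited = n_lines_inherited, local = n_lines_local)
```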
## Faceting
- Faceting produces a separate graph for each level of a variable (or combination of variables).
- `facet_wrap` creates the same type of graph for each level of the faceting variable and wraps the panels into rows and columns.
- The formula for facet wrapping is `~ variable` (for example, `~rank`)
```{r}
ggplot(Salaries, aes(x = salary)) +
  geom_histogram(fill = "cornflowerblue",
                 color = "white") +
  facet_wrap(~rank, ncol = 1) +
  labs(title = "Salary histograms by rank")
```
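The wrapping behavior can be adjusted with `nrow`, `ncol`, and `scales`. The sketch below places the three rank panels side by side and lets each panel have its own y-axis. The layout check relies on the internal layout table returned by `ggplot_build()`, which is an assumption about ggplot2 internals rather than a documented interface; `bins = 30` is an arbitrary choice.

```{r}
library(ggplot2)
data(Salaries, package = "carData")

# one row of panels, each with an independent y-axis
p_wrap <- ggplot(Salaries, aes(x = salary)) +
  geom_histogram(bins = 30, fill = "cornflowerblue", color = "white") +
  facet_wrap(~rank, nrow = 1, scales = "free_y")

# inspect where each panel ends up (internal layout table)
wrap_layout <- ggplot_build(p_wrap)$layout$layout
wrap_layout[, c("PANEL", "ROW", "COL", "rank")]
```

Free scales make the shape of each distribution easier to see, at the cost of making bar heights harder to compare across panels.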
- The `facet_grid` function is best suited for faceting on two or more variables, laid out as rows and columns.
- The formula for the `facet_grid` function is `row variables ~ column variables`.
```{r}
# plot salary histograms by rank and sex
ggplot(Salaries, aes(x = salary / 1000)) +
  geom_histogram(color = "white",
                 fill = "cornflowerblue") +
  facet_grid(sex ~ rank) +
  labs(title = "Salary histograms by sex and rank",
       x = "Salary ($1000)")
```
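Swapping the two sides of the `facet_grid` formula swaps rows and columns. The following sketch compares the two grids; the object names are illustrative, and the dimension check again assumes the internal layout table from `ggplot_build()`.

```{r}
library(ggplot2)
data(Salaries, package = "carData")

base_hist <- ggplot(Salaries, aes(x = salary / 1000)) +
  geom_histogram(bins = 30, color = "white", fill = "cornflowerblue")

# sex defines the rows, rank defines the columns
layout_sex_rank <- ggplot_build(base_hist + facet_grid(sex ~ rank))$layout$layout
# swapped: rank defines the rows, sex defines the columns
layout_rank_sex <- ggplot_build(base_hist + facet_grid(rank ~ sex))$layout$layout

# rows x columns of each grid
dims_sex_rank <- c(rows = max(layout_sex_rank$ROW), cols = max(layout_sex_rank$COL))
dims_rank_sex <- c(rows = max(layout_rank_sex$ROW), cols = max(layout_rank_sex$COL))
rbind(`sex ~ rank` = dims_sex_rank, `rank ~ sex` = dims_rank_sex)
```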
- You can also combine grouping and faceting in one graph, as we do with the summarized `Salaries` data below.
- Note that the dot in the faceting formula `. ~ rank + discipline` means that there are no row variables; likewise, `rank + discipline ~ .` means no column variables.
- Also note that the theme functions in the code below simplify the background, hide the legend, and remove the vertical major grid lines and horizontal minor grid lines
```{r}
# calculate means and standard errors by sex,
# rank and discipline
library(dplyr)
plotdata <- Salaries %>%
  group_by(sex, rank, discipline) %>%
  summarize(n = n(),
            mean = mean(salary),
            sd = sd(salary),
            se = sd / sqrt(n))
# create better labels for discipline
plotdata$discipline <- factor(plotdata$discipline,
                              labels = c("Theoretical",
                                         "Applied"))
# create plot
ggplot(plotdata,
       aes(x = sex,
           y = mean,
           color = sex)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean - se,
                    ymax = mean + se),
                width = .1) +
  scale_y_continuous(breaks = seq(70000, 140000, 10000),
                     labels = scales::dollar) +
  facet_grid(. ~ rank + discipline) +
  theme_bw() +
  theme(legend.position = "none",
        panel.grid.major.x = element_blank(),
        panel.grid.minor.y = element_blank()) +
  labs(x="",
       y="",
       title="Nine month academic salaries by gender, discipline, and rank",
       subtitle = "(Means and standard errors)") +
  scale_color_brewer(palette="Set1")
```
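The effect of the dot in the faceting formula can be checked directly. This sketch uses boxplots of the raw `Salaries` data (a choice made for this illustration, not the chapter's plot) and reads the grid dimensions from the internal layout table of `ggplot_build()`, which is assumed rather than documented behavior.

```{r}
library(ggplot2)
data(Salaries, package = "carData")

base_box <- ggplot(Salaries, aes(x = sex, y = salary)) +
  geom_boxplot()

# no row variables: all rank-by-discipline panels sit side by side
layout_cols <- ggplot_build(base_box + facet_grid(. ~ rank + discipline))$layout$layout
# no column variables: the same panels are stacked vertically
layout_rows <- ggplot_build(base_box + facet_grid(rank + discipline ~ .))$layout$layout

dims_cols <- c(rows = max(layout_cols$ROW), cols = max(layout_cols$COL))
dims_rows <- c(rows = max(layout_rows$ROW), cols = max(layout_rows$COL))
rbind(`. ~ rank + discipline` = dims_cols, `rank + discipline ~ .` = dims_rows)
```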
## Meeting Videos {-}
### Cohort 1 {-}
`r knitr::include_url("https://www.youtube.com/embed/Wz0WCFv-gOk")`
<details>
<summary> Meeting chat log </summary>
```
00:01:23 Oluwafemi Oyedele: Hi Lydia!!!
00:01:56 Lydia Gibson: Hey Femi!
00:05:05 Oluwafemi Oyedele: yes, I am presenting next week!!!
00:05:58 Lydia Gibson: Awesome!
00:15:05 Oluwafemi Oyedele: discipline: A factor with levels A (“theoretical” departments) or B (“applied” departments)
00:52:08 Kotomi Oda: I like this plot style
00:52:32 Lydia Gibson: Yeah very cool
00:52:55 Tiffany Kollah: this was very good Ken!
00:53:42 Tiffany Kollah: thank you so much!!! 🙂
00:54:02 Oluwafemi Oyedele: Thank you!!!
00:54:25 Kotomi Oda: Yay!
```
</details>