title | author | date |
---|---|---|
Playing with {ggplot2} extensions | Pierrette Lo | 6/7/2020 |
Recently, a friend asked me to make a simple data visualization for her. The dataset was tiny and, to be honest, not super interesting (a very simple survey of [not very much] diversity among her department's leadership and overall membership). But the nice thing about a simple dataset is that for once I could spend less time on data cleaning and more time playing with aesthetics.
Here are the libraries I used:
library(tidyverse)
library(readxl)
library(ggalt)
library(patchwork)
library(ggtext)
The data I was given was an Excel sheet that looked like this:
I started by doing a bit of data cleanup in Excel. If the dataset had been larger, I might have tried using {readxl} to clean it up in R, but in this case it took about 30 seconds to do it in Excel.
I separated each table onto a different tab, added a "personnel" header (for "Overall" vs "Leadership" categories), and corrected the typo in Race ("Indiana").
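For what it's worth, here is a rough sketch of what the in-R route could look like. It uses the example workbook that ships with {readxl} (not the file from this post), so the sheet name and cell range are stand-ins. The idea is that `range` reads just one rectangle of cells, which would let you lift a single table out of a sheet that holds several.

```r
library(readxl)

# Example workbook bundled with {readxl}; a stand-in for the real file
demo_path <- readxl_example("datasets.xlsx")

# Read only cells B1:D5 of the "mtcars" sheet:
# one header row plus four data rows, three columns wide
one_table <- read_excel(demo_path, sheet = "mtcars", range = "B1:D5")

one_table
```

With a real workbook you would need to look up the cell range of each table by hand, which is why the 30-second Excel fix won out here.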
I next used the {readxl} package to import the Excel data...
# specify path of original data
path <- "blog_data.xlsx"
# read in all sheets as a named list
data <- path %>%
  excel_sheets() %>%
  set_names() %>%
  map(read_xlsx, path = path)
# split list into separate dataframes
list2env(data, .GlobalEnv)
...And now I have three little dataframes.
Table: ethnicity
personnel | Hispanic | Not Hispanic | Unsp |
---|---|---|---|
Dept Overall | 0.054 | 0.946 | 1E-3 |
Dept Leadership | 0.030 | 0.970 | NA |
Table: gender
personnel | Female | Male | Unsp |
---|---|---|---|
Dept Overall | 0.346 | 0.654 | 0 |
Dept Leadership | 0.364 | 0.636 | 0 |
Table: race
personnel | White | African American | American Indian | Asian | Native Hawaiian | Multi-race | Unsp |
---|---|---|---|---|---|---|---|
Dept Overall | 0.72 | 0.066 | 6.0000000000000001E-3 | 0.152 | 2E-3 | 4.2000000000000003E-2 | 1.2E-2 |
Dept Leadership | 0.85 | 0.030 | NA | 0.120 | NA | NA | NA |
Next up: tidying each dataframe (yes, I copied and pasted more than twice and therefore should have written some functions, but again I was in a hurry to get to the fun part).
Here's what I did for the `ethnicity` dataframe, which I repeated for `gender` and `race`.
ethnicity <- ethnicity %>%
  # convert all columns except personnel to numeric
  mutate_at(vars(-personnel), as.numeric) %>%
  # make it tidy (i.e. long) format
  pivot_longer(-personnel, names_to = "ethnicity", values_to = "percent") %>%
  # convert decimals to percentages; convert `ethnicity` and `personnel` to factors
  mutate(percent = percent * 100,
         ethnicity = as.factor(str_replace(ethnicity, "Unsp", "Unspecified")),
         personnel = as.factor(personnel)) %>%
  # replace NAs with 0 (after confirming with my friend that this was the intent)
  replace_na(list(percent = 0))
personnel | ethnicity | percent |
---|---|---|
Dept Overall | Hispanic | 5.4 |
Dept Overall | Not Hispanic | 94.6 |
Dept Overall | Unspecified | 0.1 |
Dept Leadership | Hispanic | 3.0 |
Dept Leadership | Not Hispanic | 97.0 |
Dept Leadership | Unspecified | 0.0 |
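For anyone who would rather not copy and paste, the tidying steps could be wrapped in a function along these lines. This is only a sketch: `tidy_table()` and its `category` argument are hypothetical names, it uses `across()` where the code above uses `mutate_at()`, and the demo input is retyped from the ethnicity table rather than read from the workbook.

```r
library(tidyverse)

# Hypothetical helper (not part of the original analysis): tidy one wide table.
# `category` is the name to give the new column of category labels.
tidy_table <- function(df, category) {
  df %>%
    # convert all columns except personnel to numeric
    mutate(across(-personnel, as.numeric)) %>%
    # make it long: one row per personnel group and category
    pivot_longer(-personnel, names_to = category, values_to = "percent") %>%
    mutate(
      # decimals to percentages, with NA treated as 0
      percent = replace_na(percent * 100, 0),
      personnel = as.factor(personnel),
      # spell out the "Unsp" label and convert the category column to a factor
      "{category}" := as.factor(
        str_replace(.data[[category]], "^Unsp$", "Unspecified")
      )
    )
}

# Demo input: the ethnicity table as imported (values arrive as text)
ethnicity_raw <- tribble(
  ~personnel,        ~Hispanic, ~`Not Hispanic`, ~Unsp,
  "Dept Overall",    "0.054",   "0.946",         "1E-3",
  "Dept Leadership", "0.030",   "0.970",         NA_character_
)

ethnicity_tidy <- tidy_table(ethnicity_raw, "ethnicity")
ethnicity_tidy
```

If the imported sheets are kept as a named list, something like `imap(data, tidy_table)` should tidy all three in one go, since each sheet name doubles as the category name.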
Now for the fun stuff! There's a whole universe of {ggplot2} extensions, many (but not all) of which are listed here.
I picked a few that I had been wanting to play with: {bbplot} for theme, {ggalt} for dumbbell plots, {patchwork} to arrange plots, and {ggtext} for HTML text styling.
I started by setting up a custom theme for my plots -- largely borrowed from the BBC's {bbplot} package.
The preset theme can be applied directly as a `ggplot` layer using `bbplot::bbc_style()`, but I made some tweaks and saved it as `my_theme`.
my_colors <- c("#FAAB18", "#1380A1")
my_theme <- theme_light() +
  theme(axis.ticks = element_blank(),
        axis.line = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "#cbcbcb"),
        panel.grid.major.x = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank())
theme_set(my_theme)
Now I create each bar plot (for Gender, Ethnicity, and Race) separately.
- Reorder gender by percent
- Set y axis 0-100 so all plots have the same range
- Use custom colors (from `bbplot::bbc_style`)
- Use `color` in bars (in addition to `fill`) so 0 shows as a line
p1 <- ethnicity %>%
  mutate(ethnicity = fct_reorder(ethnicity, percent, na.rm = TRUE)) %>%
  ggplot(aes(x = ethnicity, y = percent, fill = personnel, color = personnel)) +
  geom_col(position = "dodge") +
  coord_flip(ylim = c(0, 100)) +
  ggtitle("Ethnicity") +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_manual(values = my_colors) +
  scale_color_manual(values = my_colors)
p2 <- gender %>%
  mutate(gender = fct_reorder(gender, percent)) %>%
  ggplot(aes(x = gender, y = percent, fill = personnel, color = personnel)) +
  geom_col(position = "dodge") +
  coord_flip(ylim = c(0, 100)) +
  ggtitle("Gender") +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_manual(values = my_colors) +
  scale_color_manual(values = my_colors)
p3 <- race %>%
  mutate(race = fct_reorder(race, percent, na.rm = TRUE)) %>%
  ggplot(aes(x = race, y = percent, fill = personnel, color = personnel)) +
  geom_col(position = "dodge") +
  coord_flip(ylim = c(0, 100)) +
  ggtitle("Race") +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_manual(values = my_colors) +
  scale_color_manual(values = my_colors)
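Since the three plots differ only in the data, the category column, and the title, they could also come from one small function. A minimal sketch, assuming the tidy column layout used above: `plot_bars()` is a hypothetical name, and the demo data is retyped from the gender table instead of using the real dataframe.

```r
library(tidyverse)

# Hypothetical helper (not in the original post): one dodged, flipped bar chart.
# `category` is the name of the category column, given as a string.
plot_bars <- function(df, category, title,
                      colors = c("#FAAB18", "#1380A1")) {
  df %>%
    # order the categories by percent
    mutate("{category}" := fct_reorder(.data[[category]], percent)) %>%
    ggplot(aes(x = .data[[category]], y = percent,
               fill = personnel, color = personnel)) +
    geom_col(position = "dodge") +
    # same 0-100 range on every plot
    coord_flip(ylim = c(0, 100)) +
    labs(title = title, x = NULL, y = NULL) +
    scale_fill_manual(values = colors) +
    scale_color_manual(values = colors)
}

# Demo data: the gender percentages from the table earlier in the post
gender_demo <- tibble(
  personnel = factor(rep(c("Dept Overall", "Dept Leadership"), each = 3)),
  gender    = factor(rep(c("Female", "Male", "Unspecified"), times = 2)),
  percent   = c(34.6, 65.4, 0, 36.4, 63.6, 0)
)

p_demo <- plot_bars(gender_demo, "gender", "Gender")
p_demo
```

Passing the column name as a string and looking it up with `.data[[category]]` keeps the function short. The trade-off is that the call reads a little less like ordinary {ggplot2} code.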
Then I use {patchwork} to stitch them together, and {ggtext} to add color to the title in lieu of a legend.
- "Collect" guides so legends from each plot are treated the same (i.e., deleted)
- Use {ggtext} `element_textbox_simple` or `element_markdown` to allow HTML in the title
p3 + (p1 / p2) +
  plot_layout(guides = "collect") +
  plot_annotation(title = "<span style='font-size:18pt'>Diversity in Department <b style='color:#FAAB18;'>Leadership</b> vs <b style='color:#1380A1;'>Overall</b></span>",
                  subtitle = "Percentages of personnel in each category are shown",
                  theme = theme(plot.title = element_markdown(lineheight = 1.1))) &
  theme(legend.position = "none")
I also repeated the above, but with Race shown in a dumbbell plot:
p4 <- race %>%
  mutate(race = fct_reorder(race, percent, na.rm = TRUE)) %>%
  pivot_wider(names_from = personnel, values_from = percent) %>%
  ggplot() +
  geom_dumbbell(aes(x = `Dept Overall`, xend = `Dept Leadership`, y = race),
                size = 3,
                colour = "#dddddd",
                colour_x = "#1380A1",
                colour_xend = "#FAAB18",
                show.legend = FALSE) +
  coord_cartesian(xlim = c(0, 100)) +
  ggtitle("Race") +
  xlab(NULL) +
  ylab(NULL)
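The `pivot_wider()` step is there because a dumbbell needs both endpoints on the same row: one column for the `x` value and one for `xend`. A small sketch of the reshape on its own, using three of the race categories retyped from the table earlier in the post (the `race_long` and `race_wide` names are just for illustration):

```r
library(tidyverse)

# Three race categories from the post, in long (tidy) format
race_long <- tribble(
  ~personnel,        ~race,              ~percent,
  "Dept Overall",    "White",            72,
  "Dept Overall",    "African American", 6.6,
  "Dept Overall",    "Asian",            15.2,
  "Dept Leadership", "White",            85,
  "Dept Leadership", "African American", 3,
  "Dept Leadership", "Asian",            12
)

# One row per category, one column per personnel group
race_wide <- race_long %>%
  pivot_wider(names_from = personnel, values_from = percent)

race_wide
```

The new column names contain a space, which is why they need backticks inside `aes()`.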
And here's the patchwork:
p4 + (p1 / p2) +
  plot_layout(guides = "collect") +
  plot_annotation(title = "<span style='font-size:18pt'>Diversity in Dept <b style='color:#FAAB18;'>Leadership</b> vs <b style='color:#1380A1;'>Overall</b></span>",
                  subtitle = "Percentages of personnel in each category are shown",
                  theme = theme(plot.title = element_markdown(lineheight = 1.1))) &
  theme(legend.position = "none")
Thanks to this helpful post, I discovered that you can use xaringan's Infinite Moon Reader to get live previews of RMarkdown documents (not just xaringan slides!).
After installing {xaringan}, you can either run `xaringan::inf_mr()` or select "Infinite Moon Reader" from the RStudio Addins drop-down menu.
The preview will appear in the RStudio Viewer pane, and it will refresh every time you save changes to your Rmd. So much better than knitting every time you want to check your formatting!