Commit

Final edits flagged
arranhamlet committed Oct 30, 2024
1 parent e6a4aa9 commit 136a8a8
Showing 11 changed files with 213 additions and 192 deletions.
4 changes: 2 additions & 2 deletions html_outputs/index.html
@@ -725,7 +725,7 @@ <h1 class="title">The Epidemiologist R Handbook</h1>
<div>
<div class="quarto-title-meta-heading">Last updated</div>
<div class="quarto-title-meta-contents">
<p class="date">Oct 18, 2024</p>
<p class="date">Oct 30, 2024</p>
</div>
</div>

@@ -1463,7 +1463,7 @@ <h3 class="unnumbered anchored" data-anchor-id="contribution">Contribution</h3>
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"selector":".lightbox","closeEffect":"zoom","descPosition":"bottom","openEffect":"zoom","loop":false});
<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","loop":false,"closeEffect":"zoom","descPosition":"bottom","selector":".lightbox"});
(function() {
let previousOnload = window.onload;
window.onload = () => {
146 changes: 77 additions & 69 deletions html_outputs/new_pages/ggplot_tips.html

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions html_outputs/search.json
@@ -2815,7 +2815,7 @@
"href": "new_pages/ggplot_tips.html#preparation",
"title": "31  ggplot tips",
"section": "",
"text": "Load packages\nThis code chunk shows the loading of packages required for the analyses. In this handbook we emphasize p_load() from pacman, which installs the package if necessary and loads it for use. You can also load installed packages with library() from base R. See the page on R basics for more information on R packages.\n\npacman::p_load(\n tidyverse, # includes ggplot2 and other\n rio, # import/export\n here, # file locator\n stringr, # working with characters \n scales, # transform numbers\n ggrepel, # smartly-placed labels\n gghighlight, # highlight one part of plot\n RColorBrewer # color scales\n)\n\n\n\nImport data\nFor this page, we import the dataset of cases from a simulated Ebola epidemic. If you want to follow along, click to download the “clean” linelist (as .rds file). Import data with the import() function from the rio package (it handles many file types like .xlsx, .csv, .rds - see the Import and export page for details).\n\nlinelist &lt;- rio::import(\"linelist_cleaned.rds\")\n\nThe first 50 rows of the linelist are displayed below.",
"text": "Load packages\nThis code chunk shows the loading of packages required for the analyses. In this handbook we emphasize p_load() from pacman, which installs the package if necessary and loads it for use. You can also load installed packages with library() from base R. See the page on R basics for more information on R packages.\n\npacman::p_load(\n tidyverse, # includes ggplot2 and other\n rio, # import/export\n here, # file locator\n stringr, # working with characters \n scales, # transform numbers\n cowplot, # for dual axes\n ggrepel, # smartly-placed labels\n gghighlight, # highlight one part of plot\n RColorBrewer # color scales\n)\n\n\n\nImport data\nFor this page, we import the dataset of cases from a simulated Ebola epidemic. If you want to follow along, click to download the “clean” linelist (as .rds file). Import data with the import() function from the rio package (it handles many file types like .xlsx, .csv, .rds - see the Import and export page for details).\n\nlinelist &lt;- rio::import(\"linelist_cleaned.rds\")\n\nThe first 50 rows of the linelist are displayed below.",
"crumbs": [
"Data Visualization",
"<span class='chapter-number'>31</span>  <span class='chapter-title'>ggplot tips</span>"
@@ -2925,7 +2925,7 @@
"href": "new_pages/ggplot_tips.html#dual-axes",
"title": "31  ggplot tips",
"section": "31.11 Dual axes",
"text": "31.11 Dual axes\nA secondary y-axis is often a requested addition to a ggplot2 graph. While there is a robust debate about the validity of such graphs in the data visualization community, and they are often not recommended, your manager may still want them. Below, we present one method to achieve them.\nThis approach involves creating two separate datasets, one for each of the different plots we want to achieve, and then calculating a “scaling factor” required to transform the values onto the same scale.\nThis is because the function we are going to use to add a second y-axis, sec_axis() requires the second axis be directly proportional to the first axis.\nTo demonstrate this technique we will overlay the epidemic curve with a line of the weekly percent of patients who died. We use this example because the alignment of dates on the x-axis is more complex than say, aligning a bar chart with another plot. Some things to note:\n\nThe epicurve and the line are aggregated into weeks prior to plotting and the date_breaks and date_labels are identical - we do this so that the x-axes of the two plots are the same when they are overlaid.\n\nThe y-axis is created to the right-side for plot 2 with the sec_axis = argument of scale_y_continuous().\n\nMake the datasets for the plot\nHere we will transform linelist into two different datasets linelist_primary_axis and linelist_secondary_axis in order to then create the scaling factor that will allow us to attach a second axis at the correct scale.\n\n#Set up linelist for primary axis - the weekly cases epicurve\nlinelist_primary_axis &lt;- linelist %&gt;% \n count(epiweek = lubridate::floor_date(date_onset, \"week\"))\n\n#Set up linelist for secondary axis - the line graph of the weekly percent of deaths\nlinelist_secondary_axis &lt;- linelist %&gt;% \n group_by(\n epiweek = lubridate::floor_date(date_onset, \"week\")) %&gt;% \n summarise(\n n = n(),\n pct_death = 100*sum(outcome == \"Death\", na.rm = T) / 
n)\n\nCalculate the scaling factor\nNow that we have created the datasets with our variables of interest, we want to extract the columns and calculate the maximum value in each in order to set our scale. We will then divide the secondary axis value by the first axis value in order to create our scaling factor.\n\n#Set up scaling factor to transform secondary axis\nlinelist_primary_axis_max &lt;- linelist_primary_axis %&gt;%\n pull(n) %&gt;%\n max()\n\nlinelist_secondary_axis_max &lt;- linelist_secondary_axis %&gt;%\n pull(pct_death) %&gt;%\n max()\n\n#Create our scaling factor, how much the secondary axis value must be divided by to create values on the same scale as the primary axis\nscaling_factor &lt;- linelist_secondary_axis_max/linelist_primary_axis_max\n\nAnd now we are ready to plot! We will be using the argument geom_histogram() to create our epicurve, and geom_line() to create our line graph. Note that we are not specifying a data = argument in our first ggplot(), this is because we are using two separate datasets to create this plot.\n\nggplot() +\n #First create the epicurve\n geom_histogram(data = linelist_primary_axis,\n mapping = aes(x = epiweek, \n y = n), \n fill = \"grey\",\n stat = \"identity\"\n ) +\n #Now create the linegraph\n geom_line(data = linelist_secondary_axis,\n mapping = aes(x = epiweek, \n y = pct_death / scaling_factor)\n ) +\n #Now we specify the second axis, and note that we are going to be multiplying the values of the second axis by the scaling factor in order to get the axis to display the correct values\n scale_y_continuous(\n sec.axis = sec_axis(~.*scaling_factor, \n name = \"Weekly percent of deaths\")\n ) +\n scale_x_date(\n date_breaks = \"month\",\n date_labels = \"%b\"\n ) +\n labs(\n x = \"Epiweek of symptom onset\",\n y = \"Weekly cases\",\n title = \"Weekly case incidence and percent deaths\"\n ) +\n theme_bw()",
"text": "31.11 Dual axes\nA secondary y-axis is often a requested addition to a ggplot2 graph. While there is a robust debate about the validity of such graphs in the data visualization community, and they are often not recommended, your manager may still want them. Below, we present one method to achieve them: using the cowplot package to combine two separate plots.\nThis approach involves creating two separate plots - one with a y-axis on the left, and the other with y-axis on the right. Both will use a specific theme_cowplot() and must have the same x-axis. Then in a third command the two plots are aligned and overlaid on top of each other. The functionalities of cowplot, of which this is only one, are described in depth at this site.\nTo demonstrate this technique we will overlay the epidemic curve with a line of the weekly percent of patients who died. We use this example because the alignment of dates on the x-axis is more complex than say, aligning a bar chart with another plot. Some things to note:\n\nThe epicurve and the line are aggregated into weeks prior to plotting and the date_breaks and date_labels are identical - we do this so that the x-axes of the two plots are the same when they are overlaid.\n\nThe y-axis is moved to the right-side for plot 2 with the position = argument of scale_y_continuous().\n\nBoth plots make use of theme_cowplot()\n\nNote there is another example of this technique in the Epidemic curves page - overlaying cumulative incidence on top of the epicurve.\nMake plot 1\nThis is essentially the epicurve. 
We use geom_area() just to demonstrate its use (area under a line, by default)\n\npacman::p_load(cowplot) # load/install cowplot\n\np1 &lt;- linelist %&gt;% # save plot as object\n count(\n epiweek = lubridate::floor_date(date_onset, \"week\")) %&gt;% \n ggplot()+\n geom_area(aes(x = epiweek, y = n), fill = \"grey\")+\n scale_x_date(\n date_breaks = \"month\",\n date_labels = \"%b\")+\n theme_cowplot()+\n labs(\n y = \"Weekly cases\"\n )\n\np1 # view plot \n\n\n\n\n\n\n\n\nMake plot 2\nCreate the second plot showing a line of the weekly percent of cases who died.\n\np2 &lt;- linelist %&gt;% # save plot as object\n group_by(\n epiweek = lubridate::floor_date(date_onset, \"week\")) %&gt;% \n summarise(\n n = n(),\n pct_death = 100*sum(outcome == \"Death\", na.rm=T) / n) %&gt;% \n ggplot(aes(x = epiweek, y = pct_death))+\n geom_line()+\n scale_x_date(\n date_breaks = \"month\",\n date_labels = \"%b\")+\n scale_y_continuous(\n position = \"right\")+\n theme_cowplot()+\n labs(\n x = \"Epiweek of symptom onset\",\n y = \"Weekly percent of deaths\",\n title = \"Weekly case incidence and percent deaths\"\n )\n\np2 # view plot\n\n\n\n\n\n\n\n\nNow we align the plot using the function align_plots(), specifying horizontal and vertical alignment (“hv”, could also be “h”, “v”, “none”). We specify alignment of all axes as well (top, bottom, left, and right) with “tblr”. The output is of class list (2 elements).\nThen we draw the two plots together using ggdraw() (from cowplot) and referencing the two parts of the aligned_plots object.\n\naligned_plots &lt;- cowplot::align_plots(p1, p2, align=\"hv\", axis=\"tblr\") # align the two plots and save them as list\naligned_plotted &lt;- ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]]) # overlay them and save the visual plot\naligned_plotted # print the overlayed plots",
"crumbs": [
"Data Visualization",
"<span class='chapter-number'>31</span>  <span class='chapter-title'>ggplot tips</span>"
131 changes: 78 additions & 53 deletions new_pages/epicurves.qmd
@@ -40,6 +40,7 @@ pacman::p_load(
i2extras, # supplement to incidence2
stringr, # search and manipulate character strings
forcats, # working with factors
cowplot, # for dual axes
RColorBrewer, # Color palettes from colorbrewer2.org
tidyverse # data management + ggplot2 graphics
)
@@ -1429,67 +1430,89 @@ The above technique for faceting was adapted from [this](https://stackoverflow.c
<!-- ======================================================= -->
## Dual-axis { }

Although there are fierce discussions about the validity of dual axes within the data visualization community, many epi supervisors still want to see an epicurve or similar chart with a percent overlaid with a second axis. This is discussed more extensively in the [ggplot tips](ggplot_tips.qmd) page, but one example using the `sec.axis = ` argument of the function `scale_y_continuous()` is shown below.
Although there are fierce discussions about the validity of dual axes within the data visualization community, many epi supervisors still want to see an epicurve or similar chart with a percent overlaid with a second axis. This is discussed more extensively in the [ggplot tips](ggplot_tips.qmd) page, but one example using the **cowplot** method is shown below:

This walkthrough can be found in more detail in [ggplot tips](ggplot_tips.qmd).
* Two distinct plots are made and then combined with the **cowplot** package.
* The plots must have exactly the same x-axis (set limits), or else the data and labels will not align.
* Each uses `theme_cowplot()`, and one has the y-axis moved to the right side of the plot.

```{r, warning=F, message=F}
#load package
pacman::p_load(cowplot)
#Set up linelist for primary axis - the weekly cases epicurve
linelist_primary_axis <- linelist %>%
count(epiweek = lubridate::floor_date(date_onset, "week"))
#Set up linelist for secondary axis - the line graph of the weekly percent of deaths
linelist_secondary_axis <- linelist %>%
group_by(
epiweek = lubridate::floor_date(date_onset, "week")) %>%
summarise(
n = n(),
pct_death = 100*sum(outcome == "Death", na.rm = T) / n)
#Set up scaling factor to transform secondary axis
linelist_primary_axis_max <- linelist_primary_axis %>%
pull(n) %>%
max()
linelist_secondary_axis_max <- linelist_secondary_axis %>%
pull(pct_death) %>%
max()
#Create our scaling factor, how much the secondary axis value must be divided by to create values on the same scale as the primary axis
scaling_factor <- linelist_secondary_axis_max/linelist_primary_axis_max
ggplot() +
#First create the epicurve
geom_area(data = linelist_primary_axis,
mapping = aes(x = epiweek,
y = n),
fill = "grey"
) +
#Now create the linegraph
geom_line(data = linelist_secondary_axis,
mapping = aes(x = epiweek,
y = pct_death / scaling_factor)
) +
#Now we specify the second axis, and note that we are going to be multiplying the values of the second axis by the scaling factor in order to get the axis to display the correct values
scale_y_continuous(
sec.axis = sec_axis(~.*scaling_factor,
name = "Weekly percent of deaths")
) +
scale_x_date(
date_breaks = "month",
date_labels = "%b"
) +
labs(
x = "Epiweek of symptom onset",
y = "Weekly cases",
title = "Weekly case incidence and percent deaths"
) +
theme_bw()
# Make first plot of epicurve histogram
#######################################
plot_cases <- linelist %>%
# plot cases per week
ggplot()+
# create histogram
geom_histogram(
mapping = aes(x = date_onset),
# bin breaks every week beginning monday before first case, going to monday after last case
breaks = weekly_breaks_all)+ # pre-defined vector of weekly dates (see top of ggplot section)
# specify beginning and end of date axis to align with other plot
scale_x_date(
limits = c(min(weekly_breaks_all), max(weekly_breaks_all)))+ # min/max of the pre-defined weekly breaks of histogram
# labels
labs(
y = "Weekly cases",
x = "Date of symptom onset"
)+
theme_cowplot()
# make second plot of percent died per week
###########################################
plot_deaths <- linelist %>% # begin with linelist
group_by(week = floor_date(date_onset, "week")) %>% # create week column
# summarise to get weekly percent of cases who died
summarise(n_cases = n(),
died = sum(outcome == "Death", na.rm=T),
pct_died = 100*died/n_cases) %>%
# begin plot
ggplot()+
# line of weekly percent who died
geom_line( # create line of percent died
mapping = aes(x = week, y = pct_died), # specify y-height as pct_died column
stat = "identity", # use the value in the pct_died column as the line height (already the geom_line() default)
size = 2,
color = "black")+
# Same date-axis limits as the other plot - perfect alignment
scale_x_date(
limits = c(min(weekly_breaks_all), max(weekly_breaks_all)))+ # min/max of the pre-defined weekly breaks of histogram
# y-axis adjustments
scale_y_continuous( # adjust y-axis
breaks = seq(0,100, 10), # set break intervals of percent axis
limits = c(0, 100), # set extent of percent axis
position = "right")+ # move percent axis to the right
# Y-axis label, no x-axis label
labs(x = "",
y = "Percent deceased")+ # percent axis label
theme_cowplot() # add this to make the two plots merge together nicely
```

Now use **cowplot** to overlay the two plots. The overlay lines up because both plots share the same x-axis limits, the percent axis sits on the right, and both use `theme_cowplot()`.
```{r, warning=F, message=F}
aligned_plots <- cowplot::align_plots(plot_cases, plot_deaths, align="hv", axis="tblr")
ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
```
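
For comparison, the `sec.axis` version of this chunk rescales the percent series with a scaling factor so that it can be drawn against the case-count axis. The arithmetic can be sketched with toy numbers. The values and the names `weekly_cases`, `weekly_pct_death`, `pct_on_primary_scale` and `pct_recovered` are hypothetical and are not taken from the linelist.

```r
# Toy weekly data (hypothetical values, for illustration only)
weekly_cases     <- c(10, 40, 80, 20)   # primary axis: weekly case counts
weekly_pct_death <- c(50, 25, 40, 10)   # secondary axis: weekly percent of deaths

# Ratio of the two maxima: secondary maximum divided by primary maximum
scaling_factor <- max(weekly_pct_death) / max(weekly_cases)

# Values that would be drawn against the primary (case-count) axis
pct_on_primary_scale <- weekly_pct_death / scaling_factor

# The secondary axis labels undo the division (primary value * scaling_factor)
pct_recovered <- pct_on_primary_scale * scaling_factor

# The rescaled line peaks at the same height as the tallest bar
max(pct_on_primary_scale) == max(weekly_cases)

# Multiplying back recovers the original percentages
all.equal(pct_recovered, weekly_pct_death)
```

Dividing by the scaling factor in `aes()` and multiplying by it in `sec_axis()` are inverse steps, so the right-hand axis still reads in percent even though the line is drawn on the case-count scale.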



## Cumulative Incidence {}

If beginning with a case linelist, create a new column containing the cumulative number of cases per day in an outbreak using `cumsum()` from **base** R:
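
As a minimal base-R sketch of this step, assume a hypothetical toy linelist with one row per case and a `date_onset` column. The object names `toy_linelist` and `daily_counts` are illustrative only.

```r
# Hypothetical toy linelist: one row per case
toy_linelist <- data.frame(
  date_onset = as.Date(c("2014-05-01", "2014-05-01",
                         "2014-05-03", "2014-05-03", "2014-05-03",
                         "2014-05-04"))
)

# Count cases per day of onset (days with no cases do not appear)
daily_counts <- as.data.frame(
  table(date_onset = toy_linelist$date_onset),
  responseName = "n",
  stringsAsFactors = FALSE
)
daily_counts$date_onset <- as.Date(daily_counts$date_onset)

# Sort by date, then take the running total of the daily counts
daily_counts <- daily_counts[order(daily_counts$date_onset), ]
daily_counts$cumulative_cases <- cumsum(daily_counts$n)

daily_counts
```

The rows must be in date order before calling `cumsum()`, because it simply accumulates values in the order they appear.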
@@ -1531,6 +1554,8 @@ It can also be overlaid onto the epicurve, with dual-axis using the method descr

Below we demonstrate how to make epicurves using the **incidence2** package. The authors of this package have tried to allow the user to create and modify epicurves without needing to know **ggplot2** syntax. Much of this page is adapted from the package vignettes, which can be found at the **incidence2** [documentation site](https://www.reconverse.org/incidence2/articles/incidence2.html).

While **incidence2** can be very useful for quickly generating figures, the package is much less flexible than the approaches that use **ggplot2**. This may not be an issue for some users, but those who want greater control over creating and adapting their figures should use **ggplot2**.

To create an epicurve with **incidence2** you need a column with a date value (it does not need to be of class `Date`, but it should have a numeric or logical order, e.g. "Week1", "Week2", etc.) and a column with a count variable (what is being counted). The data should also *not* have any duplicated rows.

To create this, we can use the function `incidence()` which will summarise our data in a format that can be used to create epicurves. There are a number of different arguments to `incidence()`, type `?incidence` in your R console to learn more.
4 changes: 1 addition & 3 deletions new_pages/errors.qmd
@@ -118,9 +118,7 @@ You may have supplied too few values for the color scale, make sure you have the
```
Can't add x object
```

This is likely a ggplot error from the use of `scale_fill_manual()`, where you have not provided enough colors for the number of unique values. If the column is class factor, consider whether `NA` is a factor level.

You probably have an extra `+` at the end of a ggplot command that you need to delete.

### R Markdown errors {.unnumbered}
