full review post-trial 01 #53

Merged
merged 10 commits into from
May 4, 2024
10 changes: 10 additions & 0 deletions episodes/delays-functions.Rmd
Expand Up @@ -86,6 +86,16 @@ library(tidyverse)
withr::local_options(list(mc.cores = 4))
```

::::::::::::::::::: checklist

### The double-colon

The double-colon `::` in R lets you access functions or objects from a specific package without attaching the entire package to the current session with `library()`. This allows a more targeted use of package components and helps avoid namespace conflicts.

`::` calls a specific function from a package by explicitly naming the package. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package without attaching the entire package.

:::::::::::::::::::
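
As a minimal sketch of the double-colon in action, the example below uses `{tools}`, a package that ships with R but is not attached by default. The object names `title` and `tools_attached` are illustrative only.

```r
# Call a function by its full `package::function` name,
# without running `library(tools)` first
title <- tools::toTitleCase("serial interval")
title

# Check whether the package was attached to the search path;
# using `::` alone does not attach it
tools_attached <- "package:tools" %in% search()
tools_attached
```

The same pattern applies to any installed package, which is why the tutorials can write `withr::local_options()` without calling `library(withr)`.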

## Distribution functions

In R, all the statistical distributions have functions to access the following:
Expand Down
10 changes: 10 additions & 0 deletions episodes/delays-reuse.Rmd
Expand Up @@ -35,7 +35,7 @@

Infectious diseases follow an infection cycle, which usually includes the following phases: presymptomatic period, symptomatic period and recovery period, as described by their [natural history](../learners/reference.md#naturalhistory). These time periods can be used to understand transmission dynamics and inform disease prevention and control interventions.

![Definition of key time periods. From [Xiang et al, 2021](https://www.sciencedirect.com/science/article/pii/S2468042721000038)](fig/time-periods.jpg)

Check warning on line 38 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/time-periods.jpg


::::::::::::::::: callout
Expand Down Expand Up @@ -99,12 +99,12 @@

The generation time, together with the reproduction number ($R$), provides valuable insights into the strength of transmission and informs the implementation of control measures. Given $R>1$, the shorter the generation time, the faster the incidence of disease cases will grow.
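
As a rough illustration of this relationship, the sketch below assumes that every transmission happens exactly one generation time after the infector was infected. This is a simplification, since real generation times follow a distribution. Under that assumption $R = e^{rT}$, so the exponential growth rate is $r = \log(R)/T$. The values of `reproduction_number` and `generation_time` are hypothetical.

```r
# Hypothetical inputs: the same R with a short and a long generation time (days)
reproduction_number <- 2
generation_time <- c(short = 3, long = 6)

# Growth rate per day under the fixed-generation-time assumption
growth_rate <- log(reproduction_number) / generation_time

# Time for incidence to double at that growth rate
doubling_time <- log(2) / growth_rate

round(growth_rate, 3)
doubling_time
```

With the same $R$, the shorter generation time gives the larger growth rate and the shorter doubling time.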

![Video from the MRC Centre for Global Infectious Disease Analysis, Ep 76. Science In Context - Epi Parameter Review Group with Dr Anne Cori (27-07-2023) at <https://youtu.be/VvpYHhFDIjI?si=XiUyjmSV1gKNdrrL>](fig/reproduction-generation-time.png)

Check warning on line 102 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/reproduction-generation-time.png

When calculating the effective reproduction number ($R_{t}$), the *generation time* distribution is often approximated by the [serial interval](../learners/reference.md#serialinterval) distribution.
This approximation is common because it is easier to observe and measure the onset of symptoms than the onset of infectiousness.

![A schematic of the relationship of different time periods of transmission between an infector and an infectee in a transmission pair. Exposure window is defined as the time interval having viral exposure, and transmission window is defined as the time interval for onward transmission with respect to the infection time ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)).](fig/serial-interval-observed.jpeg)

Check warning on line 107 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-observed.jpeg

However, using the *serial interval* as an approximation of the *generation time* is primarily valid for diseases in which infectiousness starts after symptom onset ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)). When infectiousness starts before symptom onset, as in diseases with pre-symptomatic transmission, serial intervals can take negative values ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext#gr2)).

Expand All @@ -116,13 +116,13 @@

When we calculate the *serial interval*, we see that not all case pairs have the same time length. We will observe this variability for any case pair and individual time period, including the [incubation period](../learners/reference.md#incubation) and [infectious period](../learners/reference.md#infectiousness).
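
To make this variability concrete, here is a minimal sketch that computes serial intervals from symptom onset dates. The dates in `case_pairs` are made up for illustration and do not come from any of the cited studies. The third pair shows how an infectee who develops symptoms before their infector produces a negative serial interval.

```r
# Hypothetical symptom onset dates for four infector-infectee pairs
case_pairs <- data.frame(
  infector_onset = as.Date(c("2024-01-01", "2024-01-03", "2024-01-05", "2024-01-08")),
  infectee_onset = as.Date(c("2024-01-05", "2024-01-09", "2024-01-04", "2024-01-11"))
)

# Serial interval: days from symptom onset in the infector
# to symptom onset in the infectee
case_pairs$serial_interval <- as.numeric(
  difftime(case_pairs$infectee_onset, case_pairs$infector_onset, units = "days")
)

case_pairs
summary(case_pairs$serial_interval)
```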

![Serial intervals of possible case pairs in (a) COVID-19 and (b) MERS-CoV. Pairs represent a presumed infector and their presumed infectee plotted by date of symptom onset ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig6)).](fig/serial-interval-pairs.jpg)

Check warning on line 119 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-pairs.jpg

To summarise these data from individual and pair time periods, we can find the **statistical distributions** that best fit the data ([McFarland et al., 2023](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.27.2200806)).

<!-- add a reference about good practices to estimate distributions -->

![Fitted serial interval distribution for (a) COVID-19 and (b) MERS-CoV based on reported transmission pairs in Saudi Arabia. We fitted three commonly used distributions, Lognormal, Gamma, and Weibull distributions, respectively ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig5)).](fig/seria-interval-fitted-distributions.jpg)

Check warning on line 125 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/seria-interval-fitted-distributions.jpg

Statistical distributions are summarised in terms of their **summary statistics** like the *location* (mean and percentiles) and *spread* (variance or standard deviation) of the distribution, or with their **distribution parameters** that inform about the *form* (shape and rate/scale) of the distribution. These estimated values can be reported with their **uncertainty** (95% confidence intervals).
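
The link between distribution parameters and summary statistics can be sketched with base R. This example assumes a Gamma distribution with hypothetical `shape` and `scale` values, which are not estimates for any disease. For a Gamma distribution, the mean is `shape * scale` and the standard deviation is `sqrt(shape) * scale`.

```r
# Hypothetical distribution parameters (form of the distribution)
shape <- 4
scale <- 2

# Summary statistics derived from the parameters
dist_mean <- shape * scale
dist_sd <- sqrt(shape) * scale

# Percentiles (2.5%, 50%, 97.5%) from the quantile function
dist_quantiles <- stats::qgamma(c(0.025, 0.5, 0.975), shape = shape, scale = scale)

# Check against random draws from the same distribution
set.seed(1)
sampled <- stats::rgamma(10000, shape = shape, scale = scale)

c(mean = dist_mean, sd = dist_sd)
c(sample_mean = mean(sampled), sample_sd = stats::sd(sampled))
dist_quantiles
```

Either representation describes the same distribution, so a published mean and standard deviation can be converted back to parameters when a package expects shape and scale.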

Expand Down Expand Up @@ -156,7 +156,7 @@
- Which one would be harder to control?
- Why do you conclude that?

![Serial interval of novel coronavirus (COVID-19) infections overlaid with a published distribution of SARS. ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext))](fig/serial-interval-covid-sars.jpg)

Check warning on line 159 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-covid-sars.jpg

::::::::::::::::: hint

Expand Down Expand Up @@ -196,6 +196,16 @@
)
```

::::::::::::::::::: checklist

### The double-colon

The double-colon `::` in R lets you access functions or objects from a specific package without attaching the entire package to the current session with `library()`. This allows a more targeted use of package components and helps avoid namespace conflicts.

`::` calls a specific function from a package by explicitly naming the package. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package without attaching the entire package.

:::::::::::::::::::

From the `{epiparameter}` package, we can use the `epidist_db()` function to ask for any `disease` and also for a specific epidemiological distribution (`epi_dist`).

Let's now ask how many generation time parameters the epidemiological distributions database (`epidist_db`) contains, using the string `generation`:
Expand Down
125 changes: 54 additions & 71 deletions episodes/quantify-transmissibility.Rmd
Expand Up @@ -4,12 +4,6 @@ teaching: 30
exercises: 0
---

```{r setup, echo = FALSE, warning = FALSE, message = FALSE}
library(EpiNow2)
library(ggplot2)
withr::local_options(list(mc.cores = 4))
```

:::::::::::::::::::::::::::::::::::::: questions

- How can I estimate the time-varying reproduction number ($R_t$) and growth rate from a time series of case data?
Expand Down Expand Up @@ -56,6 +50,29 @@ To estimate these key metrics using case data we must account for delays between

In the next tutorials we will focus on how to use the functions in `{EpiNow2}` to estimate transmission metrics from case data. We will not cover the theoretical background of the models or inference framework; for details on these concepts see the [vignette](https://epiforecasts.io/EpiNow2/dev/articles/estimate_infections.html).

In this tutorial we are going to learn how to use the `{EpiNow2}` package to estimate the time-varying reproduction number. We’ll use the `{dplyr}` package to arrange some of its inputs, `{ggplot2}` to visualize case distribution, and the pipe `%>%` to connect some of their functions, so let’s also load the `{tidyverse}` package:

```{r,message=FALSE,warning=FALSE}
library(EpiNow2)
library(tidyverse)
```

::::::::::::::::::: checklist

### The double-colon

The double-colon `::` in R lets you access functions or objects from a specific package without attaching the entire package to the current session with `library()`. This allows a more targeted use of package components and helps avoid namespace conflicts.

`::` calls a specific function from a package by explicitly naming the package. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package without attaching the entire package.

:::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

This tutorial illustrates the usage of `epinow()` to estimate the time-varying reproduction number and infection times. Learners should understand the necessary inputs to the model and the limitations of the model output.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


::::::::::::::::::::::::::::::::::::: callout
### Bayesian inference
Expand All @@ -78,19 +95,6 @@ In the ["`Expected change in daily cases`" callout](#expected-change-in-daily-ca
::::::::::::::::::::::::::::::::::::::::::::::::


The first step is to load the `{EpiNow2}` package:

```{r, eval = FALSE}
library(EpiNow2)
```

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

This tutorial illustrates the usage of `epinow()` to estimate the time-varying reproduction number and infection times. Learners should understand the necessary inputs to the model and the limitations of the model output.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


## Delay distributions and case data
### Case data

Expand Down Expand Up @@ -303,7 +307,7 @@ The function `epinow()` is a wrapper for the function `estimate_infections()` us
There are numerous other inputs that can be passed to `epinow()`; see `?EpiNow2::epinow` for more detail.
One optional input is to specify a log normal prior for the effective reproduction number $R_t$ at the start of the outbreak. We specify a mean and standard deviation as arguments of `prior` within `rt_opts()`:

```{r, eval = FALSE}
```{r, eval = TRUE}
rt_log_mean <- convert_to_logmean(2, 1)
rt_log_sd <- convert_to_logsd(2, 1)
rt <- rt_opts(prior = list(mean = rt_log_mean, sd = rt_log_sd))
Expand All @@ -324,72 +328,52 @@ To find the maximum number of available cores on your machine, use `parallel::de

::::::::::::::::::::::::::::::::::::::::::::::::
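
A minimal sketch of how the number of cores might be chosen before running the model. It assumes you want to leave one core free for other tasks and to use at most four cores. Both choices, and the name `cores_to_use`, are illustrative rather than a recommendation from `{EpiNow2}`.

```r
# Number of cores reported by the machine (can be NA on some systems)
available_cores <- parallel::detectCores()
if (is.na(available_cores)) available_cores <- 1L

# Leave one core free, use at most four, and never fewer than one
cores_to_use <- max(1L, min(4L, available_cores - 1L))

# Set the option for the current R session
options(mc.cores = cores_to_use)
getOption("mc.cores")
```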

```{r, echo = FALSE}
rt_log_mean <- convert_to_logmean(2, 1)
rt_log_sd <- convert_to_logsd(2, 1)
::::::::::::::::::::::::: checklist

incubation_period_fixed <- dist_spec(
mean = 4, sd = 2,
max = 20, distribution = "gamma"
**Note:** In the code below `_fixed` distributions are used instead of `_variable` (delay distributions with uncertainty). This is to speed up computation time. It is generally recommended to use variable distributions that account for additional uncertainty.

```{r, echo = TRUE}
# fixed alternatives
generation_time_fixed <- dist_spec(
mean = 3.6, sd = 3.1,
max = 20, distribution = "lognormal"
)

log_mean <- convert_to_logmean(2, 1)
log_sd <- convert_to_logsd(2, 1)
reporting_delay_fixed <- dist_spec(
mean = log_mean, sd = log_sd,
max = 10, distribution = "lognormal"
)

generation_time_fixed <- dist_spec(
mean = 3.6, sd = 3.1,
max = 20, distribution = "lognormal"
)
```

*Note: in the code below fixed distributions are used instead of variable. This is to speed up computation time. It is generally recommended to use variable distributions that account for additional uncertainty.*

::::::::::::::::::::::::::::::::: spoiler

### On reducing computation time

Using an appropriate number of samples and chains is crucial for ensuring convergence and obtaining reliable estimates in Bayesian computations using Stan. Inadequate sampling or insufficient chains may lead to issues such as divergent transitions, impacting the accuracy and stability of the inference process.
:::::::::::::::::::::::::

For the purpose of this tutorial, we can add more configuration details to get a useful output in less time. You can specify a fixed number of `samples` and `chains` to the `stan` argument using the `stan_opts()` function:
Now you are ready to run `EpiNow2::epinow()` to estimate the time-varying reproduction number:

The code in the proposed code chunk can take around 10 minutes. We expect this alternative code chunk below using `stan_opts()` to take approximately 3 minutes:
```{r, message = FALSE, eval = TRUE}
reported_cases <- cases[1:90, ]

```{r,eval=FALSE}
estimates <- epinow(
# same code as previous chunk
# cases
reported_cases = reported_cases,
# delays
generation_time = generation_time_opts(generation_time_fixed),
delays = delay_opts(
incubation_period_fixed + reporting_delay_fixed
),
rt = rt_opts(
prior = list(mean = rt_log_mean, sd = rt_log_sd)
),
# [new] set a fixed number of samples and chains
delays = delay_opts(incubation_period_fixed + reporting_delay_fixed),
# prior
rt = rt_opts(prior = list(mean = rt_log_mean, sd = rt_log_sd)),
# computation (optional)
stan = stan_opts(samples = 1000, chains = 3)
)
```

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::: callout

```{r, message = FALSE, eval = TRUE}
reported_cases <- cases[1:90, ]
### Do not wait for this to continue

estimates <- epinow(
reported_cases = reported_cases,
generation_time = generation_time_opts(generation_time_fixed),
delays = delay_opts(
incubation_period_fixed + reporting_delay_fixed
),
rt = rt_opts(
prior = list(mean = rt_log_mean, sd = rt_log_sd)
)
)
```
Using `stan = stan_opts()` is optional. To reduce computation time in this tutorial, we passed a fixed number of `samples = 1000` and `chains = 3` to the `stan` argument using the `stan_opts()` function. We expect this to take approximately 3 minutes.

**Remember:** Using an appropriate number of *samples* and *chains* is crucial for ensuring convergence and obtaining reliable estimates in Bayesian computations using Stan. Inadequate sampling or insufficient chains may lead to issues such as divergent transitions, impacting the accuracy and stability of the inference process.

:::::::::::::::::::::::::::::::::

### Results

Expand Down Expand Up @@ -466,14 +450,13 @@ To find regional estimates, we use the same inputs as `epinow()` to the function

```{r, message = FALSE, eval = TRUE}
estimates_regional <- regional_epinow(
# cases
reported_cases = regional_cases,
# delays
generation_time = generation_time_opts(generation_time_fixed),
delays = delay_opts(
incubation_period_fixed + reporting_delay_fixed
),
rt = rt_opts(
prior = list(mean = rt_log_mean, sd = rt_log_sd)
)
delays = delay_opts(incubation_period_fixed + reporting_delay_fixed),
# prior
rt = rt_opts(prior = list(mean = rt_log_mean, sd = rt_log_sd))
)

estimates_regional$summary$summarised_results$table
Expand Down
86 changes: 85 additions & 1 deletion learners/setup.md
Expand Up @@ -112,11 +112,95 @@ new_packages <- c(
"tidyverse"
)

pak::pak(new_packages)
pak::pkg_install(new_packages)
```

These installation steps may ask you `? Do you want to continue (Y/n)`. If so, type `Y` and press <kbd>Enter</kbd>.

::::::::::::::::::::::::::::: spoiler

### Do you get an error with EpiNow2?

Windows users will need a working installation of `Rtools` in order to build the package from source. `Rtools` is not an R package, but software that you need to download and install. We suggest you follow these steps:

<!-- reference [these steps](http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/#1) -->

1. **Verify `Rtools` installation**. You can do so by using Windows search across your system. Optionally, you can use `{devtools}` running:

```r
if(!require("devtools")) install.packages("devtools")
devtools::find_rtools()
```

If the result is `FALSE`, then you should do step 2.

2. **Install `Rtools`**. Download the `Rtools` installer from <https://cran.r-project.org/bin/windows/Rtools/>. Install with default selections.

3. **Verify `Rtools` installation**. Again, we can use `{devtools}`:

```r
if(!require("devtools")) install.packages("devtools")
devtools::find_rtools()
```

:::::::::::::::::::::::::::::


::::::::::::::::::::::::::::: spoiler

### Do you get an error with epiverse-trace packages?

If you get an error message when installing `{epiparameter}`, try this alternative code:

```r
# for epiparameter
install.packages("epiparameter", repos = c("https://epiverse-trace.r-universe.dev"))
```

:::::::::::::::::::::::::::::

::::::::::::::::::::::::::: spoiler

### What to do if an error persists?

If the error message includes a string like `Personal access token (PAT)`, you may need to [set up your GitHub token](https://epiverse-trace.github.io/git-rstudio-basics/02-setup.html#set-up-your-github-token).

First, install these R packages:

```r
if(!require("pak")) install.packages("pak")

new <- c("gh",
"gitcreds",
"usethis")

pak::pak(new)
```

Then, follow these three steps to [set up your GitHub token (read this step-by-step guide)](https://epiverse-trace.github.io/git-rstudio-basics/02-setup.html#set-up-your-github-token):

```r
# Generate a token
usethis::create_github_token()

# Configure your token
gitcreds::gitcreds_set()

# Get a situational report
usethis::git_sitrep()
```

Try installing `{epiparameter}` again:

```r
if(!require("remotes")) install.packages("remotes")
remotes::install_github("epiverse-trace/epiparameter")
```

If the error persists, [contact us](#your-questions)!

:::::::::::::::::::::::::::

You should update **all of the packages** required for the tutorial, even if you installed them relatively recently. New versions bring improvements and important bug fixes.

When the installation has finished, you can try to load the packages by pasting the following code into the console:
Expand Down