Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorporate feedback pretrial #50

Merged
merged 19 commits into from
Apr 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
carpentry: 'incubator'

# Overall title for pages.
title: 'Reading and cleaning data for outbreak analytics with R'
title: 'Accessing and using delays to Quantify transmission'

# Date the lesson was created (YYYY-MM-DD, this is empty by default)
created:
Expand Down
50 changes: 22 additions & 28 deletions episodes/delays-reuse.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

:::::::::::::::::::::::::::::::::::::: questions

- How to get easy access to delay distributions from a literature search database?
- How to get access to disease delay distributions from a pre-established database for use in analysis?

::::::::::::::::::::::::::::::::::::::::::::::::

Expand All @@ -27,15 +27,15 @@

**Data science** : Basic programming with R.

**Epidemic theory** : Epidemiological parameters, time periods.
**Epidemic theory** : epidemiological parameters, disease time periods, such as the incubation period, generation time, and serial interval.

:::::::::::::::::::::::::::::::::

## Introduction

The [natural history](../learners/reference.md#naturalhistory) of an infectious disease shows that its development has a regularity from stage to stage. The time periods from an infectious disease inform about the timing of transmission and interventions.
Infectious diseases follow an infection cycle, which usually includes the following phases: presymptomatic period, symptomatic period and recovery period, as described by their [natural history](../learners/reference.md#naturalhistory). These time periods can be used to understand transmission dynamics and inform disease prevention and control interventions.

![Definition of key time periods. From [Xiang et al, 2021](https://www.sciencedirect.com/science/article/pii/S2468042721000038)](fig/time-periods.jpg)

Check warning on line 38 in episodes/delays-reuse.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/time-periods.jpg


::::::::::::::::: callout
Expand All @@ -50,7 +50,7 @@

<!-- Early models for COVID-19 used parameters from other coronaviruses. https://www.thelancet.com/article/S1473-3099(20)30144-4/fulltext -->

To exemplify how to use `{epiparameter}` in your analysis pipeline, our goal in this episode will be to *choose* one specific set of epidemiological parameters from the literature, instead of *copying-and-pasting* them by hand, to plug them into an `{EpiNow2}` analysis workflow.
To exemplify how to use the `{epiparameter}` R package in your analysis pipeline, our goal in this episode will be to choose one specific set of epidemiological parameters from the literature, instead of copying-and-pasting them by hand, to plug them into an `{EpiNow2}` analysis workflow.

<!-- In this episode, we'll learn how to choose one specific set of epidemiological parameters from the literature and then get their **summary statistics** using `{epiparameter}`. -->

Expand All @@ -75,13 +75,9 @@
)
```

Usually, we would *copy/paste* the **summary statistics** we found in a paper. Or, try to get the **distribution parameters** from those reports. An additional source of issue is that the report of different statistical distributions is not consistent across the literature. `{epiparameter}`’s objective is to facilitate the access to parameters to implement them into your analysis pipeline. `{epiparameter}` provide information for a collection of distributions for a range of infectious diseases that is as accurate, unbiased and as comprehensive as possible.
It is a common practice for analysts to manually search the available literature and copy and paste the **summary statistics** or the **distribution parameters** from scientific publications. A challenge that is often faced is that the reporting of different statistical distributions is not consistent across the literature. `{epiparameter}`’s objective is to facilitate the access to reliable estimates of distribution parameters for a range of infectious diseases, so that they can easily be implemented in outbreak analytic pipelines.

<!-- https://epiverse-trace.github.io/epiparameter/articles/data_protocol.html -->

Today, we'll *choose* the summary statistics from the library of epidemiological parameters provided by `{epiparameter}`.

<!-- Instead of *manually* plug-in numeric values to `EpiNow2::dist_spec()` to specify the **summary statistics** of the delay distribution, we are going to *choose* them from the library of epidemiological parameters provided by `{epiparameter}`. -->
In this episode, we will *choose* the summary statistics from the library of epidemiological parameters provided by `{epiparameter}`.

<!--
```r
Expand All @@ -99,20 +95,20 @@
```
-->

## Find a Generation time
## Generation time vs serial interval

The generation time, jointly with the $R$, can inform about the speed of spread and its feasibility of control. Given a $R>1$, with a shorter generation time, cases can appear more quickly.
The generation time, jointly with the reproduction number ($R$), provide valuable insights on the strength of transmission and inform the implementation of control measures. Given a $R>1$, the shorter the generation time, the earlier the incidence of disease cases will grow.

![Video from the MRC Centre for Global Infectious Disease Analysis, Ep 76. Science In Context - Epi Parameter Review Group with Dr Anne Cori (27-07-2023) at <https://youtu.be/VvpYHhFDIjI?si=XiUyjmSV1gKNdrrL>](fig/reproduction-generation-time.png)

Check warning on line 102 in episodes/delays-reuse.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/reproduction-generation-time.png

In calculating the effective reproduction number ($R_{t}$), the *generation time* distribution is often approximated by the [serial interval](../learners/reference.md#serialinterval) distribution.
This frequent approximation is because it is easier to observe and measure the onset of symptoms than the onset of infectiousness.

![A schematic of the relationship of different time periods of transmission between an infector and an infectee in a transmission pair. Exposure window is defined as the time interval having viral exposure, and transmission window is defined as the time interval for onward transmission with respect to the infection time ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)).](fig/serial-interval-observed.jpeg)

Check warning on line 107 in episodes/delays-reuse.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-observed.jpeg

However, using the *serial interval* as an approximation of the *generation time* is primarily valid for diseases in which infectiousness starts after symptom onset ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)). In cases where infectiousness starts before symptom onset, the serial intervals can have negative values, which is the case of a pre-symptomatic transmission ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext#gr2)).
However, using the *serial interval* as an approximation of the *generation time* is primarily valid for diseases in which infectiousness starts after symptom onset ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)). In cases where infectiousness starts before symptom onset, the serial intervals can have negative values, which is the case for diseases with pre-symptomatic transmission ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext#gr2)).

Additionally, even if the *generation time* and *serial interval* have the same mean, their variance usually differs, propagating bias to the $R_{t}$ estimation. $R_{t}$ estimates are sensitive not only to the mean generation time but also to the variance and form of the generation interval distribution [(Gostic et al., 2020)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008409).
<!-- Additionally, even if the *generation time* and *serial interval* have the same mean, their variance usually differs, propagating bias to the $R_{t}$ estimation ([Gostic et al., 2020](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008409)). -->

::::::::::::::::: callout

Expand All @@ -120,13 +116,13 @@

When we calculate the *serial interval*, we see that not all case pairs have the same time length. We will observe this variability for any case pair and individual time period, including the [incubation period](../learners/reference.md#incubation) and [infectious period](../learners/reference.md#infectiousness).

![Serial intervals of possible case pairs in (a) COVID-19 and (b) MERS-CoV. Pairs represent a presumed infector and their presumed infectee plotted by date of symptom onset ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig6)).](fig/serial-interval-pairs.jpg)

Check warning on line 119 in episodes/delays-reuse.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-pairs.jpg

To summarise these data from individual and pair time periods, we can find the **statistical distributions** that best fit the data ([McFarland et al., 2023](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.27.2200806)).

<!-- add a reference about good practices to estimate distributions -->

![Fitted serial interval distribution for (a) COVID-19 and (b) MERS-CoV based on reported transmission pairs in Saudi Arabia. We fitted three commonly used distributions, Lognormal, Gamma, and Weibull distributions, respectively ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig5)).](fig/seria-interval-fitted-distributions.jpg)

Check warning on line 125 in episodes/delays-reuse.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/seria-interval-fitted-distributions.jpg

Statistical distributions are summarised in terms of their **summary statistics** like the *location* (mean and percentiles) and *spread* (variance or standard deviation) of the distribution, or with their **distribution parameters** that inform about the *form* (shape and rate/scale) of the distribution. These estimated values can be reported with their **uncertainty** (95% confidence intervals).

Expand Down Expand Up @@ -160,7 +156,7 @@
- Which one would be harder to control?
- Why do you conclude that?

![Serial interval of novel coronavirus (COVID-19) infections overlaid with a published distribution of SARS. ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext))](fig/serial-interval-covid-sars.jpg)

Check warning on line 159 in episodes/delays-reuse.Rmd

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-covid-sars.jpg

::::::::::::::::: hint

Expand All @@ -170,13 +166,13 @@

::::::::::::::::: solution

Which one would be harder to control?
**Which one would be harder to control?**

- COVID-19
COVID-19

Why do you conclude that?
**Why do you conclude that?**

- COVID-19 has the lowest mean serial interval. The approximate mean value for the serial interval of COVID-19 is around four days, and SARS is about seven days. Thus, COVID-19 will likely have newer generations in less time than SARS, assuming similar reproduction numbers.
COVID-19 has the lowest mean serial interval. The approximate mean value for the serial interval of COVID-19 is around four days, and SARS is about seven days. Thus, COVID-19 will likely have newer generations in less time than SARS, assuming similar reproduction numbers.

::::::::::::::::::::::::::

Expand All @@ -188,11 +184,11 @@

::::::::::::::::::::::

## Extract epidemiological parameters
## Choosing epidemiological parameters

First, let's assume that the data set `example_confirmed` has COVID-19 observed cases. So, we need to find a reported generation time for COVID-19 or any other useful parameter for this aim.
In this section, we will use `{epiparameter}` to obtain the generation time and the serial interval for COVID-19, so these metrics can be used to estimate the transmissibility of this disease using `{EpiNow2}` in subsequent sections of this episode.

Let's start by looking at how many parameters we have in the epidemiological distributions database in `{epiparameter}` (`epidist_db`) for the `disease` named `covid`-19:
Let's start by looking at how many entries are available in the epidemiological distributions database in `{epiparameter}` (`epidist_db`) for the `disease` named `covid`-19:

```{r}
epiparameter::epidist_db(
Expand All @@ -210,7 +206,7 @@
)
```

Currently, in the library of epidemiological parameters, we have one `generation` time entry for Influenza. Considering the abovementioned considerations, we can look at the `serial` intervals for `COVID`-19. Run this locally!
Currently, in the library of epidemiological parameters, we have one `generation` time entry for Influenza. Considering the above-mentioned considerations, we can look at the `serial` intervals for `COVID`-19. Run this locally!

```{r,eval=FALSE}
epiparameter::epidist_db(
Expand All @@ -225,7 +221,7 @@

### CASE-INSENSITIVE

`epidist_db` is [case-insensitive](https://dillionmegida.com/p/case-sensitivity-vs-case-insensitivity/#case-insensitivity). This means that you can use strings with letters in upper or lower case indistinctly.
`epidist_db` is [case-insensitive](https://dillionmegida.com/p/case-sensitivity-vs-case-insensitivity/#case-insensitivity). This means that you can use strings with letters in upper or lower case indistinctly. Strings like `"serial"`, `"serial interval"` or `"serial_interval"` are also valid.

:::::::::::::::::::::::::

Expand Down Expand Up @@ -404,8 +400,6 @@

:::::::::::::::::::::::::

Now, we have an epidemiological parameter we can reuse! We can replace the **summary statistics** numbers we plug into `EpiNow2::dist_spec()`.

Let's assign this `<epidist>` class object to the `covid_serialint` object.

```{r,message=FALSE}
Expand Down Expand Up @@ -469,13 +463,13 @@
covid_serialint$summary_stats$mean
```

Notice that with this output we can replace one of the inputs for the `EpiNow2::dist_spec()` function:
Now, we have an epidemiological parameter we can reuse! We can replace the **summary statistics** numbers we plug into the `EpiNow2::dist_spec()` function:

```r
generation_time <-
EpiNow2::dist_spec(
mean = covid_serialint$summary_stats$mean, # we changed this line :)
sd = 2,
mean = covid_serialint$summary_stats$mean, # replaced!
sd = covid_serialint$summary_stats$sd, # replaced!
max = 20,
distribution = "gamma"
)
Expand Down
Binary file added episodes/fig/pkgs-hexlogos-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
27 changes: 20 additions & 7 deletions episodes/quantify-transmissibility.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,9 @@ But in an ongoing outbreak, the population does not remain entirely susceptible

## Introduction

The transmission intensity of an outbreak is quantified using two key metrics: the reproduction number, which informs on the strength of the transmission by indicating how many new cases are expected from each existing case; and the [growth rate](../learners/reference.md#growth), which informs on the speed of the transmission by indicating how rapidly the outbreak is spreading or declining (doubling/halving time) within a population. To estimate these key metrics using case data we must account for delays between the date of infections and date of reported cases. In an outbreak situation, data are usually available on reported dates only, therefore we must use estimation methods to account for these delays when trying to understand changes in transmission over time. For more details on the distinction between speed and strength of transmission and implications for control, see [Dushoff & Park, 2021](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2020.1556).
The transmission intensity of an outbreak is quantified using two key metrics: the reproduction number, which informs on the strength of the transmission by indicating how many new cases are expected from each existing case; and the [growth rate](../learners/reference.md#growth), which informs on the speed of the transmission by indicating how rapidly the outbreak is spreading or declining (doubling/halving time) within a population. For more details on the distinction between speed and strength of transmission and implications for control, review [Dushoff & Park, 2021](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2020.1556).

To estimate these key metrics using case data we must account for delays between the date of infections and date of reported cases. In an outbreak situation, data are usually available on reported dates only, therefore we must use estimation methods to account for these delays when trying to understand changes in transmission over time.

In the next tutorials we will focus on how to use the functions in `{EpiNow2}` to estimate transmission metrics of case data. We will not cover the theoretical background of the models or inference framework, for details on these concepts see the [vignette](https://epiforecasts.io/EpiNow2/dev/articles/estimate_infections.html).

Expand Down Expand Up @@ -119,7 +121,11 @@ cases <- incidence2::covidregionaldataUK %>%

### When to use incidence2?

We can also use the `{incidence2}` package to aggregate cases. However, if you ever need to aggregate you data in a different time **interval** (i.e., days, weeks or months) or per **group** categories, we recommend you to explore the `incidence2::incidence()` function:
We can also use the `{incidence2}` package to:

- Aggregate cases (similar to the code above) but in different time *intervals* (i.e., days, weeks or months) or per *group* categories. Explore later the [`incidence2::incidence()` reference manual](https://www.reconverse.org/incidence2/reference/incidence.html).

- Complete dates for all the range of dates per group category using `incidence2::complete_dates()`. Read further in its [function reference manual](https://www.reconverse.org/incidence2/reference/complete_dates.html).

```{r, warning = FALSE, message = FALSE, eval=FALSE}
library(tidyr)
Expand All @@ -132,13 +138,13 @@ incidence2::covidregionaldataUK %>%
incidence2::incidence(
date_index = "date",
counts = "cases_new",
groups = "region",
interval = "week"
)
count_values_to = "confirm",
date_names_to = "date"
) %>%
# complete range of dates
incidence2::complete_dates()
```

You can also estimate transmission metrics from {incidence2} objects using the `{i2extras}` package. Read further in the [Fitting curves](https://www.reconverse.org/i2extras/articles/fitting_epicurves.html) vignette!

:::::::::::::::::::::::::

There are case data available for `r dim(cases)[1]` days, but in an outbreak situation it is likely we would only have access to the beginning of this data set. Therefore we assume we only have the first 90 days of this data.
Expand Down Expand Up @@ -475,6 +481,13 @@ estimates_regional$summary$summarised_results$table
estimates_regional$summary$plots$R
```

:::::::::::::::::::::::::: testimonial

### the i2extras package

`{i2extras}` package also estimate transmission metrics like growth rate and doubling/halving time at different time intervals (i.e., days, weeks, or months). `{i2extras}` require `{incidence2}` objects as inputs. Read further in its [Fitting curves](https://www.reconverse.org/i2extras/articles/fitting_epicurves.html) vignette.

::::::::::::::::::::::::::

## Summary

Expand Down
4 changes: 2 additions & 2 deletions learners/setup.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ title: Setup

## Motivation

**Outbreaks** appear with different diseases and in different contexts, but what all of them have in common is the key public health **questions** ([Cori et al. 2017](https://royalsocietypublishing.org/doi/10.1098/rstb.2016.0371#d1e605)). We can relate these key public health questions to outbreak data analysis **tasks**.
**Outbreaks** appear with different diseases and in different contexts, but what all of them have in common is the key public health questions ([Cori et al. 2017](https://royalsocietypublishing.org/doi/10.1098/rstb.2016.0371#d1e605)). We can relate these key public health questions to outbreak data analysis tasks.

Epiverse-TRACE aims to provide a software ecosystem for [**outbreak analytics**](reference.md#outbreakanalytics) with integrated, generalisable and scalable community-driven software. We support the development of R packages, make the existing ones interoperable for the user experience, and stimulate a community of practice.

Expand Down Expand Up @@ -35,7 +35,7 @@ Also check out the [glossary](../reference.md) for any terms you may be unfamili

Our strategy is to gradually incorporate specialised **R packages** into our traditional analysis pipeline. These packages should fill the gaps in these epidemiology-specific tasks in response to outbreaks.

![In **R**, the fundamental unit of shareable code is the **package**. A package bundles together code, data, documentation, and tests and is easy to share with others ([Wickham and Bryan, 2023](https://r-pkgs.org/introduction.html))](episodes/fig/pkgs-hexlogos.png)
![In **R**, the fundamental unit of shareable code is the **package**. A package bundles together code, data, documentation, and tests and is easy to share with others ([Wickham and Bryan, 2023](https://r-pkgs.org/introduction.html))](episodes/fig/pkgs-hexlogos-2.png)

:::::::::::::::::::::::::::: prereq

Expand Down
Loading