Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent d02aaa0, commit d996d68
Showing 12 changed files with 530 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
--- | ||
title: 'Clean outbreaks data' | ||
teaching: 10 | ||
exercises: 2 | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- How to clean and standardize case data? | ||
- How to convert a raw dataset into a `linelist` object? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Explain how to clean, curate, and standardize case data using the `{cleanepi}` package | ||
- Demonstrate how to convert case data to a `linelist` object | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
## Introduction | ||
In the process of analyzing outbreak data, it's essential to ensure that the dataset is clean, curated, standardized, and validated to facilitate accurate and reproducible analysis. This episode focuses on cleaning epidemic and outbreak data using the [cleanepi](https://epiverse-trace.github.io/cleanepi/) package, and validating it using the [linelist](https://epiverse-trace.github.io/linelist/) package. For demonstration purposes, we'll work with a simulated dataset of Ebola cases. | ||
|
||
|
||
The first step is to import the dataset following the guidelines outlined in the [Read case data](../episodes/read-cases.Rmd) episode. This involves loading the dataset into our environment and viewing its structure and content. | ||
|
||
|
||
```r | ||
requireNamespace("rio", quietly = TRUE) | ||
sim_ebola_data <- rio::import(file.path("data", "simulated_ebola.csv", | ||
fsep = "/")) | ||
utils::head(sim_ebola_data, 5) | ||
``` | ||
|
||
```{.output} | ||
case id age gender status date onset date sample district cheifdom | ||
1 14905 90 1 confirmed 03/15/2015 06/04/2015 M D | ||
2 13043 twenty-five 2 Sep /11/Y 03/01/2014 W X | ||
3 14364 54 f <NA> 09/02/2014 03/03/2015 N M | ||
4 14675 ninety <NA> 10/19/2014 31/ 12 /14 D W | ||
5 12648 74 F 08/06/2014 10/10/2016 V Y | ||
age_cat | ||
1 65+ | ||
2 twenty-five | ||
3 25-64 | ||
4 ninety | ||
5 65+ | ||
``` | ||
|
||
## Quick inspection | ||
Quick exploration and inspection of the dataset are crucial before diving into any analysis tasks. The `{cleanepi}` package simplifies this process with the `scan_data()` function. Let's take a look at how you can use it: | ||
|
||
|
||
```r | ||
requireNamespace("cleanepi", quietly = TRUE) | ||
cleanepi::scan_data(sim_ebola_data) | ||
``` | ||
|
||
```{.output} | ||
Field_names missing numeric date character logical | ||
1 case id 0.000000 1.0000 0.000000 0.000000 0 | ||
2 age 0.064600 0.8348 0.000000 0.100600 0 | ||
3 gender 0.157867 0.0472 0.000000 0.794933 0 | ||
4 status 0.053533 0.0000 0.000000 0.946467 0 | ||
5 date onset 0.000067 0.0000 0.915733 0.084200 0 | ||
6 date sample 0.000133 0.0000 0.999867 0.000000 0 | ||
7 district 0.000000 0.0000 0.000000 1.000000 0 | ||
8 cheifdom 0.000000 0.0000 0.000000 1.000000 0 | ||
9 age_cat 0.064600 0.0000 0.000000 0.935400 0 | ||
``` | ||
|
||
|
||
The result summarizes each column: its name and the proportion of its values that are missing, numeric, date, character, or logical. |
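
To make the proportions in the table above concrete, here is a minimal base R sketch that computes, for each column of a small made-up data frame, the share of missing values and the share of values that parse as numbers. The data frame `toy_cases` and its values are hypothetical, and this is not necessarily how `scan_data()` works internally; it only illustrates how such a summary can be read.

```r
# Hypothetical toy data mimicking messy case data (not the simulated Ebola file)
toy_cases <- data.frame(
  age = c("90", "twenty-five", "54", NA),
  status = c("confirmed", NA, "suspected", "confirmed"),
  stringsAsFactors = FALSE
)

# Proportion of missing values in each column
prop_missing <- vapply(toy_cases, function(x) mean(is.na(x)), numeric(1))

# Proportion of values in each column that can be parsed as numbers
# (as.numeric() returns NA, with a warning, for text such as "twenty-five")
prop_numeric <- vapply(
  toy_cases,
  function(x) mean(!is.na(suppressWarnings(as.numeric(x)))),
  numeric(1)
)

print(rbind(missing = prop_missing, numeric = prop_numeric))
```

A column such as `age` that mixes digits and spelled-out numbers shows up here with a numeric proportion well below one, which is the kind of signal to look for before cleaning.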
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,10 +59,10 @@ contact: '[email protected]' | |
|
||
# Order of episodes in your lesson | ||
episodes: | ||
#- read-cases.Rmd | ||
#- clean-data.Rmd | ||
#- describe-cases.Rmd | ||
#- simple-analysis.Rmd | ||
- read-cases.Rmd | ||
- clean-data.Rmd | ||
- describe-cases.Rmd | ||
- simple-analysis.Rmd | ||
- delays-reuse.Rmd | ||
- delays-functions.Rmd | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
--- | ||
title: 'Aggregate and visualize' | ||
teaching: 10 | ||
exercises: 2 | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- How to aggregate case data? | ||
- How to visualize aggregated data? | ||
- What is the distribution of cases by time, place, gender, and age? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Convert case data to incidence | ||
- Create epidemic curves from incidence data | ||
- Estimate the growth rate from incidence curves | ||
- Create quick descriptive and comparison tables | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
## Introduction | ||
|
||
A comprehensive description of data is pivotal for conducting insightful explanatory and exploratory analyses. This episode focuses on describing and visualizing epidemic data. The examples are built around the **Covid-19 case data from England** dataset contained in the [outbreaks](http://www.reconverse.org/outbreaks/) package. The first step is to read this dataset, and we recommend using the [readr](../links.md#readr) package for this purpose (or one of the alternative methods outlined in the [Read case data](../episodes/read-cases.Rmd) episode). | ||
|
||
|
||
```r | ||
requireNamespace("outbreaks", quietly = TRUE) | ||
covid19_eng_case_data <- outbreaks::covid19_england_nhscalls_2020 | ||
utils::head(covid19_eng_case_data, 5) | ||
``` | ||
|
||
```{.output} | ||
site_type date sex age ccg_code | ||
1 111 2020-03-18 female missing e38000062 | ||
2 111 2020-03-18 female missing e38000163 | ||
3 111 2020-03-18 female 0-18 e38000001 | ||
4 111 2020-03-18 female 0-18 e38000002 | ||
5 111 2020-03-18 female 0-18 e38000004 | ||
ccg_name count postcode | ||
1 nhs_gloucestershire_ccg 1 gl34fe | ||
2 nhs_south_tyneside_ccg 1 ne325nn | ||
3 nhs_airedale_wharfedale_and_craven_ccg 8 bd57jr | ||
4 nhs_ashford_ccg 7 tn254ab | ||
5 nhs_barking_and_dagenham_ccg 35 rm13ae | ||
nhs_region day weekday | ||
1 South West 0 rest_of_week | ||
2 North East and Yorkshire 0 rest_of_week | ||
3 North East and Yorkshire 0 rest_of_week | ||
4 South East 0 rest_of_week | ||
5 London 0 rest_of_week | ||
``` | ||
|
||
## Incidence data | ||
|
||
Downstream analysis often works with aggregated data rather than individual cases. This requires aggregating case data into incidence data. The [incidence2](https://www.reconverse.org/incidence2/articles/incidence2.html){.external target="_blank"} package offers an essential function, `incidence()`, for grouping case data, usually centered around dated events and/or other factors. The code chunk below creates an `incidence2` object from `covid19_eng_case_data` based on the `date` column. | ||
|
||
|
||
```r | ||
requireNamespace("incidence2", quietly = TRUE) | ||
covid19_eng_incidence_data <- incidence2::incidence( | ||
covid19_eng_case_data, | ||
date_index = "date" | ||
) | ||
utils::head(covid19_eng_incidence_data, 5) | ||
``` | ||
|
||
```{.output} | ||
# incidence: 5 x 3 | ||
# count vars: date | ||
date_index count_variable count | ||
* <date> <chr> <int> | ||
1 2020-03-18 date 2579 | ||
2 2020-03-19 date 2602 | ||
3 2020-03-20 date 2615 | ||
4 2020-03-21 date 2588 | ||
5 2020-03-22 date 2603 | ||
``` | ||
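
Conceptually, aggregating into incidence amounts to counting or summing records per date. The base R sketch below uses a made-up data frame `toy_calls` (hypothetical values) to show both: the number of rows per date, and the sum of a pre-aggregated `count` column per date. It illustrates the idea only and is not the internals of `{incidence2}`.

```r
# Hypothetical toy data: each row is already a small aggregate with a count
toy_calls <- data.frame(
  date = as.Date(c("2020-03-18", "2020-03-18", "2020-03-19")),
  count = c(8L, 7L, 35L)
)

# Number of rows recorded on each date
rows_per_day <- table(toy_calls$date)

# Sum of the 'count' column on each date
calls_per_day <- stats::aggregate(count ~ date, data = toy_calls, FUN = sum)

print(rows_per_day)
print(calls_per_day)
```

The dataset above also carries a `count` column, so the two quantities may differ there as well. The `{incidence2}` documentation describes a `counts` argument for pre-aggregated data, which may be worth checking before reading row counts as case counts.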
|
||
The `incidence2` object can be visualized using the generic `plot()` function from base R. | ||
|
||
|
||
```r | ||
base::plot(covid19_eng_incidence_data) | ||
``` | ||
|
||
<img src="fig/describe-cases-rendered-unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> | ||
|
||
Moreover, the `incidence()` function can aggregate case data by a dated event together with other factors such as the individual's gender or the sampling location. In the example below, we calculate weekly counts of Covid-19 cases in England, grouped by `sex`. | ||
|
||
|
||
```r | ||
weekly_covid19_eng_incidence <- incidence2::incidence( | ||
covid19_eng_case_data, | ||
date_index = "date", | ||
interval = "week", | ||
groups = "sex" | ||
) | ||
base::plot(weekly_covid19_eng_incidence, angle = 45) | ||
``` | ||
|
||
<img src="fig/describe-cases-rendered-unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> | ||
|
||
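
As a rough illustration of what weekly grouping means, the base R sketch below assigns each date to the Monday that starts its week and then sums counts per week and group. The data frame `toy_cases` and its values are hypothetical, and the week convention used by `{incidence2}` may differ from the one `cut()` applies here.

```r
# Hypothetical toy data with a date, a grouping variable, and a count
toy_cases <- data.frame(
  date = as.Date(c("2020-03-18", "2020-03-20", "2020-03-25", "2020-03-26")),
  sex = c("female", "male", "female", "female"),
  count = c(3L, 2L, 4L, 1L)
)

# Label each date with the first day (Monday) of the week it falls in
toy_cases$week <- as.character(cut(toy_cases$date, breaks = "week"))

# Sum the counts within each combination of week and sex
weekly_by_sex <- stats::aggregate(count ~ week + sex, data = toy_cases, FUN = sum)

print(weekly_by_sex)
```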
|
||
|
||
::::::::::::::::::::::::::::::::::::: challenge | ||
|
||
## Challenge 1: Can you do it? | ||
|
||
- Using the above `covid19_eng_case_data` dataset, produce monthly epi-curves for Covid-19 cases in England, grouped by region. | ||
|
||
:::::::::::::::::::::::: solution | ||
|
||
|
||
```r | ||
monthly_covid19_eng_incidence <- incidence2::incidence( | ||
covid19_eng_case_data, | ||
date_index = "date", | ||
interval = "month", | ||
groups = "nhs_region" | ||
) | ||
base::plot(monthly_covid19_eng_incidence, angle = 45) | ||
``` | ||
|
||
::::::::::::::::::::::::::::::::: | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- Use `{incidence2}` to aggregate case data based on a dated event. | ||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
--- | ||
title: 'Read case data' | ||
teaching: 20 | ||
exercises: 2 | ||
editor_options: | ||
chunk_output_type: inline | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
- Where do you usually store your outbreak data? | ||
- How many different data formats can I read? | ||
- Is it possible to import data from databases and health APIs? | ||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Explain how to import outbreak data from different sources into the `R` | ||
environment for analysis. | ||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: prereq | ||
|
||
## Prerequisites | ||
|
||
This episode requires you to be familiar with: | ||
|
||
**Data science** : Basic programming with R. | ||
::::::::::::::::::::::::::::::::: | ||
|
||
## Introduction | ||
|
||
The initial step in outbreak analysis involves importing the target dataset into the `R` environment from various sources. Outbreak data is typically stored in files of diverse formats, relational database management systems (RDBMS), or health information system (HIS) application program interfaces (APIs) such as [REDCap](https://www.project-redcap.org/) and [DHIS2](https://dhis2.org/). The latter option is particularly well-suited for storing institutional health data. This episode will elucidate the process of reading cases from these sources. | ||
|
||
## Reading from files | ||
|
||
Several packages are available for importing outbreak data stored in individual files into `R`. These include [rio](http://gesistsa.github.io/rio/), [readr](https://readr.tidyverse.org/) from the `tidyverse`, [io](https://bitbucket.org/djhshih/io/src/master/), and [ImportExport](https://cran.r-project.org/web/packages/ImportExport/index.html). Together, these packages offer methods to read single or multiple files in a wide range of formats. | ||
|
||
The example below shows how to import a `csv` file into the `R` environment using the `rio` package. | ||
|
||
```r | ||
requireNamespace("rio", quietly = TRUE) | ||
case_data <- rio::import(file.path("data", "ebola_cases.csv", fsep = "/")) | ||
head(case_data, 5) | ||
``` | ||
|
||
```{.output} | ||
date confirm | ||
1 2014-05-18 1 | ||
2 2014-05-20 2 | ||
3 2014-05-21 4 | ||
4 2014-05-22 6 | ||
5 2014-05-23 1 | ||
``` | ||
|
||
Similarly, you can import files of other formats such as `tsv`, `xlsx`, etc. | ||
|
||
::::::::::::::::::::::::::::::::: challenge | ||
|
||
### Reading compressed data | ||
Take 1 minute: | ||
- Is it possible to read compressed data in `R`? | ||
|
||
::::::::::::::::: hint | ||
|
||
You can install the optional packages that `{rio}` relies on to support its full range of file formats as follows: | ||
|
||
```r | ||
requireNamespace("rio", quietly = TRUE) | ||
rio::install_formats() | ||
``` | ||
|
||
:::::::::::::::::::::: | ||
|
||
::::::::::::::::: solution | ||
|
||
|
||
```r | ||
requireNamespace("rio", quietly = TRUE) | ||
rio::import(file.path("path_name", "file_name.zip", fsep = "/")) | ||
``` | ||
|
||
:::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::::::::: | ||
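
For plain gzip-compressed text files, base R can also read the data directly. The sketch below is a self-contained round trip under stated assumptions: the data frame `toy_cases` holds made-up values, and the file lives in a temporary location created by `tempfile()`. It is an alternative illustration, not a replacement for the `{rio}` solution above.

```r
# Hypothetical toy data to write and read back
toy_cases <- data.frame(
  date = c("2014-05-18", "2014-05-20"),
  confirm = c(1L, 2L)
)

# Write the data to a gzip-compressed CSV file in a temporary location
gz_path <- tempfile(fileext = ".csv.gz")
gz_con <- gzfile(gz_path, open = "wt")
utils::write.csv(toy_cases, gz_con, row.names = FALSE)
close(gz_con)

# read.csv() detects the gzip compression and decompresses on the fly
toy_read <- utils::read.csv(gz_path)
print(toy_read)

# Remove the temporary file
unlink(gz_path)
```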
|
||
|
||
## Reading from databases | ||
|
||
The [DBI](https://dbi.r-dbi.org/) package serves as a versatile interface for interacting with database management systems (DBMS) across different back-ends or servers. It offers a uniform method for accessing and retrieving data from various database systems. | ||
|
||
|
||
The following code chunk demonstrates how to create a temporary SQLite database in memory, store the `case_data` dataframe as a table within it, and subsequently read from it: | ||
|
||
|
||
```r | ||
requireNamespace("DBI", quietly = TRUE) | ||
requireNamespace("RSQLite", quietly = TRUE) | ||
# Create a temporary SQLite database in memory | ||
db_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") | ||
# Store the 'case_data' dataframe as a table named 'cases' | ||
# in the SQLite database | ||
DBI::dbWriteTable(db_con, "cases", case_data) | ||
# Read data from the 'cases' table | ||
result <- DBI::dbReadTable(db_con, "cases") | ||
# Close the database connection | ||
DBI::dbDisconnect(db_con) | ||
# View the result | ||
base::print(utils::head(result)) | ||
``` | ||
|
||
```{.output} | ||
date confirm | ||
1 16208 1 | ||
2 16210 2 | ||
3 16211 4 | ||
4 16212 6 | ||
5 16213 1 | ||
6 16214 2 | ||
``` | ||
|
||
This code first establishes a connection to an in-memory SQLite database using the `dbConnect()` function. Then, it writes the `case_data` dataframe into a table named 'cases' within the database using the `dbWriteTable()` function. Subsequently, it reads the data from the 'cases' table using the `dbReadTable()` function. Finally, it closes the database connection with the `dbDisconnect()` function. More examples about SQL databases and R can be found [here](https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html). | ||
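
Note that the `date` column in the output above comes back as numbers rather than dates. A likely explanation is that SQLite has no dedicated date type, so the values were stored as days since 1970-01-01. Under that assumption, a minimal sketch of the conversion back to `Date` looks like this (the vector `stored_dates` simply copies the first three values printed above):

```r
# First three 'date' values as returned from the SQLite table above
stored_dates <- c(16208, 16210, 16211)

# Interpret the numbers as days since the Unix epoch (assumption stated above)
restored_dates <- as.Date(stored_dates, origin = "1970-01-01")

print(restored_dates)
# "2014-05-18" "2014-05-20" "2014-05-21"
```

The restored values match the dates in the original `case_data` shown earlier, which supports the days-since-epoch reading.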
|
||
## Reading from HIS APIs | ||
|
||
Health-related data are also increasingly stored in specialized HIS APIs like **Fingertips**, **GoData**, **REDCap**, and **DHIS2**. In such cases, one can use the [readepi](https://epiverse-trace.github.io/readepi/) package, which enables reading data from HIS APIs. | ||
-[TBC] | ||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
- Use `{rio}`, `{io}`, `{readr}`, and `{ImportExport}` to read data from individual files. | ||
- Use `{DBI}` to read data from databases. | ||
- Use `{readepi}` to read data from HIS APIs. | ||
:::::::::::::::::::::::::::::::::::::::::::::::: |