---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  warning = FALSE, message = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# IncidencePrevalence <a href="https://darwin-eu.github.io/IncidencePrevalence/"><img src="man/figures/logo.png" align="right" height="138" alt="IncidencePrevalence website" /></a>
[![CRANstatus](https://www.r-pkg.org/badges/version/IncidencePrevalence)](https://CRAN.R-project.org/package=IncidencePrevalence)
[![codecov.io](https://codecov.io/github/darwin-eu/IncidencePrevalence/coverage.svg?branch=main)](https://app.codecov.io/github/darwin-eu/IncidencePrevalence?branch=main)
[![R-CMD-check](https://github.com/darwin-eu/IncidencePrevalence/workflows/R-CMD-check/badge.svg)](https://github.com/darwin-eu/IncidencePrevalence/actions)
[![Lifecycle:Experimental](https://img.shields.io/badge/Lifecycle-Experimental-339999)](https://lifecycle.r-lib.org/articles/stages.html)
## Package overview
IncidencePrevalence contains functions for estimating population-level incidence and prevalence using the OMOP common data model. For more information on the package, please see our paper in Pharmacoepidemiology and Drug Safety.

> Raventós, B, Català, M, Du, M, et al. IncidencePrevalence: An R package to calculate population-level incidence rates and prevalence using the OMOP common data model. Pharmacoepidemiol Drug Saf. 2023; 1-11. doi: 10.1002/pds.5717

If you find the package useful in supporting your research study, please consider citing this paper.
## Package installation
You can install the latest version of IncidencePrevalence from CRAN:
```{r, eval=FALSE}
install.packages("IncidencePrevalence")
```
Or from GitHub:
```{r, eval=FALSE}
install.packages("remotes")
remotes::install_github("darwin-eu/IncidencePrevalence")
```
## Example usage
### Create a reference to data in the OMOP CDM format
The IncidencePrevalence package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the CDMConnector package.
```{r}
library(CDMConnector)
library(IncidencePrevalence)
```
For example, a connection to a Postgres database could be created like so:
```{r, eval=FALSE}
con <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = Sys.getenv("CDM5_POSTGRESQL_DBNAME"),
  host = Sys.getenv("CDM5_POSTGRESQL_HOST"),
  user = Sys.getenv("CDM5_POSTGRESQL_USER"),
  password = Sys.getenv("CDM5_POSTGRESQL_PASSWORD")
)
cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = Sys.getenv("CDM5_POSTGRESQL_CDM_SCHEMA"),
  write_schema = Sys.getenv("CDM5_POSTGRESQL_RESULT_SCHEMA")
)
```
To see how you would create a reference to your database, please consult the CDMConnector package documentation. For this example, though, we'll work with simulated data and generate an example cdm reference like so:
```{r}
cdm <- mockIncidencePrevalenceRef(
  sampleSize = 10000,
  outPre = 0.3,
  minOutcomeDays = 365,
  maxOutcomeDays = 3650
)
```
### Identify a denominator cohort
To identify a set of denominator cohorts we can use the `generateDenominatorCohortSet` function. Here we want to identify denominator populations for a study period between 2008 and 2018 and with 180 days of prior history (observation time in the database). We also wish to consider multiple age groups (from 0 to 64, and 65 to 100) and multiple sex criteria (one cohort only males, one only females, and one with both sexes included).
```{r}
cdm <- generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  cohortDateRange = c(as.Date("2008-01-01"), as.Date("2018-01-01")),
  ageGroup = list(
    c(0, 64),
    c(65, 100)
  ),
  sex = c("Male", "Female", "Both"),
  daysPriorObservation = 180
)
```
This will then give us six denominator cohorts, one for each combination of age group and sex criterion:
```{r}
cohortSet(cdm$denominator)
```
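
The count of six follows from crossing the two age groups with the three sex criteria. As a minimal base R sketch, independent of the package (the labels and the `combos` object are illustrative only, not the package's output format):

```r
# Illustrative only: enumerate the age group x sex combinations
# requested in the call to generateDenominatorCohortSet() above.
age_groups <- c("0 to 64", "65 to 100")
sexes <- c("Male", "Female", "Both")

combos <- expand.grid(
  age_group = age_groups,
  sex = sexes,
  stringsAsFactors = FALSE
)

combos
nrow(combos)
```

Adding a third age group would give nine cohorts, so the number of denominator cohorts can grow quickly as criteria are added.
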
These cohorts will be in the typical OMOP CDM structure:
```{r}
cdm$denominator
```
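
To give some intuition for how the study period and the prior observation requirement shape cohort entry, here is a simplified base R sketch. It is not the package's implementation: the observation start dates are hypothetical, and the age, sex, and study end criteria are ignored.

```r
# Simplified illustration (not the package's code): the earliest date a
# person could contribute time, given a study start date and a
# prior-observation requirement.
study_start <- as.Date("2008-01-01")
days_prior_observation <- 180

# Hypothetical observation period start dates for three people
obs_start <- as.Date(c("2005-06-01", "2007-10-01", "2010-03-15"))

# Date on which each person has accrued the required prior observation
eligible_from <- obs_start + days_prior_observation

# Entry is the later of the eligibility date and the study start
entry_date <- pmax(eligible_from, study_start)
entry_date
```

In this toy example, only the first person contributes from the very start of the study period. The other two enter once they have 180 days of observation behind them.
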
### Estimating incidence and prevalence
As well as a denominator cohort, an outcome cohort will need to be identified. Defining outcome cohorts is done outside of the IncidencePrevalence package, and our mock data already includes an outcome cohort.
```{r}
cdm$outcome
```
Now that we have identified our denominator population and have an outcome cohort available, we can calculate incidence and prevalence as below.

For this example we'll estimate incidence on a yearly basis, allowing individuals to have multiple events but with an outcome washout of 180 days. We also require that only complete database intervals are included, by which we mean that the database must have individuals observed throughout a year for that year to be included in the analysis. Note that we also specify a minimum cell count of 5, under which estimates will be obscured.
```{r}
inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  repeatedEvents = TRUE,
  outcomeWashout = 180,
  completeDatabaseIntervals = TRUE,
  minCellCount = 5
)

plotIncidence(inc, facet = c("denominator_age_group", "denominator_sex"))
```
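
The arithmetic behind an incidence rate, and the effect of a minimum cell count, can be sketched in base R. This is a toy illustration with made-up counts, not the package's internals. The column names, the choice of 100,000 person-years as the scale, and the assumption that only non-zero counts below the threshold are obscured are all assumptions of this sketch.

```r
# Toy yearly counts (hypothetical values, for illustration only)
toy <- data.frame(
  year = 2008:2010,
  n_events = c(12, 3, 0),
  person_days = c(1826250, 1753200, 1680150)
)
min_cell_count <- 5

# Convert person-time to years and express the rate per 100,000 person-years
toy$person_years <- toy$person_days / 365.25
toy$rate_per_100000_pys <- toy$n_events / toy$person_years * 100000

# Assumed suppression rule: obscure non-zero counts below the threshold,
# together with the estimate derived from them
obscure <- toy$n_events > 0 & toy$n_events < min_cell_count
toy$n_events[obscure] <- NA
toy$rate_per_100000_pys[obscure] <- NA

toy
```

Here the 2009 row is obscured because it has only 3 events, while the 2010 row with zero events is kept under the rule assumed above.
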
We could also estimate point prevalence as of the start of each calendar year, like so:
```{r}
prev_point <- estimatePointPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  timePoint = "start",
  minCellCount = 5
)

plotPrevalence(prev_point, facet = c("denominator_age_group", "denominator_sex"))
```
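
Conceptually, point prevalence is the proportion of people in the denominator on a given date who have the outcome on that date. A minimal base R sketch of this idea, using a hypothetical `people` data frame rather than the package's own logic:

```r
# Hypothetical data: observation periods and (at most one) outcome episode
time_point <- as.Date("2010-01-01")
people <- data.frame(
  person_id = 1:5,
  obs_start = as.Date(c("2005-01-01", "2008-06-01", "2009-01-01",
                        "2010-06-01", "2007-01-01")),
  obs_end = as.Date(c("2015-12-31", "2012-12-31", "2009-06-30",
                      "2014-12-31", "2016-12-31")),
  outcome_start = as.Date(c("2009-11-01", NA, NA, NA, "2008-03-01")),
  outcome_end = as.Date(c("2010-02-01", NA, NA, NA, "2008-09-01"))
)

# Denominator: people under observation on the time point
in_denominator <- people$obs_start <= time_point &
  people$obs_end >= time_point

# Numerator: denominator members whose outcome episode covers the time point
is_case <- in_denominator &
  !is.na(people$outcome_start) &
  people$outcome_start <= time_point &
  people$outcome_end >= time_point

point_prevalence <- sum(is_case) / sum(in_denominator)
point_prevalence
```

Three people are under observation on 1 January 2010 and one of them has an ongoing outcome episode, so the toy point prevalence is one in three.
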
We can likewise estimate annual period prevalence, where we again require complete database intervals and, in addition, only include those people who are observed in the data for the full year:
```{r}
prev_period <- estimatePeriodPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "years",
  completeDatabaseIntervals = TRUE,
  fullContribution = TRUE,
  minCellCount = 5
)

plotPrevalence(prev_period, facet = c("denominator_age_group", "denominator_sex"))
```
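
To see why requiring a full contribution can change the estimate, the following base R sketch compares the two denominators for a single year. It is a simplified illustration with hypothetical data, not the package's implementation: a case is taken to be anyone whose outcome starts within the year, and outcomes that were already ongoing are ignored.

```r
# Hypothetical data for one calendar year
period_start <- as.Date("2010-01-01")
period_end <- as.Date("2010-12-31")
people <- data.frame(
  person_id = 1:5,
  obs_start = as.Date(c("2005-01-01", "2010-07-01", "2008-01-01",
                        "2009-01-01", "2006-01-01")),
  obs_end = as.Date(c("2015-12-31", "2014-12-31", "2010-03-31",
                      "2012-12-31", "2013-12-31")),
  outcome_start = as.Date(c("2010-05-01", "2010-09-01", NA, NA, NA))
)

# Observed at any time during the year
any_contribution <- people$obs_start <= period_end &
  people$obs_end >= period_start

# Observed for the whole year
full_contribution <- people$obs_start <= period_start &
  people$obs_end >= period_end

# Simplified case definition: outcome starts within the year
had_outcome <- !is.na(people$outcome_start) &
  people$outcome_start >= period_start &
  people$outcome_start <= period_end

prev_any <- sum(had_outcome & any_contribution) / sum(any_contribution)
prev_full <- sum(had_outcome & full_contribution) / sum(full_contribution)
c(any = prev_any, full = prev_full)
```

In this toy data, two of the five people are only partially observed during 2010. Dropping them changes both the numerator and the denominator, so the two estimates differ (0.4 versus one in three).
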