Closes #73 #86

Merged 7 commits on Sep 26, 2023
34 changes: 33 additions & 1 deletion inst/WORDLIST.txt
@@ -269,6 +269,38 @@ siloed
stefanthoma
useR
admiralroche
ADTM
ADY
ae
AEENDTC
AEN
AESTDTC
args
AST
ASTDTM
datetime
Datetime
DCUTDT
dt
dtc
DTC
dtf
DTHDT
dtm
hms
lubridate
mh
MHSTDTC
mmThh
ss
tmf
TRTSDT
TRTSDTM
VSDTC
VSTPT
ymd
yyyy
=======
AENDT
AENDY
ASTDTM
@@ -283,4 +315,4 @@ DTM
dy
lubridate
TRTSDTM
ymd
ymd
@@ -0,0 +1,252 @@
---
title: "Date/Time Functions and Imputation in {admiral}"
author:
- name: Edoardo Mancini
description: "Dates, times and imputation can be a frustrating facet of any programming language. Here's how {admiral} makes all of this easy!"
date: "2023-08-21"
# please do not use any non-default categories.
# You can find the default categories in the repository README.md
categories: [admiral]
# feel free to change the image
image: "admiral.png"

---

<!--------------- typical setup ----------------->

```{r setup, include=FALSE}
long_slug <- "2023-08-21_date_functions_and_imputation"
# renv::use(lockfile = "renv.lock")

library(admiraldev)
```

<!--------------- post begins here ----------------->

# Introduction

Dates and times are collected in SDTM as character values using the extended [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format `"yyyy-mm-ddThh:mm:ss"`. This universal format allows parts of the date or time to be missing - e.g. the string `"2019-10"` represents a date where the day and the time are unknown. In contrast, ADaM timing variables like `ADTM` (Analysis Datetime) or `ADY` (Analysis Relative Day) are numeric variables, which can be derived only if the date or datetime is complete.
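
To make the problem concrete, here is a minimal base R sketch (not part of `{admiral}`; the variable names are purely illustrative) showing that a complete ISO 8601 date converts to a numeric-backed `Date`, while a partial one cannot be converted without imputation:

```r
# A complete ISO 8601 date matches the full format and converts to a Date,
# which R stores as a number of days since 1970-01-01
complete <- as.Date("2019-10-05", format = "%Y-%m-%d")
complete_num <- as.numeric(complete)

# A partial date (day unknown) does not match the full format,
# so the conversion yields NA rather than a usable numeric value
partial <- as.Date("2019-10", format = "%Y-%m-%d")
is.na(partial)
```

This is the gap that imputation fills: a rule has to decide which day (and time) stands in for the missing parts before a numeric variable can be derived.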

Most ADaM programmers have, at one point or another, encountered situations where missing dates, unexpected formats or confusing imputation functions rendered derivations of timing variables frustrating and time-consuming. `{admiral}` aims to mitigate this (where possible!) by providing functions which automatically derive date/datetime variables for you, and fill in missing date or time parts according to well-defined imputation rules.

In this article, we first examine the arsenal of functions provided by `{admiral}` to aid in datetime imputation and timing variable derivation. We then observe everything in action through a number of selected typical examples.

# Date/Datetime Derivation and Imputation Functions

`{admiral}` provides the following functions for date/datetime imputation:

- `derive_vars_dt()`: Adds a date variable and a date imputation flag variable (optional) based on a --DTC variable and imputation rules.
- `derive_vars_dtm()`: Adds a datetime variable, a date imputation flag variable, and a time imputation flag variable (both optional) based on a --DTC variable and imputation rules.
- `impute_dtc_dtm()`: Returns a complete ISO 8601 datetime or `NA` based on a partial ISO 8601 datetime and imputation rules.
- `impute_dtc_dt()`: Returns a complete ISO 8601 date (without time) or `NA` based on a partial ISO 8601 date(time) and imputation rules.
- `convert_dtc_to_dt()`: Returns a date if the input ISO 8601 date is complete. Otherwise, `NA` is returned.
- `convert_dtc_to_dtm()`: Returns a datetime if the input ISO 8601 date is complete (with missing time replaced by `"00:00:00"` as default). Otherwise, `NA` is returned.
- `compute_dtf()`: Returns the date imputation flag.
- `compute_tmf()`: Returns the time imputation flag.
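
As a rough illustration of how the vector-level utilities in this list behave, the sketch below calls three of them directly on character vectors. The input values are made up for illustration, and the argument names assume the `{admiral}` release current when this post was written:

```r
library(admiral)

# Partial and complete ISO 8601 strings, as they might appear in a --DTC variable
dtc <- c("2019-07-18T15:25:40", "2019-07-18", "2019-02", NA_character_)

# impute_dtc_dt(): character in, character out; with highest_imputation = "M"
# a missing day (and month) is filled in, here with the first possible value
imputed <- impute_dtc_dt(dtc, highest_imputation = "M", date_imputation = "first")

# convert_dtc_to_dt(): character in, Date out; with the default settings
# nothing is imputed, so an incomplete date becomes NA
converted <- convert_dtc_to_dt(dtc)

# compute_dtf(): compares the collected string with the imputed date and
# reports which date part was imputed ("D" stands for day)
flag <- compute_dtf(dtc = "2019-02", dt = as.Date("2019-02-01"))
```

Because these functions work on plain vectors, they can be used inside `mutate()` or `filter()` calls in custom code.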

From the point of view of a typical ADaM programmer, the functions `impute_*`, `convert_*` and `compute_*` above can be viewed as utilities for treating dates and/or imputation within any custom code. In contrast, the `derive_*` functions are used to derive new timing variables directly and/or to carry out imputation at the scale of a whole ADaM dataset.

For a detailed look at the imputation rules applied by these `{admiral}` functions, please visit [this vignette](https://pharmaverse.github.io/admiral/cran-release/articles/imputation.html#imputation-rules) on the documentation website.

# Action Examples

## Creating an Imputed Datetime and Date Variable and Imputation Flag Variables

As described previously, `derive_vars_dtm()` derives an imputed datetime variable and the corresponding date and time imputation flags. The imputed date variable can then be derived by using `derive_vars_dtm_to_dt()`. It is neither necessary nor advisable to perform the imputation for the date variable if it was already done for the datetime variable. CDISC considers the datetime and the date variable to be two representations of the same date. Thus the imputation must be the same, and the imputation flags are valid for both the datetime and the date variable.

```{r}
library(admiral)
library(lubridate)
library(tibble)
library(dplyr, warn.conflicts = FALSE)

ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first"
  ) %>%
  derive_vars_dtm_to_dt(exprs(ASTDTM))
```
```{r, echo=FALSE}
dataset_vignette(ae)
```
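
The imputation flags make it easy to find the records that were touched by imputation. The following is a small sketch under the same settings as above; the object name `check` is only illustrative, and the flag names `ASTDTF` and `ASTTMF` follow from the `"AST"` prefix:

```r
library(admiral)
library(tibble)
library(dplyr, warn.conflicts = FALSE)

check <- tribble(
  ~AESTDTC,
  "2019-04-12",
  "2010-09"
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first"
  ) %>%
  derive_vars_dtm_to_dt(exprs(ASTDTM))

# Keep only the records whose date part was imputed;
# the date flag is NA when the collected date was complete
date_imputed <- check %>%
  filter(!is.na(ASTDTF))
```

Only the `"2010-09"` record remains, since `"2019-04-12"` had a complete date and needed only its time imputed.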

## Creating an Imputed Date Variable and Imputation Flag Variable

If an imputed date variable without a corresponding datetime variable is required, it can be derived by the `derive_vars_dt()` function.

```{r}
ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dt(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first"
  )
```
```{r, echo=FALSE}
dataset_vignette(ae)
```

## Imputing Time without Imputing Date

If the time should be imputed but not the date, the `highest_imputation` argument should be set to `"h"`. This results in `NA` if the date is partial. As no date is imputed, the date imputation flag is not created.

```{r}
ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "h",
    time_imputation = "first"
  )
```
```{r, echo=FALSE}
dataset_vignette(ae)
```
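
The same rule can be observed at the vector level with `impute_dtc_dtm()`. This is a minimal sketch with illustrative inputs, assuming the arguments behave as in the `{admiral}` release current when this post was written:

```r
library(admiral)

dtc <- c("2019-08-09T12:34:56", "2019-04-12", "2010-09")

# With highest_imputation = "h" only time parts may be imputed:
# a complete date gets a time, a partial date stays NA
imputed <- impute_dtc_dtm(
  dtc,
  highest_imputation = "h",
  time_imputation = "first"
)
```

The second value gains the time `00:00:00`, while the third stays `NA` because imputing it would require filling in the day.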

## Avoiding Imputed Dates Before a Particular Date

Usually an adverse event start date is imputed as the earliest of all possible dates when filling in the missing parts. The result may be a date before the treatment start date. This is undesirable because the adverse event would then not be considered treatment-emergent and would be excluded from the adverse event summaries. This can be avoided by specifying the treatment start date variable (`TRTSDTM`) for the `min_dates` argument.

Importantly, `TRTSDTM` is used as the imputed datetime only if the non-missing date and time parts of `AESTDTC` coincide with those of `TRTSDTM`. Therefore `2019-10` is not imputed as `2019-11-11 12:34:56`, because its collected month (October) differs from the treatment start month. This ensures that collected information is not changed by the imputation.

```{r}
ae <- tribble(
  ~AESTDTC,              ~TRTSDTM,
  "2019-08-09T12:34:56", ymd_hms("2019-11-11T12:34:56"),
  "2019-10",             ymd_hms("2019-11-11T12:34:56"),
  "2019-11",             ymd_hms("2019-11-11T12:34:56"),
  "2019-12-04",          ymd_hms("2019-11-11T12:34:56")
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first",
    min_dates = exprs(TRTSDTM)
  )
```
```{r, echo=FALSE}
dataset_vignette(ae)
```
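
The "coincide" rule can also be checked on plain vectors. The sketch below assumes that `impute_dtc_dtm()` accepts `min_dates` as a list of datetime vectors, as in the `{admiral}` release current when this post was written; the object names are illustrative:

```r
library(admiral)
library(lubridate)

dtc <- c("2019-10", "2019-11")
trtsdtm <- ymd_hms("2019-11-11T12:34:56")

imputed <- impute_dtc_dtm(
  dtc,
  highest_imputation = "M",
  date_imputation = "first",
  time_imputation = "first",
  # one minimum datetime per input value
  min_dates = list(rep(trtsdtm, length(dtc)))
)
```

Here `"2019-10"` is imputed to the first of October because the treatment start is not in October, whereas `"2019-11"` is lifted to the treatment start datetime because the collected year and month match.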

## Avoiding Imputed Dates After a Particular Date

If a date is imputed as the latest of all possible dates when filling in the missing parts, it should not result in a date after the data cut-off or death. This can be achieved by specifying the dates for the `max_dates` argument.

Importantly, non-missing date parts are not changed. Thus `2019-12-04` is imputed as `2019-12-04 23:59:59` although it is after the data cut-off date. It may make sense to replace it with the data cut-off date, but this is not part of the imputation. It should be done in a separate data cleaning or data cut-off step.

```{r}
ae <- tribble(
  ~AEENDTC,              ~DTHDT,            ~DCUTDT,
  "2019-08-09T12:34:56", ymd("2019-11-11"), ymd("2019-12-02"),
  "2019-11",             ymd("2019-11-11"), ymd("2019-12-02"),
  "2019-12",             NA,                ymd("2019-12-02"),
  "2019-12-04",          NA,                ymd("2019-12-02")
) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = exprs(DTHDT, DCUTDT)
  )
```
```{r, echo=FALSE}
dataset_vignette(ae)
```
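
A date-only sketch of the same behaviour, using `impute_dtc_dt()` on plain vectors. It assumes `max_dates` is passed as a list of date vectors, as in the `{admiral}` release current when this post was written, and the object names are illustrative:

```r
library(admiral)
library(lubridate)

dtc <- c("2019-12", "2019-12-04")
dcutdt <- ymd("2019-12-02")

imputed <- impute_dtc_dt(
  dtc,
  highest_imputation = "M",
  date_imputation = "last",
  # one maximum date per input value
  max_dates = list(rep(dcutdt, length(dtc)))
)
```

The partial date `"2019-12"` is capped at the cut-off date instead of the last day of December, while the complete date `"2019-12-04"` is returned unchanged even though it lies after the cut-off.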

## Imputation Without Creating a New Variable

If imputation is required without creating a new variable, the `convert_dtc_to_dt()` function can be called to obtain a vector of imputed dates. For example, it can be used directly in a filter condition to keep only the medical history records that started before treatment:

```{r}
mh <- tribble(
  ~MHSTDTC,     ~TRTSDT,
  "2019-04",    ymd("2019-04-15"),
  "2019-04-01", ymd("2019-04-15"),
  "2019-05",    ymd("2019-04-15"),
  "2019-06-21", ymd("2019-04-15")
) %>%
  filter(
    convert_dtc_to_dt(
      MHSTDTC,
      highest_imputation = "M",
      date_imputation = "first"
    ) < TRTSDT
  )
```
```{r, echo=FALSE}
dataset_vignette(mh)
```

## Using More Than One Imputation Rule for a Variable

Using different imputation rules depending on the observation can be done by using the higher-order function `slice_derivation()`, which applies a derivation function differently (by varying its arguments) in different subsections of a dataset. For example, consider this Vital Signs case where pre-dose records require a different treatment to other records:

```{r}
vs <- tribble(
  ~VSDTC,                ~VSTPT,
  "2019-08-09T12:34:56", NA,
  "2019-10-12",          "PRE-DOSE",
  "2019-11-10",          NA,
  "2019-12-04",          NA
) %>%
  slice_derivation(
    derivation = derive_vars_dtm,
    args = params(
      dtc = VSDTC,
      new_vars_prefix = "A"
    ),
    derivation_slice(
      filter = VSTPT == "PRE-DOSE",
      args = params(time_imputation = "first")
    ),
    derivation_slice(
      filter = TRUE,
      args = params(time_imputation = "last")
    )
  )
```
```{r, echo=FALSE}
dataset_vignette(vs)
```
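
To see what `slice_derivation()` is doing conceptually, the sketch below reproduces the same result by hand: split the data, apply `derive_vars_dtm()` with different arguments to each part, and bind the parts back together. This is only an illustration of the idea, not how `{admiral}` implements it; the object names are made up, and the row order of the result may differ from that of `slice_derivation()`.

```r
library(admiral)
library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(lubridate)

vs_raw <- tribble(
  ~VSDTC,                ~VSTPT,
  "2019-08-09T12:34:56", NA,
  "2019-10-12",          "PRE-DOSE",
  "2019-11-10",          NA,
  "2019-12-04",          NA
)

# Pre-dose records: missing time imputed as the start of the day
pre <- vs_raw %>%
  filter(VSTPT == "PRE-DOSE") %>%
  derive_vars_dtm(dtc = VSDTC, new_vars_prefix = "A", time_imputation = "first")

# All remaining records (including those with missing VSTPT):
# missing time imputed as the end of the day
other <- vs_raw %>%
  filter(is.na(VSTPT) | VSTPT != "PRE-DOSE") %>%
  derive_vars_dtm(dtc = VSDTC, new_vars_prefix = "A", time_imputation = "last")

vs_manual <- bind_rows(pre, other)
```

The manual version has to spell out the complement of the first filter, including the `NA` case. With `slice_derivation()` the final `filter = TRUE` slice simply catches every record not matched by an earlier slice.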

# Conclusion

Deriving timing variables and carrying out imputations is tricky at the best of times, but hopefully this blog post can shed some light on how to make this all easier using the `{admiral}` package! As `{admiral}` developers we are always interested in knowing how users are employing the package for their ADaM needs, so if you have any comments or feedback related to this topic, don't be afraid to leave a comment on our [Slack channel](https://app.slack.com/client/T028PB489D3/C02M8KN8269) or on the [GitHub repository](https://github.com/pharmaverse/admiral/), either as an issue or as a discussion.

For an even more detailed treatment of this topic, users are once again invited to read the corresponding [vignette](https://pharmaverse.github.io/admiral/cran-release/articles/imputation.html) on the documentation website, from which this article was adapted.

<!--------------- appendices go here ----------------->

```{r, echo=FALSE}
#| eval: false
source("appendix.R")
insert_appendix(
repo_spec = "pharmaverse/blog",
name = long_slug
)
```