Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #1792 multiple summary vars #2124

Merged
merged 12 commits into from
Oct 9, 2023
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: admiral
Type: Package
Title: ADaM in R Asset Library
Version: 0.12.0
Version: 0.12.0.9000
Authors@R: c(
person("Ben", "Straub", email = "[email protected]", role = c("aut", "cre")),
person("Stefan", "Bundfuss", role = "aut"),
Expand Down
18 changes: 18 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,21 @@
# admiral 0.12.0.9000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have admiral (development version) instead of the number. I think usethis::use_version("dev") will update for you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, although use_version("dev") did not work well (it increased to 0.12.0.9001 and added an additional heading in NEWS.md and the commit failed). I think we should call use_version("dev") once directly after each release.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe every time we merge into main while developing this will be incremented. @zdz2101 @ddsjoberg

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it simple and always stay at x.x.x.9000. Even in the "R packages" book it is suggested to increment the development version only in exceptional cases:

Increment the development version, e.g. from 9000 to 9001, if you’ve added an important feature and you (or others) need to be able to detect or require the presence of this feature. For example, this can happen when two packages are developing in tandem. This is generally the only reason that we bother to increment the development version. This makes in-development versions special and, in some sense, degenerate. Since we don’t increment the development component with each Git commit, the same package version number is associated with many different states of the package source, in between releases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it will make it a lot easier on us if we don't have to change this every time we merge into main. Seems like this will cause a lot of tedious merge conflicts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem to make more sense, I'll remove the running of the line from the PR template, and it can be added as a standard post-release task

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the admiral family often makes updates that affect downstream dependencies. With that in mind, incrementing the version number will make it easy for these downstream dependencies to ensure the min dev version is installed. @cicdguy created a workflow to increment the dev version number with each PR, and I think it's a good addition to include in admiral


## New Features

## Updates of Existing Functions

- `derive_summary_records()`, `derive_var_merged_summary()`, and `get_summary_records()`
were enhanced such that more than one summary variable can be derived, e.g.,
`AVAL` as the sum and `ADT` as the maximum of the contributing records. (#1792)

## Breaking Changes

- In `derive_summary_records()` and `get_summary_records()` the arguments
`analysis_var` and `summary_fun` were deprecated in favor of `set_values_to`.
(#1792)

- In `derive_var_merged_summary()` the arguments `new_var`, `analysis_var`, and
`summary_fun` were deprecated in favor of `new_vars`. (#1792)

# admiral 0.12.0

Expand Down
94 changes: 62 additions & 32 deletions R/derive_merged.R
Original file line number Diff line number Diff line change
Expand Up @@ -853,34 +853,52 @@ get_not_mapped <- function() {
admiral_environment$nmap
}

#' Merge a Summary Variable
#' Merge Summary Variables
#'
#' @description Merge a summary variable from a dataset to the input dataset.
#'
#' **Note:** This is a wrapper function for the more generic `derive_vars_merged`.
#'
#' @param dataset Input dataset
#'
#' The variables specified by the `by_vars` argument are expected.
#'
#' @param dataset_add Additional dataset
#'
#' The variables specified by the `by_vars` and the `analysis_var` arguments
#' are expected.
#' The variables specified by the `by_vars` and the variables used on the left
#' hand sides of the `new_vars` arguments are expected.
#'
#' @param new_var Variable to add
#'
#' `r lifecycle::badge("deprecated")` Please use `new_vars` instead.
#'
#' The specified variable is added to the input dataset (`dataset`) and set to
#' the summarized values.
#'
#' @param by_vars Grouping variables
#'
#' The values of `analysis_var` are summarized by the specified variables. The
#' summarized values are merged to the input dataset (`dataset`) by the
#' specified by variables.
#' The expressions on the left hand sides of `new_vars` are evaluated by the
#' specified variables. The resulting values are merged to the input dataset
#' (`dataset`) by the specified by variables.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' (`dataset`) by the specified by variables.
#' (`dataset`) by the specified **by variables**.

I got confused

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The by variables are used twice. I have emphasized both.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added more details to "Details".

#'
#' *Permitted Values*: list of variables created by `exprs()`
#'
#' @param new_vars New variables to add
#'
#' The specified variables are added to the input dataset.
#'
#' A named list of expressions is expected:
#' + LHS refer to a variable.
#' + RHS refers to the values to set to the variable. This can be a string, a
#' symbol, a numeric value, an expression or NA. If summary functions are
#' used, the values are summarized by the variables specified for `by_vars`.
#'
#' For example:
#' ```
#' new_vars = exprs(
#' DOSESUM = sum(AVAL),
#' DOSEMEAN = mean(AVAL)
#' )
#' ```
#'
#' @param filter_add Filter for additional dataset (`dataset_add`)
#'
#' Only observations fulfilling the specified condition are taken into account
Expand All @@ -891,11 +909,15 @@ get_not_mapped <- function() {
#'
#' @param analysis_var Analysis variable
#'
#' `r lifecycle::badge("deprecated")` Please use `new_vars` instead.
#'
#' The values of the specified variable are summarized by the function
#' specified for `summary_fun`.
#'
#' @param summary_fun Summary function
#'
#' `r lifecycle::badge("deprecated")` Please use `new_vars` instead.
#'
#' The specified function that takes as input `analysis_var` and performs the
#' calculation. This can include built-in functions as well as user defined
#' functions, for example `mean` or `function(x) mean(x, na.rm = TRUE)`.
Expand All @@ -906,18 +928,16 @@ get_not_mapped <- function() {
#' 1. The records from the additional dataset (`dataset_add`) are restricted
#' to those matching the `filter_add` condition.
#'
#' 1. The values of the analysis variable (`analysis_var`) are summarized by
#' the summary function (`summary_fun`) for each by group (`by_vars`) in the
#' additional dataset (`dataset_add`).
#' 1. The new variables (`new_vars`) are created for each by group (`by_vars`)
#' in the additional dataset (`dataset_add`).
#'
#' 1. The summarized values are merged to the input dataset as a new variable
#' (`new_var`). For observations without a matching observation in the
#' additional dataset the new variable is set to `NA`. Observations in the
#' additional dataset which have no matching observation in the input dataset
#' are ignored.
#' 1. The new variables are merged to the input dataset. For observations
#' without a matching observation in the additional dataset the new variables
#' are set to `NA`. Observations in the additional dataset which have no
#' matching observation in the input dataset are ignored.
#'
#' @return The output dataset contains all observations and variables of the
#' input dataset and additionally the variable specified for `new_var`.
#' input dataset and additionally the variables specified for `new_vars`.
#'
#' @family der_gen
#' @keywords der_gen
Expand Down Expand Up @@ -947,9 +967,10 @@ get_not_mapped <- function() {
#' adbds,
#' dataset_add = adbds,
#' by_vars = exprs(USUBJID, AVISIT),
#' new_var = MEANVIS,
#' analysis_var = AVAL,
#' summary_fun = function(x) mean(x, na.rm = TRUE)
#' new_vars = exprs(
#' MEANVIS = mean(AVAL, na.rm = TRUE),
#' MAXVIS = max(AVAL, na.rm = TRUE)
#' )
#' )
#'
#' # Add a variable listing the lesion ids at baseline
Expand Down Expand Up @@ -981,46 +1002,55 @@ get_not_mapped <- function() {
#' dataset_add = adtr,
#' by_vars = exprs(USUBJID),
#' filter_add = AVISIT == "BASELINE",
#' new_var = LESIONSBL,
#' analysis_var = LESIONID,
#' summary_fun = function(x) paste(x, collapse = ", ")
#' new_vars = exprs(LESIONSBL = paste(LESIONID, collapse = ", "))
#' )
#'
derive_var_merged_summary <- function(dataset,
dataset_add,
by_vars,
new_vars = NULL,
new_var,
filter_add = NULL,
analysis_var,
summary_fun) {
assert_vars(by_vars)
by_vars_left <- replace_values_by_names(by_vars)
by_vars_right <- chr2vars(paste(vars2chr(by_vars)))
new_var <- assert_symbol(enexpr(new_var))
analysis_var <- assert_symbol(enexpr(analysis_var))
# once new_var is removed new_vars should be mandatory
assert_expr_list(new_vars, named = TRUE, optional = TRUE)
filter_add <-
assert_filter_cond(enexpr(filter_add), optional = TRUE)
assert_s3_class(summary_fun, "function")
assert_data_frame(
dataset,
required_vars = by_vars_left
)
assert_data_frame(
dataset_add,
required_vars = expr_c(by_vars_right, analysis_var)
required_vars = expr_c(by_vars_right, extract_vars(new_vars))
)

if (!missing(new_var) || !missing(analysis_var) || !missing(summary_fun)) {
deprecate_warn(
"1.0.0",
I("derive_var_merged_summary(new_var = , anaylsis_var = , summary_fun = )"),
"derive_var_merged_summary(new_vars = )"
)
new_var <- assert_symbol(enexpr(new_var))
analysis_var <- assert_symbol(enexpr(analysis_var))
assert_s3_class(summary_fun, "function")
new_vars <- exprs(!!new_var := {{ summary_fun }}(!!analysis_var), !!!new_vars)
}

# Summarise the analysis value and merge to the original dataset
derive_vars_merged(
dataset,
dataset_add = get_summary_records(
dataset_add,
by_vars = by_vars_right,
filter = !!filter_add,
analysis_var = !!analysis_var,
summary_fun = summary_fun
),
by_vars = by_vars,
new_vars = exprs(!!new_var := !!analysis_var)
set_values_to = new_vars,
) %>%
select(!!!by_vars_right, names(new_vars)),
by_vars = by_vars
)
}
76 changes: 25 additions & 51 deletions R/derive_param_exposure.R
Original file line number Diff line number Diff line change
Expand Up @@ -138,11 +138,23 @@ derive_param_exposure <- function(dataset,

dtm <- c("ASTDTM", "AENDTM") %in% colnames(dataset)
dt <- c("ASTDT", "AENDT") %in% colnames(dataset)
set_dtm <- NULL
set_dt <- NULL
if (all(dtm)) {
dates <- exprs(ASTDTM, AENDTM)
set_dtm <- exprs(
ASTDTM = min(ASTDTM, na.rm = TRUE),
AENDTM = max(coalesce(AENDTM, ASTDTM), na.rm = TRUE)
)
} else {
dates <- exprs(ASTDT, AENDT)
}
if (all(dt)) {
set_dt <- exprs(
ASTDT = min(ASTDT, na.rm = TRUE),
AENDT = max(coalesce(AENDT, ASTDT), na.rm = TRUE)
)
}

assert_data_frame(dataset,
required_vars = expr_c(by_vars, analysis_var, exprs(PARAMCD), dates)
Expand All @@ -155,57 +167,19 @@ derive_param_exposure <- function(dataset,
assert_character_vector(input_code, values = params_available)
assert_s3_class(summary_fun, "function")

subset_ds <- dataset %>%
filter_if(filter)

add_data <- subset_ds %>%
get_summary_records(
by_vars = by_vars,
filter = PARAMCD == !!input_code,
analysis_var = !!analysis_var,
summary_fun = summary_fun,
set_values_to = set_values_to
)

# add the dates for the derived parameters
tmp_start <- get_new_tmp_var(dataset)
tmp_end <- get_new_tmp_var(dataset)
if (all(dtm)) {
dates <- subset_ds %>%
group_by(!!!by_vars) %>%
summarise(
!!tmp_start := min(ASTDTM, na.rm = TRUE),
!!tmp_end := max(coalesce(AENDTM, ASTDTM), na.rm = TRUE)
) %>%
ungroup()
expo_data <- add_data %>%
derive_vars_merged(dataset_add = dates, by_vars = by_vars) %>%
mutate(
ASTDTM = !!tmp_start,
AENDTM = !!tmp_end
) %>%
remove_tmp_vars()

if (all(dt)) {
expo_data <- expo_data %>%
mutate(ASTDT = date(ASTDTM), AENDT = date(AENDTM))
}
} else {
dates <- subset_ds %>%
group_by(!!!by_vars) %>%
summarise(
!!tmp_start := min(ASTDT, na.rm = TRUE),
!!tmp_end := max(coalesce(AENDT, ASTDT), na.rm = TRUE)
) %>%
ungroup()
expo_data <- add_data %>%
derive_vars_merged(dataset_add = dates, by_vars = by_vars) %>%
mutate(
ASTDT = !!tmp_start,
AENDT = !!tmp_end
) %>%
remove_tmp_vars()
if (is.null(filter)) {
filter <- TRUE
}

bind_rows(dataset, expo_data)
derive_summary_records(
dataset,
by_vars = by_vars,
filter = PARAMCD == !!input_code & !!filter,
set_values_to = exprs(
!!analysis_var := {{ summary_fun }}(!!analysis_var),
!!!set_dtm,
!!!set_dt,
!!!set_values_to
)
)
}
Loading
Loading