Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #1966 address derive_vars_joined bugs #2016

Merged
merged 17 commits into from
Aug 1, 2023
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions R/derive_joined.R
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ derive_vars_joined <- function(dataset,

filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE)
filter_join <- assert_filter_cond(enexpr(filter_join), optional = TRUE)

original_new_vars <- new_vars
if (is.null(new_vars)) {
new_vars <- chr2vars(colnames(dataset_add))
}
Expand Down Expand Up @@ -399,17 +399,18 @@ derive_vars_joined <- function(dataset,
suffix = ".join"
),
mode = mode,
check_type = check_type
check_type = "none"
Copy link
Collaborator Author

@zdz2101 zdz2101 Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bundfussr This modification addresses the get_duplicates reprex you cited in the issue, though I'm not sure this is the "perfect" solution to it. There are 3 places where check_type for the duplicates are being checked, up at line 364 (which is set to "none"), here in line 402 (which is passed through the user in the function call), and passively through derive_vars_merged() in line 413 (which is defaulted to "warning"). I think(?) the intention was to allow duplicates (& as a result suppress messaging) up until the derive_vars_merged() call in the function body where there is a specific duplciate_msg to highlight the "final" duplicates, which is why I added the new line.

May be important to note, that in our vignette generic.Rmd, we have check_type = "none" in one of the examples to suppress what would've been a similar warning you have in your reprex, because the default for check_type right now in derive_vars_joined() is "warning". But with the proposed fix, you can actually remove this line and let it be the default.

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should not be changed. Otherwise, no message is issued if the selected observation is not unique. Consider for example

derive_vars_joined(
  adlb_ast,
  dataset_add = adlb_tbili_pbl,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(ADT),
  new_vars = exprs(TBILI_ADT = ADT, TBILI_SEQ = ASEQ),
  filter_join = ADT <= ADT.join,
  mode = "first"
)

with the datasets from the second example in the issue. Here a warning should be issued because the first TBILI assessment at or after the AST assessment is not unique for ASEQ %in% c(2, 3).

Copy link
Collaborator Author

@zdz2101 zdz2101 Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If thats the case, is the example in the generic.Rmd a little misleading then? If it weren't toggled to check_type = "none" there would be a warning.

I also think I'm a little confused how order is handled in general:

image

In general, !!!chr2vars(names(order)) will not select columns unless user provides a named value so the common_vars statement a little further down can be inconsistent. Should it be swapped out with (?):

  • replace_values_by_names
  • extract_vars()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying check_type = "none" in the example makes sense because only the maximum severity is added as new variable. If there are multiple observations with the maximum severity, it doesn't matter which one is used.

Regarding order, we need to more comments to the code. The purpose of the mutate() call is to create variables for named expressions in order. Then these can be used in join_vars or the filter condition.
You are right that !!!chr2vars(names(order)) is not correct. We may lose the variables used in the expressions of order. I think it should be replaced by !!!replace_values_by_names(extract_vars(order)).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that did the trick! Now retaining the original check_type = check_type code and using the !!!replace_values_by_names(extract_vars(order)) fixes the reprex you provided in the issue and still provides the appropriate warning message you indicated 3 replies above

)
}

# merge new variables to the input dataset and rename them
data %>%
data_final <- data %>%
derive_vars_merged(
dataset_add = data_return,
by_vars = exprs(!!!by_vars_left, !!tmp_obs_nr),
new_vars = add_suffix_to_vars(new_vars, vars = common_vars, suffix = ".join"),
missing_values = missing_values,
check_type = check_type,
duplicate_msg = paste(
paste(
"After applying `filter_join` the joined dataset contains more",
Expand All @@ -423,4 +424,10 @@ derive_vars_joined <- function(dataset,
)
) %>%
remove_tmp_vars()

if (is.null(original_new_vars)) {
data_final <- data_final %>%
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bundfussr this one addresses the NULL new_vars issue, don't think it's the most "elegant" but should work...?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not in line with the documentation and intended behavior. If new_vars = NULL is specified, all variables from dataset_add should be added. However, in the example from the issue no variable is added. Of course, we can not add all variables because they are already in the input dataset. Thus an error should be issued.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the case of all variables already in the input dataset, the desired result would be an error and no-return object at all?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, or more precise, if any of the variables specified for new_vars is already in the input dataset, the function should stop with an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see updated if (is.null(new_vars)) at line 353 that implements suggestion and updated test with the corresponding messaging

select(-ends_with("join"))
}
return(data_final)
}
Loading