-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #1966 address derive_vars_joined bugs #2016
Changes from 6 commits
2fece95
651168f
dceced5
0507afd
99b3150
e72dff5
c70e684
af81c6f
2ec4be5
9147ee7
7fe89d9
8c651dc
bb9a17e
c67c8b4
05ea4ad
50701fd
a225fdf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -349,7 +349,7 @@ derive_vars_joined <- function(dataset, | |
|
||
filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE) | ||
filter_join <- assert_filter_cond(enexpr(filter_join), optional = TRUE) | ||
|
||
original_new_vars <- new_vars | ||
if (is.null(new_vars)) { | ||
new_vars <- chr2vars(colnames(dataset_add)) | ||
} | ||
|
@@ -399,17 +399,18 @@ derive_vars_joined <- function(dataset, | |
suffix = ".join" | ||
), | ||
mode = mode, | ||
check_type = check_type | ||
check_type = "none" | ||
) | ||
} | ||
|
||
# merge new variables to the input dataset and rename them | ||
data %>% | ||
data_final <- data %>% | ||
derive_vars_merged( | ||
dataset_add = data_return, | ||
by_vars = exprs(!!!by_vars_left, !!tmp_obs_nr), | ||
new_vars = add_suffix_to_vars(new_vars, vars = common_vars, suffix = ".join"), | ||
missing_values = missing_values, | ||
check_type = check_type, | ||
duplicate_msg = paste( | ||
paste( | ||
"After applying `filter_join` the joined dataset contains more", | ||
|
@@ -423,4 +424,10 @@ derive_vars_joined <- function(dataset, | |
) | ||
) %>% | ||
remove_tmp_vars() | ||
|
||
if (is.null(original_new_vars)) { | ||
data_final <- data_final %>% | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bundfussr this one addresses the NULL new_vars issue, don't think it's the most "elegant" but should work...? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is not in line with the documentation and intended behavior. If There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So in the case of all variables already in the input dataset, the desired result would be an error and no-return object at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, or more precise, if any of the variables specified for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please see updated |
||
select(-ends_with("join")) | ||
} | ||
return(data_final) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bundfussr This modification addresses the get_duplicates reprex you cited in the issue, though I'm not sure this is the "perfect" solution to it. There are 3 places where
check_type
for the duplicates are being checked, up at line 364 (which is set to "none"), here in line 402 (which is passed through the user in the function call), and passively throughderive_vars_merged()
in line 413 (which is defaulted to "warning"). I think(?) the intention was to allow duplicates (& as a result suppress messaging) up until thederive_vars_merged()
call in the function body where there is a specificduplciate_msg
to highlight the "final" duplicates, which is why I added the new line.May be important to note, that in our vignette
generic.Rmd
, we havecheck_type = "none"
in one of the examples to suppress what would've been a similar warning you have in your reprex, because the default forcheck_type
right now inderive_vars_joined()
is"warning"
. But with the proposed fix, you can actually remove this line and let it be the default.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should not be changed. Otherwise, no message is issued if the selected observation is not unique. Consider for example
with the datasets from the second example in the issue. Here a warning should be issued because the first TBILI assessment at or after the AST assessment is not unique for
ASEQ %in% c(2, 3)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If thats the case, is the example in the
generic.Rmd
a little misleading then? If it weren't toggled tocheck_type = "none"
there would be a warning.I also think I'm a little confused how
order
is handled in general:In general,
!!!chr2vars(names(order))
will not select columns unless user provides a named value so thecommon_vars
statement a little further down can be inconsistent. Should it be swapped out with (?):replace_values_by_names
extract_vars()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying
check_type = "none"
in the example makes sense because only the maximum severity is added as new variable. If there are multiple observations with the maximum severity, it doesn't matter which one is used.Regarding
order
, we need to more comments to the code. The purpose of themutate()
call is to create variables for named expressions inorder
. Then these can be used injoin_vars
or the filter condition.You are right that
!!!chr2vars(names(order))
is not correct. We may lose the variables used in the expressions oforder
. I think it should be replaced by!!!replace_values_by_names(extract_vars(order))
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that did the trick! Now retaining the original
check_type = check_type
code and using the!!!replace_values_by_names(extract_vars(order))
fixes the reprex you provided in the issue and still provides the appropriate warning message you indicated 3 replies above