-
-
Notifications
You must be signed in to change notification settings - Fork 3
WIP: POC Redesign extraction #256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This would specify "factor columns from |
Yes, I'm glad it is easy to understand.
Not sure I fully understand your question. Variables can also be manually set, for example
The only limitation is that the function provided should return a logical vector. In terms of implementation, the function is applied to the names and to the values, precisely to accommodate cases like this. |
I don't get this one. I would think it would return columns where the values are in all caps. |
The same function is applied to column names and column values? 🤔 |
It could mean both things imho. But here it is specifying variables, not values. For filtering by value there is td <- within(teal.data::teal_data(), {
df <- data.frame(A = as.factor(letters[1:5]), Ab = letters[1:5])
m <- matrix()
})
spec <- datasets("df") & variables(function(x){x==toupper(x)})
resolver(spec, td) Still selects A as the name is in capital letters
Yes, if there is a single argument that should accept |
Thanks, it does explain things a bit. |
I updated the branch to make it easy to reflect updates on selections from the possible choices. Besides some internal functions, it now exports a new key function Here is a full example with a shiny module and teal: library("teal")
library("shiny")
# Initialize the teal app
mod <- function(label, x, y) {
module(
ui = function(id, x, y) {
ns <- NS(id)
div(
module_input_ui(ns("x"), "x", x),
module_input_ui(ns("y"), "y", y),
shiny::tagList(
shiny::textOutput(ns("text"))
)
)
},
server = function(id, data, x, y) {
moduleServer(id, function(input, output, session) {
x_in <- module_input_server("x", x, data)
y_in <- module_input_server("y", y, data)
output$text <- shiny::renderText({
req(x_in(), y_in())
l <- lapply(
list(x = x_in, y = y_in),
function(sel) {
if (sum(lengths(sel()))) {
paste0(
"Object: ", sel()$datasets, "\nVariables: ",
paste0(sel()$variables, collapse = ", ")
)
} else {
"No info??"
}
}
)
unlist(l)
})
})
},
ui_args = list(x = x, y = y),
server_args = list(x = x, y = y)
)
} Here I create three different specifications, you can pass them to the app and they will work: iris_numeric <- datasets("IRIS") & variables(is.numeric)
mtcars_char <- datasets("MTCARS") & variables(is.character)
iris_numeric_or_mtcars_char <- iris_numeric | mtcars_char
app <- init(
data = teal.data::teal_data(IRIS = iris, MTCARS = mtcars),
modules = modules(
mod(x = datasets("IRIS") & variables(is.factor),
y = datasets("MTCARS") & variables(is.numeric))
)
)
shinyApp(app$ui, app$server) |
…sengineering/teal.transform into redesign_extraction@main
R/resolver.R
Outdated
eval_type_select <- function(type, data) { | ||
stopifnot(is.character(type$names)) | ||
|
||
orig_select <- orig(type$select) | ||
if (length(orig_select) == 1L) { | ||
orig_select <- orig_select[[1L]] | ||
} | ||
|
||
names <- seq_along(type$names) | ||
names(names) <- type$names | ||
select_data <- selector(data, type$select) | ||
|
||
# Keep only those that were already selected | ||
new_select <- intersect(unique(names(c(select_data))), unorig(type$names)) | ||
if (is.null(new_select)) { | ||
stop("No ", is(type), " selection is possible.") | ||
} | ||
attr(new_select, "original") <- orig_select | ||
|
||
type$select <- new_select | ||
|
||
type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select
shouldn't be a "function" of data but of choices. What if we reduce the complexity to this (plus some additional warnings/assertions within this if)?
eval_type_select <- function(type, data) { | |
stopifnot(is.character(type$names)) | |
orig_select <- orig(type$select) | |
if (length(orig_select) == 1L) { | |
orig_select <- orig_select[[1L]] | |
} | |
names <- seq_along(type$names) | |
names(names) <- type$names | |
select_data <- selector(data, type$select) | |
# Keep only those that were already selected | |
new_select <- intersect(unique(names(c(select_data))), unorig(type$names)) | |
if (is.null(new_select)) { | |
stop("No ", is(type), " selection is possible.") | |
} | |
attr(new_select, "original") <- orig_select | |
type$select <- new_select | |
type | |
eval_type_select <- function(type) { | |
stopifnot(is.character(type$names)) | |
orig_select <- orig(type$select) | |
new_select <- if (!checkmate::test_integerish(orig_select)) { | |
type$names[orig_select] | |
} else { | |
intersect(orig_select, type$names) | |
} | |
attr(new_select, "original") <- orig_select | |
type$select <- new_select | |
type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this will simplify the complexity if we only want to select by name. But I thought we wanted to be able to select by other properties. For example:
spec <- c(datasets(where(is.data.frame)), variables(where(is.character), where(function(x){n_distinct(x) > 5})))
td <- within(teal.data::teal_data(), {df <- data.frame(A = rep_len("A", 6), B = letters[1:6])})
library("dplyr")
r <- resolver(spec, td)
With your patch we can't resolve to select B and it wouldn't work by default. I think this extra complexity is worth it the flexibility (it is pretty similar to the eval_type_names) and this is in line of Pawel's comment to directly give access to the data but extended to the selection. But it's your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@llrs-roche and @pawelru please convince me why this should work. And explain what you'd do if intersect(select, x)
results in character(0)
.
variables(
x = where(is.character), # possible choices
select = where(function(x){n_distinct(x) > 5}) # initial selection
)
I agree this will simplify the complexity if we only want to select by name
Not only by name. IMO select
should be a "function" of choices. And yes, function of choices might produce invalid result as well. For example if x = letters[1:5], select = head(10)
it won't give 10-selections, but it is more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please convince me why this should work. And explain what you'd do if
intersect(select, x)
results incharacter(0)
.
Mmh, this could either be an error or just leave the user to pick one of the choices.
Not only by name. IMO select should be a "function" of choices.
Ok, but then we need the tidyselect help to evaluate the function, which is what I tried to do here. I have misunderstood the intention of the patch. Sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's continue a debate ;]
Mmh, this could either be an error ...
It results in app-user facing punished with error made by an app-developer lost in too complex api.
... or just leave the user to pick one of the choices.
Empty initial selection could be also an error (depending on module's validation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It results in app-user facing punished with error made by an app-developer lost in too complex api.
Maybe the app-developer would see the error before sharing/deploying it. But it's true we shouldn't try to guide them better than this.
Empty initial selection could be also an error (depending on module's validation).
This could be handled at the module level instead of at the extraction level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the app-developer would see the error before sharing/deploying it. But it's true we shouldn't try to guide them better than this.
No easy to address. This is run and resolved inside of the teal_module's server so it takes a while before extract meets data.
This could be handled at the module level instead of at the extraction level.
Please elaborate more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the app-developer would see the error before sharing/deploying it. But it's true we shouldn't try to guide them better than this.
No easy to address. This is run and resolved inside of the teal_module's server so it takes a while before extract meets data.
Indeed
This could be handled at the module level instead of at the extraction level.
Please elaborate more
Some modules will only allow to select one while others will allow to select multiple variables. So I think this kind of validation shouldn't be on the extraction feature. Between not selecting something and selecting something is more debatable in my opinion. But if we already need to include validation for one or more than one selection we can add validation between 0 and 1 at the same time.
R/types.R
Outdated
type_helper <- function(x, select, type) { | ||
out <- list(names = x, select = select) | ||
class(out) <- c(type, "type", "list") | ||
attr(out$names, "original") <- x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names <- x
. Please stay consistent and use either names
or x
everywhere. IMO x
is not informative at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will use names
(or maybe options
or choices
to make it easier for users to move to this API). I was thinking on extending this idea of using more accurate names to other functions and methods now that the feature is more complete. I'll check on what it was added and see what makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some propositions: choices/selected
, choices/active
, superset/subset
, X/x
, possible/chosen
, from/select
, pick/from
, all/picked
...
Short summary about the discussion we had regarding selection based on the possible choices and how to select them. We decided to keep the tidyselect on the selection but only pass the possible choices. A small example for the extraction of the factors variables but selecting those without NAs: devtools::load_all(".")
td <- within(teal.data::teal_data(), {
i <- iris
p <- penguins
m <- diag(5)
})
r <- resolver(c(datasets(where(is.data.frame), "p"),
variables(where(is.factor),
!where(anyNA))),
td)
r$variables$select
## "species" "island" We don't need to repeat the initial code for selecting the variables and tidyselection makes it flexible for multiple use cases (for example avoid plotting with NAs or outliers even if they are valid numeric or categorical values). |
Linked to insightsengineering/NEST-roadmap#36
Pull Request
Redesign extraction
This branch/PR is a POC of specification of user input to be extracted.
See tests added about how it works (currently most if not all of them should work), but simple example of specification:
Features:
Examples
Resolve input to be used by module developer
The
module_input_ui
andmodule_input_server
help the developer to get the required data.Merge arbitrary data
The
merge_module_srv
accepts the inputs and merge arbitrary data processed by themodule_input_server
, this is done thanks toextract_input
the responsible to obtain the requested data fromdata()
.MAE
TODO
Data extraction and preparation design.
UI update based on partial resolution.
Check it is simple for app developers.
Use tidyselect instead of own operators and resolution.
Update API to not rely on operators on this package (rely only on those on tidyselect)
Test extraction on non data.frame objects: MAE objects/matrices, lists...
Check missing corners: test required changes on current modules:
tmg: tm_missing_data, tm_g_distribution, and tm_data_table;
tmc: tm_a_mmrm and tm_t_events
Only up to the server body, check the defaults,
8 hours each module, mmrm = 10 hours.
Verify value specification: Some modules request variable A vs other variables or a numeric input.
20 hours
Verify bookmarking works
Verify it works on multiple data merges from multiple sources (cfr: [Bug]: Forced merge precludes using modules without joinin keys #258 )
Check it is simple for module developers (less important).
UI validation: removing unnecessary input UI or gray out when no input is possible (just one choice). Example: selection that allows to select data.frames if they have column A, there is no need to select a variable.
Documentation/help
Check transition/notify dependencies/users
Update module packages to use that feature
Set sensible defaults to specifications
What should be the constructor of the function shuold have as a default for the extract spec (i.e. is it data.frame? take all numeric columns? Only select the first one (with all available options)?
Each module make have different requirements.