-
Notifications
You must be signed in to change notification settings - Fork 24
first pass at postprocessing proof-of-concept #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* move container to Suggests * document extraction generic * document `workflow(postprocessor)` argument
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
R/fit.R
Outdated
#' @rdname workflows-internals | ||
#' @export | ||
.fit_post <- function(workflow, data) { | ||
action_post <- workflow[["post"]][["actions"]][["container"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use extract_postprocessor()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
I don't see any issues but I wonder if we should start an issue to make sure that we have the right plumbing and arguments for when we get around to implementing postprocessors for survival models. @hfrick should have a good eye for that here and in general.
ec0effa surfaces an important point; removing/updating a postprocessor from an otherwise trained workflow need not remove the preprocessor and model fits, as they won't be affected by the removal of the postprocessor. This introduces the possibility of a "partially trained" workflow, where a workflow with trained preprocessor and model but untrained postprocessor should be able to fit without issue. |
to align with the analogous argument in `extract_recipe()`; extract the fitted object if it's available, otherwise extract the specification. also updates the slot name from `post` to `fit` to elicit that the slot contains the trained object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so here are my thoughts on what plumbing post-processing for survival analysis would need.
Basic assumptions
- We might tailor both predictions of survival time and predictions of survival probability.
- If we tailor survival probability, we will do so at a specific single time point (hello
eval_time
our old friend).
I have not yet validated either assumption. Max, do you have a sense of whether they are valid?
For the implementation, this implies
- How/where do we specify the predictions?
- How/where do we specify
eval_time
?
Specifying the predictions
tailor()
could take survival time.pred_time
via theestimate
argument -- the documentation already suggest that.tailor()
could take survival probability in the tibble form of.pred
(containing.eval_time
and.pred_survival
) via theprobabilities
argument.
Specifying eval_time
- [tailor]
tailor()
would droptime
(the documentation for that as "the predicted event time" contradicts the documentation forestimate
listing.pred_time
) and instead takeeval_time
. The alternative would be to derive it from.pred
within a given postprocessingoperation
and default to the first value if we need a single one. Either way,time
would disappear. - [workflows] We currently can pass
eval_time
topredict()
andaugment()
methods for workflows but since we currently don't need it earlier, there are not arguments for eval time to thefit()
method for workflows or the specification viaworkflow()
.- If the post-processing operation doesn't need estimation, only
predict.workflow()
having aneval_time
argument should be fine. - If it does need estimation, it could come from the
tailor()
specification.
- If the post-processing operation doesn't need estimation, only
- [tune] The tuning functions have an
eval_time
argument which is required for dynamic and integrated survival metrics. If we need a single eval time to optimize for, we use the first one.- If a user includes a tailor in the workflow to be tuned, it would be nice to not make them specify
eval_time
twice: when making thetailor()
and when calling the tuning function. - We could pass it through workflows (and make it gain an argument) or make a function to update the tailor. I lean towards updating the tailor.
- If a user includes a tailor in the workflow to be tuned, it would be nice to not make them specify
check_conflicts.action_tailor <- function(action, x, ..., call = caller_env()) { | ||
post <- x$post | ||
|
||
invisible(action) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function? We are not doing anything with post so we could just fall back to the default method. We also don't envision different types of post-processors, analogous to how we have different preprocessors (recipe, formula, ...).
estimate = tidyselect::any_of(c(".pred", ".pred_class")), | ||
probabilities = c( | ||
tidyselect::contains(".pred_"), | ||
-tidyselect::matches("^\\.pred$|^\\.pred_class$") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need changing if we pass predictions of survival time and survival probability to these arguments. Main catch: the probabilities are in .pred
. (The survival times are in .pred_time
.)
After chatting with Max:
|
seeing some sort of strange division behavior for `prop` in tune
With an eye for reducing |
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue. |
Mostly just pattern-matches
recipe
andmodel_fit
implementations and hooks into the existing post-processing infrastructure. See tests for up-to-date examples.Previous PR description, outdated
A fast and loose proof-of-concept for integrating a postprocessing container.
Created on 2024-04-23 with reprex v2.1.0