What else do we need for postprocessing? #68

Open
topepo opened this issue Jan 22, 2025 · 2 comments

@topepo

topepo commented Jan 22, 2025

I have a specific argument to make regarding two potential adjustments. However, it would also be good to get a broader set of opinions from others. Maybe @ryantibs and/or @dajmcdon have thoughts.

My thought: there are three things that we might consider being optional arguments to the tailor (or an individual adjustment):

  • a reference data set (the training set or some other data, such as a calibration set),
  • the mold from the workflow, and
  • the workflow itself.

Why? Two similar calibration tools prompted these ideas. To demonstrate, let's look at what Cubist does to postprocess, which is discussed and illustrated in this blog post. The other tool is discussed in #67 and has requirements similar to those of the Cubist adjustment.

After the supervised model predicts, Cubist finds the new sample's nearest neighbors in the training set. It then adjusts the prediction based on the distances to those neighbors and the training set predictions for them.

We don't have to use the training set; it could conceivably be a calibration set. To generalize, I'll call it the reference data set.
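Roughly, the adjustment blends the model's prediction with distance-weighted information from the reference set. A minimal sketch in base R (a simplified stand-in, not Cubist's exact formula; all of the inputs are assumed to exist already):

```r
# A simplified nearest-neighbor correction, assuming `ref_x` (processed reference
# predictors as a matrix), `ref_y` (reference outcomes), `ref_pred` (model
# predictions for the reference rows), and `new_x`/`new_pred` for the sample
# being adjusted.
adjust_one <- function(new_x, new_pred, ref_x, ref_y, ref_pred, neighbors = 5) {
  dists <- sqrt(colSums((t(ref_x) - new_x)^2))   # Euclidean distance to each reference row
  nn    <- order(dists)[seq_len(neighbors)]      # indices of the closest rows
  wts   <- 1 / (dists[nn] + 0.5)                 # closer neighbors get more weight
  # Each neighbor "votes" with its observed outcome, shifted by the difference
  # between the new prediction and that neighbor's prediction.
  votes <- ref_y[nn] + (new_pred - ref_pred[nn])
  sum(wts * votes) / sum(wts)
}
```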

To do this with a tailor, we would already have the current prediction from the model (which may have already been adjusted by other postprocessors) and, if we prepare properly, the reference set predictions as well.

To find the neighbors, we will need to process both the reference set and the new predictors in the same way as the data was given to the supervised model. For this, we'd need the mold from the workflow.
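Concretely, this could use the mold's blueprint with hardhat::forge(). A small sketch, assuming `mold` was pulled from the fitted workflow and `reference_data`/`new_data` hold raw predictors:

```r
library(hardhat)

# Process the reference set and the new predictors exactly as the supervised
# model saw them, using the blueprint stored in the workflow's mold.
processed_ref <- forge(reference_data, blueprint = mold$blueprint)$predictors
processed_new <- forge(new_data,       blueprint = mold$blueprint)$predictors
```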

When making the tailor, we could specify the number of neighbors and pass the reference data set and the mold. We could require the predictions for the reference set to be in the reference set data frame, avoiding the need for the workflow.

The presence of the workflow is a little dangerous; it would likely include the tailor itself. Apart from the infinite recursion of adding a workflow to a tailor adjustment that lives inside that same workflow, we would want to avoid people accidentally misapplying the workflow. Let's exclude the workflow as an input to a tailor adjustment but keep the idea of adding a data set of predictors and/or the workflow's mold.

Where would we specify the mold or data? In the main tailor() call or to the adjustments? The mold is independent of the data set and would not vary from adjustment to adjustment, so an option to tailor() would be my suggestion.

The data set probably belongs in the adjustments. Unfortunately, multiple data sets could be needed, depending on what is computed after the model prediction (#4 is relevant here).
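To make the placement concrete, a hypothetical sketch (adjust_nearest_neighbors() and the mold argument to tailor() do not exist; they only illustrate where the pieces might live):

```r
library(tailor)
library(workflows)

# Hypothetical placement: the mold as an option to tailor(), the reference data
# (with its predictions already attached) as an option to the adjustment.
post <- tailor(mold = extract_mold(fitted_wflow)) |>  # assumes a previously fitted workflow `fitted_wflow`
  adjust_nearest_neighbors(                           # hypothetical adjustment, for illustration only
    reference = calibration_data,                     # predictors plus a .pred column
    neighbors = 5
  )
```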

@dajmcdon

Hey @topepo! I'm not entirely sure I understand how this is meant to work, but I've poked around the package some, so I'll do my best. In part, I'll reference the API of the postprocessor that we've been building.

Big picture: it seems to me that it's not the tailor that would potentially need these objects but the fit() and/or predict() methods.

Proceeding along the lines of the Cubist example and checking my understanding of the tailor API (an aside¹: should the tailor() actually make the predictions as the first step?):

  1. On creation of the adjustment, you would add the reference data (or perhaps not) and specify the number of neighbours.
  2. Fitting is by dispatch to fit.tailor() --> fit.[class]().
  3. Then, new predictions are by dispatch to predict.tailor() --> predict.[class]().

It seems like:

  • The mold is required at step 2 but not really necessary at step 1. You'd need it in the high-level fit.tailor() and then as a possibly unused argument in all of the adjustments, so that it could be used if needed or ignored. Perhaps it's safe to have ... here? (See the sketch after this list.)
  • Supposing the user did not specify any reference data, you could allow step 3 to access the workflows::extract_fit_parsnip() result. Then step 2 (which didn't have any data and doesn't need to refit the model) would be a no-op, and at step 3 you'd extract the fitted values, calculate the neighbours, and adjust the predictions. In that case, the predict loop would also need ... (or would just expect the parsnip fit as a mandatory argument).
  • It's not clear to me that you need the rest of the workflow, but maybe it's worth considering it, too, as a mandatory argument to fit.tailor()/predict.tailor(), possibly passed along to the class methods.
  • FWIW, in my implementation, I do give the postprocessor access to (a) the hardhat mold; (b) the result of hardhat::forge() on the new data with the blueprint from a; (c) the new data itself. Then when predict is called on the workflow, the predict method has access to the whole workflow. So there's no potential issue of recursion.
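For concreteness, a rough sketch of how such dispatch might pass the mold and the parsnip fit through ...; the adjustment class name and the extra arguments are hypothetical, not part of the current tailor API:

```r
# Hypothetical S3 methods for a neighbor-based adjustment; `mold` and
# `parsnip_fit` are assumed to be supplied by the workflow machinery via `...`
# rather than by the user.
fit.nearest_neighbor_adj <- function(object, data, ..., mold = NULL) {
  # With no user-supplied reference data, fall back to the predictors stored in the mold.
  if (is.null(object$reference) && !is.null(mold)) {
    object$reference <- mold$predictors
  }
  object
}

predict.nearest_neighbor_adj <- function(object, new_data, ..., parsnip_fit = NULL) {
  # Adjust the existing .pred column using the reference set and, if needed,
  # fitted values pulled from `parsnip_fit` (details elided in this sketch).
  new_data
}
```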

Footnotes

  1. This is the way that I've coded up our "postprocessor" (called frosting to really stretch the prep/bake recipe conceit). The general intention is not to do the sorts of things you're doing here (there is never any training for example), but to prepare the predicted values for other tasks. The most common is to invert preprocessing steps. So for example, if I want my predictions on the scale of the data, but I used step_normalize() on the outcome, then I need to undo that transformation. So, typically, the first thing I would do is call predict() on my model object and then access the recipe or other parts of the workflow to undo steps (or other similar ideas transformations). However, there are cases in which you would want to process your test data differently from your training data before predicting. The most salient is imputing missing data. It's easy to imagine a case where training data has no missingness, but testing data does. Of course you could do this outside the workflow, but in a production environment (especially with time series) it can be pretty important.
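To make the "undo the transformation" idea concrete, a minimal sketch (how the center/scale values are retrieved from the prepped recipe is left as an assumption):

```r
# Invert step_normalize() applied to the outcome, assuming the training-set
# mean (`y_mean`) and standard deviation (`y_sd`) of the outcome were stored
# when the recipe was prepped.
unnormalize <- function(pred, center, scale) {
  pred * scale + center
}

# e.g., put predictions back on the original scale of the outcome:
# preds$.pred <- unnormalize(preds$.pred, center = y_mean, scale = y_sd)
```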

@ryantibs

ryantibs commented Feb 5, 2025

Sorry for such a high-level comment in what is a thread about details.

Broadly speaking, for traditional "batch" prediction problems (i.e., NOT sequential prediction, as in time series), I think it makes sense to allow the post-processor to access what it needs in order to make new predictions on a calibration set. It sounds like what you're describing accommodates this. (P.S. Thanks for the pointer to the Quinlan 1993a paper, which I learned about from your blog post --- vaguely, it seems like this may be seen as a batch analog of the online calibration/debiasing methods that others and I have been working on recently.)

For "online" prediction problems (i.e., sequential prediction, as in time series), there's a different set of for post-processing methods which recalibrate based on the past data and past predictions themselves. Minimally, for some methods, you actually just need single most recent data point, and the most recent corresponding prediction (along with say, the most recent parameter value, in some recalibration model that you're using). So it's really pretty minimal, and you don't really need all of the framework you're describing. That said, if you're also looking to accommodate online prediction here, then it would be nice to allow this to fit in somehow: I just need to pass in y_t (data), \hat{y}_t (prediction), and \theta_t (parameter) to the post-processor at time t.
