Error in step_pls using 'options = list(scale = FALSE)' #1512

## The problem

According to the documentation for step_pls (https://recipes.tidymodels.org/reference/step_pls.html), specifying the argument 'options = list(scale = FALSE)' should stop mixOmics::pls(), or other similar functions from the same package, from scaling each predictor by its standard deviation. This is useful when someone wants to scale the predictors by a quantity other than the sd, or to skip feature scaling completely (which is quite common for spectroscopic data).
However, when I do this, I cannot prep the recipe because an error is returned, whereas the recipe with the standard PLS, i.e. with feature scaling, works smoothly.

## Reproducible example

```r
# Example taken from: https://recipes.tidymodels.org/reference/step_pls.html.

library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.3.2
#> Warning: package 'readr' was built under R version 4.3.2
#> Warning: package 'purrr' was built under R version 4.3.3
#> Warning: package 'dplyr' was built under R version 4.3.2
#> Warning: package 'lubridate' was built under R version 4.3.3
library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.3.3
#> Warning: package 'dials' was built under R version 4.3.3
#> Warning: package 'modeldata' was built under R version 4.3.3
#> Warning: package 'parsnip' was built under R version 4.3.3
#> Warning: package 'tune' was built under R version 4.3.3
#> Warning: package 'yardstick' was built under R version 4.3.3
# NOTE: the package mixOmics needs to be installed.


# Import the dataset and divide in training set and test set.
data(biomass, package = "modeldata")

biom_tr <-
    biomass |>
    filter(dataset == "Training") |>
    select(-dataset, -sample)
biom_te <-
    biomass |>
    filter(dataset == "Testing") |>
    select(-dataset, -sample, -HHV)


# Standard PLS recipe (with both mean centering and scaling by the standard deviation)
# This one works correctly
recipe_pls <-
    recipe(HHV ~ ., data = biom_tr) |>
    step_pls(all_numeric_predictors(), outcome = HHV, num_comp = 3) |>
    prep()


# PLS recipe without scaling (only mean centering)
# This one returns the error
recipe_pls_CENTERING <-
    recipe(HHV ~ ., data = biom_tr) |>
    step_pls(all_numeric_predictors(), outcome = HHV, num_comp = 3,
             options = list(scale = FALSE)) |>
    prep()
#> Warning in max(cumDim[cumDim <= lstats]): no non-missing arguments to max;
#> returning -Inf
#> Error in `step_pls()`:
#> Caused by error in `array()`:
#> ! 'data' must be of a vector type, was 'NULL'

# Created on 2025-06-03 with [reprex v2.1.1](https://reprex.tidyverse.org/)
```

## Where and why the error occurs
Running rlang::last_trace() suggests that the error occurs inside recipes:::pls_project, where the 'sweep' function is applied to divide each predictor by the sd stored in the recipe (line 321 of the file 'pls.R'). Since I requested step_pls NOT to apply scaling, that object does not exist, so NULL is passed to 'sweep' instead and the error is returned.
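
The failure can be illustrated in isolation with base R only. This is a minimal sketch, assuming that the sd passed to 'sweep' is NULL as described above; the small matrix `x` is made up for illustration and is not taken from recipes.

```r
# Toy predictor matrix (hypothetical data, 3 rows x 2 columns)
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)

# Dividing each column by a NULL "sd" vector: sweep() first warns from max()
# and then array() rejects the NULL input. The warning is suppressed here and
# the error message is captured instead of stopping the script.
res <- tryCatch(
  suppressWarnings(sweep(x, 2, NULL, "/")),
  error = function(e) conditionMessage(e)
)
res
```

If this behaves as I expect, `res` holds an error message about 'data' being NULL, matching the one in the reprex.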

## What I suggest
To avoid the error, I suggest replacing lines 320 - 321 of the file 'pls.R' with the code below, or something similar. With this change, the function 'pls_project' first checks whether object$sd, produced by recipes:::butcher_pls, is NULL (which corresponds to the case where no scaling was requested) and builds a suitable object, which I simply called "scaling_vector", to pass to the function 'scale'. Then 'scale' mean-centers the predictors and, if requested, scales them by the sd. This also avoids applying the 'sweep' function twice in a row.
With this change, the second recipe in the previous example can be prepped as well, with no problems.

```r
if (is.null(object$sd)) {
  scaling_vector <- FALSE
} else {
  scaling_vector <- object$sd
}

z <- scale(x, center = object$mu, scale = scaling_vector)
```
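
As a rough check of the idea outside recipes, here is a self-contained sketch. The helper `center_and_scale` is hypothetical and only mimics the centering/scaling part of the projection; `mu` and `s` stand in for the values stored in the prepped step, and the `mtcars` columns are used purely as example data.

```r
# Hypothetical stand-in for the centering/scaling part of the projection.
# sd = NULL mimics the case where no scaling was requested.
center_and_scale <- function(x, mu, sd = NULL) {
  scaling_vector <- if (is.null(sd)) FALSE else sd
  scale(x, center = mu, scale = scaling_vector)
}

x  <- as.matrix(mtcars[, c("mpg", "disp", "hp")])
mu <- colMeans(x)
s  <- apply(x, 2, stats::sd)

z_centered <- center_and_scale(x, mu)      # centering only, no error
z_scaled   <- center_and_scale(x, mu, s)   # centering and scaling

# Two-step sweep (subtract the mean, then divide by the sd) for comparison
z_sweep <- sweep(sweep(x, 2, mu, "-"), 2, s, "/")
same_result <- isTRUE(all.equal(z_scaled, z_sweep, check.attributes = FALSE))
same_result
```

The single call to 'scale' should give the same values as the two 'sweep' calls when an sd is available, and it simply skips the division when the sd is NULL.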

## A small final note
Sorry if I made any mistakes: this is the first time I have ever reported an issue on GitHub for a function. In the future I am going to use this recipe with some custom scaling steps and classification models, and I discovered this during some trials, so I wanted to let you know.
