Make `epi[x]_slide` named data-masking expressions output tibble column bundles

Our named data-masking expressions don't behave like `dplyr::mutate` and friends when the named expression evaluates to a tibble: `epi[x]_slide` creates separate name-prefixed columns by default, while `mutate` creates a single tibble-type column (a column bundle):
``` r
library(dplyr, warn.conflicts=FALSE)
library(epiprocess, warn.conflicts=FALSE)
invisible(withr::local_rng_version("3.5.0"))
invisible(withr::local_seed(295595251L))

edf = new_epi_df(tibble(geo_value="geo1",
                        time_value = as.Date("2020-01-01") + 0:19,
                        x1 = runif(20L),
                        x2 = 0.2*runif(20L),
                        y = 2*x1 + 3*x2 + 10 + rnorm(20L)))

edf %>%
  epi_slide(before = 100L,
            terms =
              predict(lm(y ~ x1 + x2,
                         # not doing a real train-test split
                         tibble(x1, x2, y)),
                      tibble(x1, x2, y) %>% tail(n=1L),
                      type="terms") %>%
              as_tibble() %>%
              mutate(constant = attr(., "constant"))
            )
#> Warning in predict.lm(lm(y ~ x1 + x2, tibble(x1, x2, y)), tibble(x1, x2, :
#> prediction from a rank-deficient fit may be misleading

#> Warning in predict.lm(lm(y ~ x1 + x2, tibble(x1, x2, y)), tibble(x1, x2, :
#> prediction from a rank-deficient fit may be misleading
#> An `epi_df` object, 20 x 7 with metadata:
#> * geo_type  = custom
#> * time_type = day
#> * as_of     = 2023-03-28 17:38:41
#> 
#> # A tibble: 20 × 7
#>    geo_value time_value      x1     x2     y terms_x1 terms_x2
#>  * <chr>     <date>       <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#>  1 geo1      2020-01-01 0.290   0.0803 10.1    0       0      
#>  2 geo1      2020-01-02 0.276   0.0372 10.5    0.183   0      
#>  3 geo1      2020-01-03 0.00323 0.127  11.1    0.889  -0.315  
#>  4 geo1      2020-01-04 0.236   0.0382 11.9   -0.279   0.752  
#>  5 geo1      2020-01-05 0.786   0.0676 12.5    0.939  -0.00634
#>  6 geo1      2020-01-06 0.518   0.180  11.0    0.270  -0.261  
#>  7 geo1      2020-01-07 0.384   0.0372 11.4    0.0457  0.132  
#>  8 geo1      2020-01-08 0.144   0.113  10.6   -0.315  -0.0889 
#>  9 geo1      2020-01-09 0.640   0.0746 12.5    0.589   0.0317 
#> 10 geo1      2020-01-10 0.0360  0.191  12.6   -0.406   0.347  
#> 11 geo1      2020-01-11 0.0532  0.118   9.80  -0.474   0.0719 
#> 12 geo1      2020-01-12 0.132   0.108   9.66  -0.350   0.0347 
#> 13 geo1      2020-01-13 0.950   0.0669 13.2    1.58   -0.104  
#> 14 geo1      2020-01-14 0.862   0.0714 13.3    1.39   -0.0845 
#> 15 geo1      2020-01-15 0.00865 0.190   9.59  -1.03    0.134  
#> 16 geo1      2020-01-16 0.527   0.140  11.3    0.460   0.0229 
#> 17 geo1      2020-01-17 0.784   0.0246 10.9    1.03   -0.174  
#> 18 geo1      2020-01-18 0.621   0.194   9.45   0.413  -0.318  
#> 19 geo1      2020-01-19 0.356   0.189   8.80  -0.0775 -0.517  
#> 20 geo1      2020-01-20 0.661   0.0807  9.87   0.376   0.156

edf %>%
  mutate(terms =
           predict(lm(y ~ x1 + x2,
                      tibble(x1, x2, y)),
                   # everything in-sample
                   tibble(x1, x2, y),
                   type="terms") %>%
           as_tibble() %>%
           mutate(constant = attr(., "constant"))
         )
#> An `epi_df` object, 20 x 6 with metadata:
#> * geo_type  = custom
#> * time_type = day
#> * as_of     = 2023-03-28 17:38:41
#> 
#> # A tibble: 20 × 6
#>    geo_value time_value      x1     x2     y terms$x1      $x2
#>  * <chr>     <date>       <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#>  1 geo1      2020-01-01 0.290   0.0803 10.1   -0.187   0.158  
#>  2 geo1      2020-01-02 0.276   0.0372 10.5   -0.208   0.420  
#>  3 geo1      2020-01-03 0.00323 0.127  11.1   -0.622  -0.123  
#>  4 geo1      2020-01-04 0.236   0.0382 11.9   -0.269   0.414  
#>  5 geo1      2020-01-05 0.786   0.0676 12.5    0.565   0.236  
#>  6 geo1      2020-01-06 0.518   0.180  11.0    0.159  -0.448  
#>  7 geo1      2020-01-07 0.384   0.0372 11.4   -0.0440  0.421  
#>  8 geo1      2020-01-08 0.144   0.113  10.6   -0.408  -0.0379 
#>  9 geo1      2020-01-09 0.640   0.0746 12.5    0.343   0.193  
#> 10 geo1      2020-01-10 0.0360  0.191  12.6   -0.572  -0.514  
#> 11 geo1      2020-01-11 0.0532  0.118   9.80  -0.546  -0.0691 
#> 12 geo1      2020-01-12 0.132   0.108   9.66  -0.427  -0.00755
#> 13 geo1      2020-01-13 0.950   0.0669 13.2    0.813   0.240  
#> 14 geo1      2020-01-14 0.862   0.0714 13.3    0.680   0.213  
#> 15 geo1      2020-01-15 0.00865 0.190   9.59  -0.613  -0.507  
#> 16 geo1      2020-01-16 0.527   0.140  11.3    0.172  -0.205  
#> 17 geo1      2020-01-17 0.784   0.0246 10.9    0.561   0.497  
#> 18 geo1      2020-01-18 0.621   0.194   9.45   0.314  -0.535  
#> 19 geo1      2020-01-19 0.356   0.189   8.80  -0.0868 -0.502  
#> 20 geo1      2020-01-20 0.661   0.0807  9.87   0.376   0.156
```

<sup>Created on 2023-03-28 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

We should probably try to match dplyr here.

See also #255 regarding unnamed data-masking expressions that yield tibbles, which we don't allow and which dplyr turns into separate columns.
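
For reference, a minimal sketch of the two dplyr behaviors being compared, using plain `mutate` on a toy tibble. The names `df`, `stats`, `double`, and `square` are made up for illustration and don't come from epiprocess:

```r
library(dplyr, warn.conflicts = FALSE)

# Toy data; column names are illustrative only.
df <- tibble(a = 1:3)

# Named tibble-valued expression: dplyr stores the result as one
# tibble-type column (a column bundle) called `stats`.
named <- df %>%
  mutate(stats = tibble(double = a * 2, square = a^2))
names(named)
is.data.frame(named$stats)

# Unnamed tibble-valued expression: dplyr splices the tibble's columns
# into the result as separate top-level columns.
unnamed <- df %>%
  mutate(tibble(double = a * 2, square = a^2))
names(unnamed)
```

The named case is the behavior this issue proposes matching; the unnamed case is the one tracked in #255.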

We may not hit this situation very often: since we don't have `cur_data()` etc. implemented, we're likely to reach for the function/formula form in these situations first. So marking this as low priority.
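
In the meantime, one possible stopgap is to bundle the name-prefixed columns after the fact with `tidyr::pack()`. This is only a sketch on a plain tibble with made-up values; whether `pack()` preserves `epi_df` class and metadata is an assumption that hasn't been checked here:

```r
library(tibble)
library(tidyr)

# Stand-in for slide output with name-prefixed columns (values are made up).
flat <- tibble(
  id = 1:2,
  terms_x1 = c(0.1, 0.2),
  terms_x2 = c(0.3, 0.4)
)

# Pack the `terms_*` columns into a single tibble-type column `terms`;
# `.names_sep = "_"` strips the "terms_" prefix from the inner names.
bundled <- pack(flat, terms = starts_with("terms_"), .names_sep = "_")
names(bundled)
names(bundled$terms)
```

`tidyr::unpack()` goes the other way if separate columns are wanted again.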

Make `epi[x]_slide` named data-masking expressions output tibble column bundles #293
