Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Methods for quantile_pred() change interpretation of missingness, add to errors with missingness #449

Open
brookslogan opened this issue Mar 20, 2025 · 7 comments
Assignees

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Mar 20, 2025

E.g., the vec_proxy_equal() implementation seems to be responsible for these changes in behavior demonstrated below. The errors are due to an upstream hardhat bug in as_tibble.quantile_pred(), but the changes in behavior are from epipredict replacing the default vec_proxy_equal() behavior. This is breaking ability to store forecasts in archives in epiprocess; see cmu-delphi/epiprocess#631.

It's rude/frustrating for a package to change existing behavior of a class not introduced by the package itself. We'll probably need to do something like one of the following:

  • Remove vec_proxy_equal.quantile_pred
  • Tweak vec_proxy_equal.quantile_pred to [maybe] not change missingness treatment and not raise an error.
    • If it has only fixes / perf improvements relative to hardhat, then we should contribute upstream to hardhat.
    • If it has other types of changes, we should define a subclass and not change quantile_pred behavior.

Similarly, for other quantile_pred method implementations that we define, we should consider contributing methods upstream if they change existing behavior, or else use a subclass and implement our behavior-change methods there. It might also be good to contribute behavior-adding-but-not-changing methods upstream.

suppressPackageStartupMessages({
  library(dplyr)
  library(hardhat)
  library(vctrs)
})

preds1 <- quantile_pred(matrix(c(1:3, NA, NA, NA), 2, 3, byrow = TRUE), 1:3/4)
preds2 <- vec_c(preds1, NA)

vec_detect_missing(preds1)
#> [1] FALSE FALSE
preds1 == preds1
#> [1] TRUE TRUE

vec_detect_missing(preds2)
#> [1] FALSE FALSE  TRUE
preds2 == preds2
#> [1] TRUE TRUE   NA
vec_equal(preds2, preds2, na_equal = TRUE)
#> [1] TRUE TRUE TRUE

suppressPackageStartupMessages({
  library(epipredict)
})

vec_detect_missing(preds1)
#> [1] FALSE  TRUE
preds1 == preds1
#> [1] TRUE   NA

vec_detect_missing(preds2)
#> Error in `vec_slice()`:
#> ! Column `.quantile_levels` (size 9) must match the data frame (size 6).
#> ℹ In file 'slice.c' at line 188.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
preds2 == preds2
#> Error in `vec_slice()`:
#> ! Column `.quantile_levels` (size 9) must match the data frame (size 6).
#> ℹ In file 'slice.c' at line 188.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
vec_equal(preds2, preds2, na_equal = TRUE)
#> Error in `vec_slice()`:
#> ! Column `.quantile_levels` (size 9) must match the data frame (size 6).
#> ℹ In file 'slice.c' at line 188.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.

Created on 2025-03-19 with reprex v2.1.1

@brookslogan
Copy link
Contributor Author

@dajmcdon Would you be able to assign priority & size estimates here? I'm not sure if you are still trying to do archives of forecasts, and don't know why the vec_proxy_equal impl exists to begin with. I'm guessing maybe medium priority as we may run into this at some point trying to do some nowcasting stuff at least.

@dajmcdon
Copy link
Contributor

I'm not entirely sure I see what you would like the behaviour to be:

suppressPackageStartupMessages({
  library(dplyr)
  library(hardhat)
  library(vctrs)
})

preds1 <- quantile_pred(matrix(c(1:3, NA, NA, NA), 2, 3, byrow = TRUE), 1:3/4)
preds2 <- vec_c(preds1, NA)

This shouldn't work. The vec_c() call doesn't make sense and should error. Agreed? In which case, we should open an issue in {hardhat}, possibly with accompanying PR.

vec_detect_missing(preds1)
#> [1] FALSE FALSE
preds1 == preds1
#> [1] TRUE TRUE

This is the purpose of the implementation of vec_proxy_equal.quantile_pred(). As you show below, it's fixed by {epipredict}. As in the code comments, these should be added to {hardhat} via Issue + PR (though won't hit CRAN for a while after implementation).

vec_detect_missing(preds2)
#> [1] FALSE FALSE  TRUE
preds2 == preds2
#> [1] TRUE TRUE   NA
vec_equal(preds2, preds2, na_equal = TRUE)
#> [1] TRUE TRUE TRUE

This object shouldn't exist, so I'm not worried about strange behaviour.

suppressPackageStartupMessages({
  library(epipredict)
})

vec_detect_missing(preds1)
#> [1] FALSE  TRUE
preds1 == preds1
#> [1] TRUE   NA

Seems to be the correct behaviour, as

NA == NA
#> [1] NA

None of this should be possible. Requires the fix above.

vec_detect_missing(preds2)
#> Error in `vec_slice()`:
#> ! Column `.quantile_levels` (size 9) must match the data frame (size 6).
#> ℹ In file 'slice.c' at line 188.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
preds2 == preds2
#> Error in `vec_slice()`:
#> ! Column `.quantile_levels` (size 9) must match the data frame (size 6).
#> ℹ In file 'slice.c' at line 188.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
vec_equal(preds2, preds2, na_equal = TRUE)
#> Error in `vec_slice()`:
#> ! Column `.quantile_levels` (size 9) must match the data frame (size 6).
#> ℹ In file 'slice.c' at line 188.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.

Does all of that seem correct to you? In that case, the problem seems to be that vec_c() produces preds2 when it should error. Is that correct?

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 20, 2025

vec_c() casts everything to a common ptype before combining; it has special treatment of logical NA to allow you to introduce missing values for any vctrs vector; see, e.g., here. The same missingness sentinel is used throughout vctrs and by downstream tidyverse operations like full_join, dplyr::lag, etc. I have another sample use case here tidymodels/hardhat#281 which mostly works with just hardhat, but printing errors; the epipredict vec_proxy_equal make things a bit more broken.

It's maybe up for debate what vec_c(preds1, NA) should add, but it should do something sensible. This is probably controlled by vec_cast.

preds1 == preds1 result seems right[? I guess that's somewhat debatable], I just wanted to show that it didn't have a problem executing while preds1p == preds1p does.

I'm not sure whether a row of all NAs should be treated as "missing"; {distributional} seems not to work that way either; see linked hardhat issue.

@dajmcdon
Copy link
Contributor

Agreed. Probably requires implementing the vec_cast methods along the lines of the non working code at

# vec_cast.quantile_pred.quantile_pred <- function(

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 20, 2025

Alternatively, and I think this was the original intention (matrix input + "special vector class used to efficiently store predictions from a quantile regression model"), quantile_preds could be backed by [or proxied into, in a way that doesn't raise errors] a matrix or made into a rcrd [or maybe a tibble], rather than a list. Much larger change.

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 20, 2025

Side note: the current vec_proxy_equal implementation does seem to also make vec_detect_complete output reasonable; I haven't yet filed that as a bug in hardhat but mentioned as a side note. I didn't realize the above NA==NA sort of comparison at the time of filing, so I haven't reported that either. (Though why is is vec_proxy_equal and not just the general vec_proxy? ... well I guess things might go wrong if it's used as vec_proxy_compare and if that leads to some sort of lexicographical thing... not sure)

vctrs::vec_detect_complete(hardhat::quantile_pred(matrix(c(1,2,3, 1,NA,NA, NA,NA,NA),3,3,byrow = TRUE), 1:3/4))
#> [1] TRUE TRUE TRUE
library(epipredict)
# [...]
vctrs::vec_detect_complete(hardhat::quantile_pred(matrix(c(1,2,3, 1,NA,NA, NA,NA,NA),3,3,byrow = TRUE), 1:3/4))
#> [1]  TRUE FALSE FALSE

Created on 2025-03-20 with reprex v2.1.1

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 21, 2025

vec_cast might be a bust. I think the native code specially detects the NA case and reroutes to the native vec_init, which might determine the behavior based on the proxy or [ on a rep'd NA_integer_. It seems possible to fix that up with vec_proxy and vec_restore impls, but then vec_chop misbehaves, causing as.list to misbehave, causing other things to misbehave, and it seems to be a long journey to trace down the vec_chop failure.

So I think that leaves approaches touching on more central operations such as:

  • A. Embrace NULL as a missingness sentinel. Don't expect unclassed/default-proxy object to be a list of things with length matching the length of the quantile levels; acknowledge there could be NULLs mixed in.
  • B. Move to matrix, rcrd, or tibble backing. (inherit_base_type = FALSE is not supported for list backing, so it doesn't seem like there's a good way to avoid the list behaviors while backed by a list.)
  • C. Don't use new_vctr, at least not directly on the list, use (no-"list"-in-class, I'm assuming) S3 obj + proxy&restore. This doesn't make c forward to vec_c etc. though, so it's not ideal.
  • D. Hold the data in a list with class "notlist" (or some obscuring wrapper around the list) + use new_vctr w/ proxy&restore like in the preceding approach.

Here's the core of something like D., though I'd imagine A. or B. would be easier/cleaner...

notlist <- function(x) {
  class(x) <- "notlist"
  x
}

new_quantile_pred2 <- function(quantiles, levels) {
  # Force everything to numeric or else we need to track elt typeof in an attr
  # and invite more ptype2 and vec_cast work, as vctrs ops on proxy objects
  # don't like integer() type mixing with others and at some point we have to
  # make an object with the appropriate type given an empty list of quantiles:
  quantiles <- vec_cast_common(!!!quantiles, .to = double())
  quantiles <- notlist(quantiles)
  new_vctr(quantiles, quantile_levels = levels, class = "quantile_pred2")
}

#' @importFrom vctrs vec_proxy
#' @export
vec_proxy.quantile_pred2 <- function(x, ...) {
  x_unclassed <- unclass(x)
  mat <-
    if (length(x_unclassed) == 0) {
      matrix(numeric(), 0, length(attr(x, "quantile_levels")))
    } else {
      x_unclassed %>%
        {vctrs::vec_interleave(!!!.)} %>%
        matrix(length(x_unclassed), length(attr(x, "quantile_levels")))
    }

  # We need to wrap in a tibble or else vec_init won't respect the matrix
  # ncol:
  tibble(m = mat)
}

#' @importFrom vctrs vec_restore
#' @export
vec_restore.quantile_pred2 <- function(x, to, ...) {
  x %>%
    .$m %>%
    vec_chop() %>%
    lapply(function(x) {dim(x) <- NULL; x}) %>%
    new_quantile_pred2(attr(to, "quantile_levels"))
}

@brookslogan brookslogan self-assigned this Mar 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants