Hi -- I've encountered an issue where dtplyr seems to fail when filtering data that has a lubridate::interval() column. I saw this originally on a tibble of ~50 columns, of various different data types (including several lubridate date/time etc types), and dropping the single interval() column seemed to fix it -- so it does seem to be specific to interval data.
I've submitted here (rather than as a lubridate issue) as it happens when the filtering is done with respect to other data (here an integer column).
It's easy enough to work around, but figured I'd raise an issue as the behaviour seems unexpected. Any thoughts appreciated! 😃
library(dplyr)
library(dtplyr)
library(lubridate)

# dummy data
df <- tibble(a = 1:3) |>
  mutate(interval = interval(start = ymd("2024-01-01") - days(a), end = ymd("2024-01-01")))

# expected filter result using dplyr
df |>
  filter(a == max(a))

# dtplyr filter result throws error
df |>
  dtplyr::lazy_dt() |>
  filter(a == max(a))

# dtplyr filter result (also throws error -- so nothing to do with max())
df |>
  dtplyr::lazy_dt() |>
  filter(a == 3)
# Error in `[<-`:
# ! Assigned data `map(.subset(x, unname), vectbl_set_names, NULL)` must be compatible with existing
#   data.
# ✖ Existing data has 1 row.
# ✖ Element 2 of assigned data has 3 rows.
# ℹ Row updates require a list value. Do you need `list()` or `as.list()`?
# Caused by error in `vectbl_recycle_rhs_rows()`:
# ! Can't recycle input of size 3 to size 1.

# dtplyr filter works when dropping lubridate::interval col
df |>
  select(-interval) |>
  dtplyr::lazy_dt() |>
  filter(a == max(a))
Period objects and similar "multi-column" structures are not supported by data.table, as described in Rdatatable/data.table#4415. I don't think there's anything we can do on the dtplyr end.
Notice the length of the "start" slot when subsetting a data frame vs when subsetting a data.table. Subsetting the data.table (rather than just a column) produces an error.
suppressPackageStartupMessages({
library(lubridate)
library(data.table)
library(dplyr)
})

df <- tibble(a = 1:3) |>
  mutate(interval = interval(start = ymd("2024-01-01") - days(a), end = ymd("2024-01-01")))
dt <- as.data.table(df)

str(df[3, 'interval', drop = TRUE])
#> Formal class 'Interval' [package "lubridate"] with 3 slots
#>   ..@ .Data: num 259200
#>   ..@ start: POSIXct[1:1], format: "2023-12-29"
#>   ..@ tzone: chr "UTC"

str(dt[3, interval])
#> Formal class 'Interval' [package "lubridate"] with 3 slots
#>   ..@ .Data: num 259200
#>   ..@ start: POSIXct[1:3], format: "2023-12-31" "2023-12-30" ...
#>   ..@ tzone: chr "UTC"

dt[3]
#> Error in dimnames(x) <- dn: length of 'dimnames' [1] not equal to array extent
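For anyone who hits this and needs to keep the interval information, one possible workaround (a sketch only, not something dtplyr or data.table provides) is to split the interval into two plain POSIXct columns before calling lazy_dt(), and rebuild it after collecting. The column names interval_start and interval_end are made up for this example.

```r
library(dplyr)
library(dtplyr)
library(lubridate)

df <- tibble(a = 1:3) |>
  mutate(interval = interval(start = ymd("2024-01-01") - days(a), end = ymd("2024-01-01")))

result <- df |>
  # Store the interval as two ordinary POSIXct columns, which data.table can subset.
  # (interval_start / interval_end are hypothetical names chosen for this sketch.)
  mutate(
    interval_start = int_start(interval),
    interval_end = int_end(interval)
  ) |>
  select(-interval) |>
  dtplyr::lazy_dt() |>
  filter(a == max(a)) |>
  as_tibble() |>
  # Rebuild the Interval column once the data is back in a tibble.
  mutate(interval = interval(interval_start, interval_end)) |>
  select(-interval_start, -interval_end)

result
```

This sidesteps the problem rather than fixing it: the Interval object never reaches data.table, so its start slot cannot get out of sync with the filtered rows.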