Add unnest()
#266
Comments
I don't know the full solution, but here are a few more thoughts/notes on this.
Note: If used, we should bump the data.table dependency to v1.13.2 due to performance issues with this syntax in 1.13.0 (see the relevant issue).
library(data.table)
nest_df <- data.table(y = 1:2)
nest_list <- list(nest_df, nest_df)
test_df <- data.table(
  x = c("a", "b"),
  df_list = nest_list
)
test_df[, df_list[[1]], by = x]
#> x y
#> 1: a 1
#> 2: a 2
#> 3: b 1
#> 4: b 2
We could build out unnesting multiple list columns by creating a call that combines them in j:
library(data.table)
nest_df <- data.table(y = 1:2)
nest_list <- list(nest_df, nest_df)
nest_list2 <- lapply(nest_list, setNames, "z")
test_df <- data.table(
  x = c("a", "b"),
  df_list = nest_list,
  df_list2 = nest_list2
)
test_df[, c(df_list[[1]], df_list2[[1]]), by = x]
#> x y z
#> 1: a 1 1
#> 2: a 2 2
#> 3: b 1 1
#> 4: b 2 2
You can unnest vectors this way; however, they come out auto-named unless you slightly change the syntax. This auto-naming also occurs using the
library(data.table)
test_df <- data.table(
  x = c("a", "b"),
  vec_list = list(1:2, 1:2)
)
# Auto named
test_df[, vec_list[[1]], by = x]
#> x V1
#> 1: a 1
#> 2: a 2
#> 3: b 1
#> 4: b 2
# Assigning a name
test_df[, .(vec_list = vec_list[[1]]), by = x]
#> x vec_list
#> 1: a 1
#> 2: a 2
#> 3: b 1
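#> 4: b 2
In tidytable I handled this with a simple if statement, but in dtplyr we won't know the type of the data until evaluation occurs. A simple rename of the auto-named column is not safe either, because the data may already contain a column with that name:
library(data.table)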
test_df <- data.table(
  V1 = c("a", "b"),
  vec_list = list(1:2, 1:2)
)
test_df[, vec_list[[1]], by = V1]
#> V1 V1
#> 1: a 1
#> 2: a 2
#> 3: b 1
#> 4: b 2
And the last one I can think of at the moment: unnesting lists of data frames and lists of vectors at the same time causes some issues, where data.table tries to recycle the vectors and creates unnamed columns (this one might be worth opening a separate data.table issue):
library(data.table)
nest_df <- data.table(y = 1:2)
nest_list <- list(nest_df, nest_df)
test_df <- data.table(
  x = c("a", "b"),
  df_list = nest_list,
  vec_list = list(1:2, 1:2)
)
test_df[, c(df_list[[1]], vec_list[[1]]), by = x]
#> x y
#> 1: a 1 1 2
#> 2: a 2 1 2
#> 3: b 1 1 2
#> 4: b 2 1 2
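A possible workaround for the mixed case, offered only as a sketch and not something proposed in the thread: `c()` splices a bare vector into the list element by element, which is where the unnamed, recycled columns come from. Wrapping the vector in a named `list()` keeps it as a single column. The column name `vec_list` is reused here purely for illustration.

```r
library(data.table)

nest_df <- data.table(y = 1:2)
test_df <- data.table(
  x = c("a", "b"),
  df_list = list(nest_df, nest_df),
  vec_list = list(1:2, 1:2)
)

# Wrap the vector in a named list so c() appends it as one column
# instead of splicing its elements in as separate unnamed items.
res <- test_df[, c(df_list[[1]], list(vec_list = vec_list[[1]])), by = x]
print(res)
```

This still requires knowing in advance which list columns hold vectors and which hold data frames, which is the information dtplyr does not have until evaluation.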
@markfairbanks thanks for your notes and thoughts on this! Do you know any nice way to handle a case like this?
library(data.table)
nest_df <- data.table(y = 1:2)
test_df <- data.table(
  x = c("a", "b"),
  df_list = list(nest_df, NULL),
  df_list2 = list(nest_df, nest_df)
)
test_df[, c(df_list[[1]], df_list2[[1]]), by = x]
#> Error in `[.data.table`(test_df, , c(df_list[[1]], df_list2[[1]]), by = x): j doesn't evaluate to the same number of columns for each group
tidyr::unnest(test_df, c(df_list, df_list2), names_repair = "unique")
#> New names:
#> * y -> y...2
#> * y -> y...3
#> # A tibble: 4 x 3
#> x y...2 y...3
#> <chr> <int> <int>
#> 1 a 1 1
#> 2 a 2 2
#> 3 b NA 1
#> 4 b NA 2
Created on 2021-07-05 by the reprex package (v2.0.0)
I don't, unfortunately.
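For what it's worth, here is a rough sketch of one way the `NULL` case might be approximated when a prototype of the nested data is available. It is an assumption-laden illustration, not a solution from the thread: `na_row_if_null()` and `ptype` are hypothetical names, and the second column is renamed to `y2` by hand rather than through any name repair.

```r
library(data.table)

nest_df <- data.table(y = 1:2)
test_df <- data.table(
  x = c("a", "b"),
  df_list = list(nest_df, NULL),
  df_list2 = list(nest_df, nest_df)
)

# Hypothetical helper: swap a NULL element for a one-row list of typed NAs
# built from a zero-row prototype, so every group yields the same columns.
na_row_if_null <- function(piece, ptype) {
  if (is.null(piece)) lapply(ptype, function(col) col[NA_integer_]) else piece
}

ptype <- nest_df[0]  # zero-row prototype, assumed to be supplied by the user

# The length-1 NA column is recycled within the group to match df_list2.
res <- test_df[
  ,
  c(na_row_if_null(df_list[[1]], ptype), setNames(df_list2[[1]], "y2")),
  by = x
]
print(res)
```

The prototype has to come from somewhere, so this does not remove the need for the user to supply something like `.ptype`.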
I tried to add unnest() but it seems to be quite difficult. This issue acts more like a reminder of the problems I encountered. Maybe at some point data.table gets its own implementation.
The standard solution is something like
but this has the following problems:
j must not be NULL when using by -> user has to specify .ptype.