Add `unnest()` #266

mgirlich · 2021-07-02T07:09:29Z

I tried to add unnest(), but it seems to be quite difficult. This issue acts more as a reminder of the problems I encountered. Maybe at some point data.table will get its own implementation.

The standard solution is something like

dt[, lapply(.SD, unlist, recursive = FALSE), by = ...]

but this has the following problems:

- The first result in j must not be NULL when using by, so the user has to specify .ptype.
- data.table doesn't support data frame columns.
- Unnesting a list of data frames in data.table syntax seems to be quite tricky.
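
To make the standard approach concrete, here is a minimal sketch on a list column of integer vectors. The table and column names (test_df, vec_list) are made up for illustration, and the sketch deliberately avoids the data frame and NULL cases listed above.

```r
library(data.table)

# Hypothetical example data: one list column holding integer vectors
test_df <- data.table(
  x = c("a", "b"),
  vec_list = list(1:2, 3:4)
)

# Within each group, .SD contains the list column;
# unlist(recursive = FALSE) flattens it by one level
res <- test_df[, lapply(.SD, unlist, recursive = FALSE), by = x]
res
```

Because the j expression goes through lapply(.SD, ...), the unnested column should keep its original name here.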


markfairbanks · 2021-07-02T16:24:38Z

I don't know the full solution, but here are a few more thoughts/notes on this.

The unlist() syntax has been superseded by using list_col[[1]].

Note: If we use this syntax, we should bump the data.table dependency to v1.13.2, because it has performance issues in 1.13.0 (see the relevant issue).

library(data.table)

nest_df <- data.table(y = 1:2)
nest_list <- list(nest_df, nest_df)

test_df <- data.table(
  x = c("a", "b"),
  df_list = nest_list
)

test_df[, df_list[[1]], by = x]
#>    x y
#> 1: a 1
#> 2: a 2
#> 3: b 1
#> 4: b 2

We could support unnesting multiple list columns by building a call to c(), like this:

library(data.table)

nest_df <- data.table(y = 1:2)
nest_list <- list(nest_df, nest_df)
nest_list2 <- lapply(nest_list, setNames, "z")

test_df <- data.table(
  x = c("a", "b"),
  df_list = nest_list,
  df_list2 = nest_list2
)

test_df[, c(df_list[[1]], df_list2[[1]]), by = x]
#>    x y z
#> 1: a 1 1
#> 2: a 2 2
#> 3: b 1 1
#> 4: b 2 2
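
As a rough sketch of how such a call could be assembled programmatically from a character vector of column names: the variable names cols, elems and j_call are hypothetical, and this is not necessarily how dtplyr would generate the expression.

```r
library(data.table)

test_df <- data.table(
  x = c("a", "b"),
  df_list = list(data.table(y = 1:2), data.table(y = 1:2)),
  df_list2 = list(data.table(z = 1:2), data.table(z = 1:2))
)

# Hypothetical input: the list columns to unnest
cols <- c("df_list", "df_list2")

# Build one `col[[1L]]` call per column, then wrap them all in c(...)
elems <- lapply(cols, function(col) call("[[", as.name(col), 1L))
j_call <- as.call(c(list(as.name("c")), elems))

# bquote() splices the constructed call into the j slot before evaluation
out <- eval(bquote(test_df[, .(j_call), by = x]))
out
```

Printing j_call before evaluating it is a simple way to check that the generated expression matches the hand-written version above.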

You can unnest vectors this way; however, the resulting column is auto-named (V1) unless you slightly change the syntax. This auto-naming also occurs with the unlist() syntax.

library(data.table)

test_df <- data.table(
  x = c("a", "b"),
  vec_list = list(1:2, 1:2)
)

# Auto named
test_df[, vec_list[[1]], by = x]
#>    x V1
#> 1: a  1
#> 2: a  2
#> 3: b  1
#> 4: b  2

# Assigning a name
test_df[, .(vec_list = vec_list[[1]]), by = x]
#>    x vec_list
#> 1: a        1
#> 2: a        2
#> 3: b        1
#> 4: b        2

In tidytable I handled this with a simple if statement, but in dtplyr we won't know the type of the data until evaluation occurs. Simply renaming the V1 column is possible, but problems can occur if the data already has a column named V1:

library(data.table)

test_df <- data.table(
  V1 = c("a", "b"),
  vec_list = list(1:2, 1:2)
)

test_df[, vec_list[[1]], by = V1]
#>    V1 V1
#> 1:  a  1
#> 2:  a  2
#> 3:  b  1
#> 4:  b  2

And the last one I can think of at the moment: unnesting lists of data frames and lists of vectors at the same time causes problems, because data.table tries to recycle the vectors and creates unnamed columns (this one might be worth opening a separate data.table issue):

library(data.table)

nest_df <- data.table(y = 1:2)
nest_list <- list(nest_df, nest_df)

test_df <- data.table(
  x = c("a", "b"),
  df_list = nest_list,
  vec_list = list(1:2, 1:2)
)

test_df[, c(df_list[[1]], vec_list[[1]]), by = x]
#>    x y    
#> 1: a 1 1 2
#> 2: a 2 1 2
#> 3: b 1 1 2
#> 4: b 2 1 2
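
A sketch of one way to sidestep this, assuming we already know which column holds vectors: wrapping the vector in a named list before calling c() keeps it as a single column instead of letting its elements be spliced in one by one.

```r
library(data.table)

nest_df <- data.table(y = 1:2)

test_df <- data.table(
  x = c("a", "b"),
  df_list = list(nest_df, nest_df),
  vec_list = list(1:2, 1:2)
)

# list(vec_list = ...) makes the vector a single named element of the c() result
out <- test_df[, c(df_list[[1]], list(vec_list = vec_list[[1]])), by = x]
out
```

This still requires knowing the column type up front, which is exactly the limitation described above for dtplyr.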

mgirlich · 2021-07-05T06:46:09Z

@markfairbanks thanks for your notes and thoughts on this! Do you know any nice way to handle NULL?

library(data.table)

nest_df <- data.table(y = 1:2)
test_df <- data.table(
  x = c("a", "b"),
  df_list = list(nest_df, NULL),
  df_list2 = list(nest_df, nest_df)
)

test_df[, c(df_list[[1]], df_list2[[1]]), by = x]
#> Error in `[.data.table`(test_df, , c(df_list[[1]], df_list2[[1]]), by = x): j doesn't evaluate to the same number of columns for each group

tidyr::unnest(test_df, c(df_list, df_list2), names_repair = "unique")
#> New names:
#> * y -> y...2
#> * y -> y...3
#> # A tibble: 4 x 3
#>   x     y...2 y...3
#>   <chr> <int> <int>
#> 1 a         1     1
#> 2 a         2     2
#> 3 b        NA     1
#> 4 b        NA     2

Created on 2021-07-05 by the reprex package (v2.0.0)

markfairbanks · 2021-07-05T20:09:33Z

I don't, unfortunately. NULL list values might have to be a limitation of dtplyr for now.
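
For reference, a hedged sketch of one possible workaround, assuming the user can supply a prototype for the NULL entries (similar in spirit to .ptype): replace each NULL with a one-row table of typed NA before unnesting. The names ptype and filled are hypothetical, the second list column uses z instead of y to avoid duplicate names, and the sketch relies on data.table recycling length-1 results within a group.

```r
library(data.table)

nest_df <- data.table(y = 1:2)
test_df <- data.table(
  x = c("a", "b"),
  df_list = list(nest_df, NULL),
  df_list2 = list(data.table(z = 1:2), data.table(z = 1:2))
)

# Assumed prototype for NULL entries: a single row of typed NA
ptype <- data.table(y = NA_integer_)

# Rebuild the table with every NULL replaced by the prototype
filled <- data.table(
  x = test_df$x,
  df_list = lapply(test_df$df_list, function(d) if (is.null(d)) ptype else d),
  df_list2 = test_df$df_list2
)

res <- filled[, c(df_list[[1]], df_list2[[1]]), by = x]
res
```

Every group now returns the same number of columns, so the error above should no longer occur, but the prototype has to be known in advance.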

jtlandis mentioned this issue Sep 23, 2021

Implement unite() #301

Closed

markfairbanks added the "feature" label (a feature request or enhancement) on Jun 21, 2022
