separate_wider_* functions remove row names from data frame #1500

Open
MarcoBruttini opened this issue May 24, 2023 · 2 comments
Labels: bug (an unexpected problem or unintended behavior), df-col 👜

MarcoBruttini commented May 24, 2023

If you use separate_wider_* on a data frame with row names, the resulting data frame no longer has row names. Since the documentation states that rows are not affected by these functions, I assume this is unintended behavior.

library(tidyverse)

df <- data.frame(
  row.names = letters[1:3],
  col_to_separate = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)

df
#>   col_to_separate
#> a             A-A
#> b             B-B
#> c             C-C

df %>%
  separate_wider_delim(
    col_to_separate,
    delim = "-",
    names = paste0("C", 1:2)
  )
#> # A tibble: 3 × 2
#>   C1    C2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     C

df %>%
  separate_wider_position(
    col_to_separate,
    widths = c(C1 = 1, 1, C2 = 1)
  )
#> # A tibble: 3 × 2
#>   C1    C2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     C

df %>%
  separate_wider_regex(
    col_to_separate,
    patterns = c(C1 = ".", ".", C2 = ".")
  )
#> # A tibble: 3 × 2
#>   C1    C2   
#>   <chr> <chr>
#> 1 A     A    
#> 2 B     B    
#> 3 C     C

Created on 2023-05-24 with reprex v2.0.2

hadley added the bug (an unexpected problem or unintended behavior) label Nov 1, 2023

hadley commented Nov 1, 2023

Somewhat more minimal reprex:

library(tidyverse)

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
rownames(df)
#> [1] "a" "b" "c"

df |> 
  separate_wider_delim(x, delim = "-", names = c("a", "b")) |> 
  rownames()
#> [1] "1" "2" "3"

Created on 2023-11-01 with reprex v2.0.2

It looks like the root cause of this is unpack().
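
Until this is addressed, one possible workaround is to move the row names into a regular column before separating and restore them afterwards. This is not something proposed in the thread; it is a minimal sketch that assumes the tibble helpers `rownames_to_column()` and `column_to_rownames()`, and `.rn` is a hypothetical temporary column name chosen for illustration.

```r
library(tidyr)
library(tibble)

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)

# ".rn" is a hypothetical temporary column name (an assumption, not from the
# issue); pick one that does not clash with existing columns.
res <- df |>
  rownames_to_column(".rn") |>                               # row names -> column
  separate_wider_delim(x, delim = "-", names = c("a", "b")) |>
  as.data.frame() |>                                         # back to a base data frame
  column_to_rownames(".rn")                                  # column -> row names

rownames(res)
#> [1] "a" "b" "c"
```

The round trip costs an extra column, but it does not depend on how unpack() treats row names internally.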

DavisVaughan commented

It seems like pack() and unpack() need a little bit of special handling of row names with base data.frames. I see 3 distinct problems:

  • pack() should strip row names from the inner packed data frames and retain them only on the outer data frame
  • unpack() should probably return a data frame if the input was a data frame (rare; mostly you unpack a tibble)
  • unpack() should keep row names, which is particularly important if we output a base data frame due to the bullet above

library(tidyverse)

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
rownames(df)
#> [1] "a" "b" "c"

df <- tidyr::pack(df, foo = x)

# Row names on outside
rownames(df)
#> [1] "a" "b" "c"

# Row names on inside too
# (these should probably get removed)
rownames(df$foo)
#> [1] "a" "b" "c"

# - Should this return a data.frame?
# - Should this keep row names of df?
# (Probably yes to both)
tidyr::unpack(df, foo)
#> # A tibble: 3 × 1
#>   x    
#>   <chr>
#> 1 A-A  
#> 2 B-B  
#> 3 C-C

Created on 2024-07-27 with reprex v2.0.2
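
The last two points above (return a base data frame for base data frame input, and keep its row names) can be approximated from user code. The following is only a sketch under stated assumptions: `unpack_keep_rownames()` is a hypothetical helper, not part of tidyr and not the maintainers' proposed fix, and it relies on `tibble::is_tibble()` and `tibble::has_rownames()` to detect the input type.

```r
library(tidyr)
library(tibble)

# Hypothetical helper (not part of tidyr): unpack, then restore the class and
# row names when the input was a base data frame.
unpack_keep_rownames <- function(data, cols) {
  out <- unpack(data, {{ cols }})
  if (is.data.frame(data) && !is_tibble(data)) {
    out <- as.data.frame(out)
    # Only copy row names that were set explicitly, not automatic 1..n ones.
    if (has_rownames(data)) {
      rownames(out) <- rownames(data)
    }
  }
  out
}

df <- data.frame(
  row.names = letters[1:3],
  x = paste(LETTERS[1:3], LETTERS[1:3], sep = "-")
)
packed <- pack(df, foo = x)

res <- unpack_keep_rownames(packed, foo)
class(res)
rownames(res)
```

Because unpacking only widens the data and never changes the number of rows, copying the row names across is safe here. A fix inside tidyr itself would presumably handle this during reconstruction rather than in a wrapper.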
