Error when summarise refers to previously created variable #75

hadley · 2019-06-26T17:30:56Z

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

lz <- lazy_dt(data.frame(x = 1:10))
lz %>% summarise(x = mean(x), y = x + 1) %>% collect()
#>       x  y
#>  1: 5.5  2
#>  2: 5.5  3
#>  3: 5.5  4
#>  4: 5.5  5
#>  5: 5.5  6
#>  6: 5.5  7
#>  7: 5.5  8
#>  8: 5.5  9
#>  9: 5.5 10
#> 10: 5.5 11

Created on 2019-06-26 by the reprex package (v0.2.1.9000)
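For comparison, here is a small sketch of why the output above has ten rows. It assumes the summarise() call is translated into a single data.table `j` expression, which is what the printed result suggests; the object names `res_dplyr` and `res_dt` are only illustrative. In dplyr on a plain data frame, `y` sees the freshly summarised `x`, whereas in data.table every expression in `j` sees the original column.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x = 1:10)

# dplyr on a plain data frame: `y` is computed from the summarised `x`,
# so the result is a single row
res_dplyr <- df %>% summarise(x = mean(x), y = x + 1)

# data.table: both expressions in j are evaluated against the original
# column `x`, so `y` has 10 values and the scalar mean is recycled
dt <- as.data.table(df)
res_dt <- dt[, .(x = mean(x), y = x + 1)]

print(res_dplyr)
print(res_dt)
```

The second result reproduces the ten-row table from the reprex, while the first is the one-row answer a dplyr user would expect.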

Use same technique from dbplyr.

Fixes #75

dyrland · 2019-08-22T18:23:27Z

I know this is closed, but I use summarise() all the time in the wild.

I often need to express variables in different ways, because different bosses like data presented differently. This leads me to compute multiple summaries of the same variables. For instance, below I need the number of Auto Lunches per Manager and then the percentage that are Auto Lunches. I prefer the first, df.out (it's the cleanest, IMHO), but the second, dt.out1, is OK. The third, dt.out2, is also fine for this MWE, but as the calculations become more complex, repeating the code becomes unintelligible.

Of course, perhaps I could have better data prep skills. :D

df.out <- df %>%
  group_by(Manager) %>%
  summarize(`Auto Lunches` = sum(auto.lunch == TRUE),
            `Gross Hours` = sum(auto.lunch.time) / -60,
            `Auto Lunch %` = `Auto Lunches` / n() * 100,
            `Auto Lunch % (Mods)` = `Auto Lunches` / sum(modified == TRUE) * 100
  )
   
dt <- lazy_dt(df)
dt.out1 <- dt %>%
  group_by(Manager) %>%
  summarize(`Auto Lunches` = sum(auto.lunch == TRUE),
            `Gross Hours` = sum(auto.lunch.time) / -60,
            n = n()) %>% 
  mutate(`Auto Lunch %` = `Auto Lunches` / n * 100,
         `Auto Lunch % (Mods)` = `Auto Lunches` / sum(modified == TRUE) * 100
  )

dt.out2 <- dt %>%
  group_by(Manager) %>%
  summarize(`Auto Lunches` = sum(auto.lunch == TRUE),
            `Gross Hours` = sum(auto.lunch.time) / -60,
            `Auto Lunch %` = sum(auto.lunch == TRUE) / n() * 100,
            `Auto Lunch % (Mods)` = sum(auto.lunch == TRUE) / 
              sum(modified == TRUE) * 100
  )

dyrland · 2019-08-22T18:47:10Z

Yikes! dt.out1 won't work, because I would first have to create a summarised "modified" variable. I guess that's another reason why I'm hopeful that I can use previously created variables in summarise()...
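One possible way around the dt.out1 problem, sketched under assumptions: compute every aggregate that needs the raw rows (including a count of modified rows) inside summarise(), then derive the percentages in a following mutate(). The data frame below is made up for illustration, and the syntactic column names (`auto_lunches`, `mods`, `auto_pct`, `auto_pct_mods`) are placeholders for the display names used in the post above.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Made-up data for illustration only; input column names follow the post
df <- data.frame(
  Manager         = c("A", "A", "B", "B", "B"),
  auto.lunch      = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  auto.lunch.time = c(-30, 0, -30, -30, 0),
  modified        = c(TRUE, TRUE, TRUE, FALSE, TRUE)
)

out <- lazy_dt(df) %>%
  group_by(Manager) %>%
  # Step 1: all aggregates that need the raw rows, including the
  # count of modified rows that dt.out1 was missing
  summarise(
    auto_lunches = sum(auto.lunch),
    gross_hours  = sum(auto.lunch.time) / -60,
    n            = n(),
    mods         = sum(modified)
  ) %>%
  # Step 2: ratios built only from the aggregates created in step 1
  mutate(
    auto_pct      = auto_lunches / n * 100,
    auto_pct_mods = auto_lunches / mods * 100
  ) %>%
  collect()

print(out)
```

The helper columns `n` and `mods` can be dropped with select() afterwards, and the display names restored with rename(), if the final table should match df.out.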

gpierard · 2022-09-10T01:08:00Z

I had this error because my df was a data.table. I got it fixed after converting it to a tibble.

TheDohn · 2024-06-20T20:30:03Z

Could anyone elaborate on what:

Use same technique from dbplyr.

means? I can see how to avoid the error altogether by calculating each summarised variable independently as its own table, then joining the results together, but I am wondering if there is a better way using dtplyr.
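Judging only from the title of the commit referenced in this thread ("Clean error if summarising summarised variable"), the dbplyr technique appears to be raising an error instead of returning a silently wrong result; that reading is an inference, not something the thread confirms. A minimal sketch of an alternative to the join approach, using the reprex data from the top of the issue, is to aggregate in summarise() and derive dependent values in a following mutate():

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

lz <- lazy_dt(data.frame(x = 1:10))

out <- lz %>%
  summarise(x = mean(x)) %>%  # aggregate first
  mutate(y = x + 1) %>%       # then derive from the aggregate
  collect()

print(out)
```

This keeps everything in one pipeline and avoids building and joining a separate table per summary variable.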

hadley mentioned this issue Jun 26, 2019

translation vignette feedback #73

Closed

hadley added a commit that referenced this issue Jul 3, 2019

Clean error if summarising summarised variable

f2ae6a5

Fixes #75

hadley closed this as completed Jul 3, 2019
