Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rowwise() with a mutate() call with warnings is very slow #6236

Closed
charliejhadley opened this issue Apr 20, 2022 · 11 comments · Fixed by #6443
Closed

rowwise() with a mutate() call with warnings is very slow #6236

charliejhadley opened this issue Apr 20, 2022 · 11 comments · Fixed by #6443

Comments

@charliejhadley
Copy link
Contributor

charliejhadley commented Apr 20, 2022

As identified by @debruine here #6236 (comment) this issue is actually caused when mutate() is used after rowwise() and generates warnings.

This code takes ~11 seconds to run in a clean R session

library(dplyr)
library(tictoc)
tic()
foo <- tibble(
  id = 1:800
) %>% 
  rowwise() %>% 
  mutate(group_b_mean = mean(NULL))
toc()
# 11.584 sec elapsed

Please note that weirdly the same code takes <1second when run with the {reprex} addin.

Many thanks to debruine for figuring this out.

@charliejhadley charliejhadley changed the title Should c_across(contains("string")) fail faster when "string" doesn't exist. c_across(character(0)) takes a very long time to fail and scales with nrow() Apr 20, 2022
@debruine
Copy link

It has to do with the warnings. You get the same problem without any use of c_across:

foo <- tibble(
  id = 1:800
) %>% 
  rowwise() %>% 
  mutate(group_b_mean = mean(NULL))

@charliejhadley charliejhadley changed the title c_across(character(0)) takes a very long time to fail and scales with nrow() rowwise() with a mutate() call with errors is very slow Apr 20, 2022
@sjkiss
Copy link

sjkiss commented Jul 15, 2022

I'm having this issue; it's fairly debilitating some code that use to run without issues. Is there any status update on this?

@eutwt
Copy link
Contributor

eutwt commented Jul 15, 2022

Looks like it got slower after this pr. Example above goes from 1.6s to 15s for me and profvis shows all the time is spent in cli_format()

Goes to 0.05s if you change mean(NULL) to suppressWarnings(mean(NULL)) (obviously not a good long-term solution)

@hadley
Copy link
Member

hadley commented Jul 21, 2022

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:100) 

f <- function() {
  warning()
  1
}

bench::system_time(
  df %>% 
    rowwise() %>% 
    mutate(x = f())
)
#> process    real 
#>     3s    3.6s 

(Note that you can't run this with the reprex package because something about the way rmarkdown handles warnings makes the problem go away)

@lionel- could you take a look to see if there's any obvious way to speed up the warning wrapping in mutate_cols()?

@hadley hadley changed the title rowwise() with a mutate() call with errors is very slow rowwise() with a mutate() call with warnings is very slow Jul 22, 2022
@hadley hadley added this to the 1.1.0 milestone Aug 18, 2022
@lionel-
Copy link
Member

lionel- commented Aug 22, 2022

One possible way to go about this would be to collect warnings until all computations are done, and then emit the first few warnings and suggest to use something like dplyr_last_warnings() to get them all.

Edit: Would also help with #6005.

@hadley
Copy link
Member

hadley commented Aug 22, 2022

@lionel- I like that idea.

@hadley
Copy link
Member

hadley commented Sep 21, 2022

Wooo, much faster now!

library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:100) 

f <- function() {
  warning()
  1
}

bench::system_time(
  df %>% 
    rowwise() %>% 
    mutate(x = f())
)
#> Warning: There were 100 warnings in a `mutate()` step.
#> The first warning was:
#> ! Problem in row 1 while computing `x = f()`.
#> Caused by warning in `f()`:
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 99 remaining warnings.
#> process    real 
#>  66.8ms  65.8ms

Created on 2022-09-21 with reprex v2.0.2

@sjkiss
Copy link

sjkiss commented Nov 25, 2022

Sorry: has this been added to 1.0.10 yet? I'm not clear. How do I get this fix into my version of dplyr?

@hadley
Copy link
Member

hadley commented Nov 28, 2022

@sjkiss it's in the dev version

@vak
Copy link

vak commented Jan 29, 2024

dplyr 1.1.2 is still slow on it

@hadley
Copy link
Member

hadley commented Jan 29, 2024

@vak please file a new issue with reprex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants