profile code and optimize bottlenecks #16

Open
corybrunson opened this issue Apr 3, 2018 · 4 comments


@corybrunson

Description of the issue

Diagrams for large datasets take a long time to render. The bottlenecks might be due to inefficiencies in the code. Profile the code, identify the bottlenecks, and benchmark alternative implementations. (See this chapter in Advanced R.)
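A minimal sketch of the profiling step, using the packaged `vaccinations` data as a stand-in until a larger dataset is available. It assumes a recent ggalluvial release in which stratum heights are mapped with `y` (older releases used `weight`). The repetition count and sampling interval are arbitrary choices, not recommendations.

```r
library(ggplot2)
library(ggalluvial)

data(vaccinations)
p <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject,
                y = freq, fill = response)) +
  geom_flow() +
  geom_stratum(alpha = .5)

# Wall-clock time for one pass of the stat computations (no rendering).
build_time <- system.time(built <- ggplot_build(p))
print(build_time)

# Sample the call stack while the plot is built repeatedly, so that even a
# small dataset yields enough samples to summarise.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (k in 1:10) invisible(ggplot_build(p))
Rprof(NULL)

# Functions ranked by total time spent in them or their callees; the
# tryCatch guards against a profile with no samples.
top <- tryCatch(head(summaryRprof(prof_file)$by.total, 10),
                error = function(e) NULL)
print(top)
```

Profiling `ggplot_build()` covers only the stat side; timing `print(p)` as well would include the grob construction done by the geoms.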

Reproducible example (preferably using reprex::reprex())

(Need a suitable public dataset.)

@cenuno

cenuno commented Apr 18, 2018

@corybrunson This package is awesome. Thank you for taking the time to build it! I would love to help out.

Could you tell me which scripts in your /ggalluvial/R folder are relevant when running the following lines of code?

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           weight = freq,
           fill = response, label = response)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

In the meantime, I'm hoping to create a data set that contains 5 million rows and 3 columns to use in the reprex.

@corybrunson

@cenuno thank you for saying so! I'd be very glad for the large-scale example. The code chunk you shared relies on functions defined in the files stat-flow.r, geom-flow.r, stat-stratum.r, and geom-stratum.r, and possibly indirectly some code in stat-utils.r, geom-utils.r, and lode-guidance-functions.r. (In general, a layer—usually stat_*() or geom_*()—invokes one stat and one geom, and the stats and geoms are roughly paired up in this package.)
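The stat/geom pairing can be checked on the layer objects themselves. This is a small sketch that assumes the layer returned by a `geom_*()` call exposes its `stat` and `geom` fields, as ggplot2 layers do; the variable names are illustrative.

```r
library(ggplot2)
library(ggalluvial)

# Each call creates one layer; the layer carries one stat and one geom.
flow_layer <- geom_flow()
stratum_layer <- geom_stratum()

# The first class of each ggproto object names the stat or geom in use.
print(class(flow_layer$stat)[1])
print(class(flow_layer$geom)[1])
print(class(stratum_layer$stat)[1])
print(class(stratum_layer$geom)[1])
```

The class names point to the source files where the computation (stat) and the drawing (geom) happen, which is where profiling output should be traced back to.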

@cenuno

cenuno commented Apr 20, 2018

Sweet. I'll start investigating with the vaccinations data set, just to get a sense of the workflow. It will probably take a while, but I want this to work with larger data sets, as I'm sure others do as well.

@universal

universal commented Dec 2, 2021

library(tidyverse)
library(ggalluvial)

i <- 100
waves <- 10
set.seed(1)  # make the randomly assigned statuses reproducible
alluvial_test <- tibble(
  id = as.numeric(rep(1:i, each = waves)),
  wave = factor(rep(1:waves, i)),
  status = factor(sample(rep(c("A", "B", "C", "D"), each = i * waves / 4)),
                  levels = c("A", "B", "C", "D"))
)

p <- ggplot(alluvial_test,
            aes(x = wave, stratum = status, alluvium = id,
                fill = status, label = status))
p +
  geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray") +
  geom_stratum()

Created on 2021-12-02 by the reprex package (v2.0.1)

Increasing i and waves quickly makes the plot very slow ;-) Anyway, for my purposes it would be enough to group by status and show only the transitions between the groups, not the individual ones. I'm currently thinking about how to regroup the data :-) but am drawing a blank...
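One possible way to regroup, sketched in base R: pair each observation with the same subject's status in the next wave, then count how many subjects make each transition. This assumes every subject is observed in every wave and that only transitions between consecutive waves matter. The names `transitions` and `transition_counts` are illustrative and not part of ggalluvial.

```r
set.seed(1)
i <- 100
waves <- 10
alluvial_test <- data.frame(
  id = rep(seq_len(i), each = waves),
  wave = rep(seq_len(waves), times = i),
  status = factor(sample(rep(c("A", "B", "C", "D"), each = i * waves / 4)),
                  levels = c("A", "B", "C", "D"))
)

# Sort so that each subject's waves are adjacent and in order.
alluvial_test <- alluvial_test[order(alluvial_test$id, alluvial_test$wave), ]

# Compare every row with the next one; keep pairs from the same subject.
n <- nrow(alluvial_test)
same_id <- alluvial_test$id[-n] == alluvial_test$id[-1]
transitions <- data.frame(
  wave = alluvial_test$wave[-n][same_id],   # wave the transition starts from
  from = alluvial_test$status[-n][same_id], # status in that wave
  to = alluvial_test$status[-1][same_id]    # status in the following wave
)

# One row per (wave, from, to) combination with the number of subjects.
transition_counts <- as.data.frame(table(transitions), responseName = "n")
head(transition_counts)
```

The result has one row per starting wave and pair of statuses, regardless of how large `i` becomes, so a plot drawn from these counts would scale with the number of groups rather than the number of subjects.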
