Any benchmarking or process time estimates for zerofilling the entire EBD versus multiple states/provinces? #62

Open
trashbirdecology opened this issue Dec 6, 2021 · 2 comments


trashbirdecology commented Dec 6, 2021

I am curious whether you have any benchmarks or estimates of the time and free memory required to zero-fill the entire EBD versus zero-filling, e.g., one state or province at a time. The documentation suggests that processing the entire EBD will take multiple hours.

At roughly what number of observations (approximated by the number of states, provinces, or countries) does processing regions separately stop paying off compared with just zero-filling the entire EBD?

Thanks in advance if you have this info

For posterity, here are some timings for zero-filling and then writing (using `vroom::vroom_write()`) the zero-filled data frame for Double-crested Cormorant (Sep. 2021 data) in the US and Canada:

| Country | State/Prov | Process | Seconds elapsed |
|---|---|---|---|
| USA | Illinois | `auk_zerofill(collapse = TRUE)` | 1404 |
| USA | Illinois | `vroom::vroom_write(ebd_zf)` | 179 |

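
Timings like those above can be collected per step with a small harness. The following is only a sketch in plain Python, not R: `time_steps` and the two placeholder workloads are hypothetical names introduced for illustration, and the lambdas stand in for the real zero-fill and write calls.

```python
import time
from typing import Callable, Dict


def time_steps(steps: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each named step once and return the seconds elapsed per step."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()  # monotonic, high-resolution clock
        step()
        timings[name] = time.perf_counter() - start
    return timings


# Placeholder workloads (assumptions): swap in the real zero-fill and write steps.
timings = time_steps({
    "zerofill": lambda: sum(range(100_000)),
    "write": lambda: sorted(range(10_000), reverse=True),
})
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Running the same harness once per state or province and once on the combined region would give directly comparable numbers for the question above.
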
Contributor

mstrimas commented Dec 6, 2021

I've never looked into this and, to be honest, I wrote the zero-filling code a very long time ago, so I suspect it's quite inefficient. You may actually find it faster to implement it yourself using dplyr or data.table. A lot of the auk functions need updating, but I'm hesitant to do so because I think the eBird dataset has grown so large that a totally different approach to distributing and manipulating these data is likely needed. In that vein, if you're working with eBird data, you may want to explore birddb, the experimental package that I've been working on with Carl Boettiger: https://github.com/cboettig/birddb
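
For anyone rolling their own, the core of a zero fill is small: take every complete checklist, look up whether the species was reported on it, and emit a zero where it was not. Below is a minimal, dependency-free sketch of that logic in plain Python rather than dplyr or data.table. It is not how auk implements it; the function name, field names, and the assumption that counts are integers are all illustrative choices.

```python
from typing import Dict, Iterable, List, Tuple


def zero_fill(
    checklist_ids: Iterable[str],
    observations: Iterable[Tuple[str, str, int]],
    species: Iterable[str],
) -> List[Dict[str, object]]:
    """Return one row per (checklist, species), with count 0 where undetected.

    checklist_ids: IDs of complete checklists (the sampling events).
    observations:  (checklist_id, species, count) detection records.
    species:       species to zero-fill.
    """
    species = list(species)  # allow generators; we iterate this repeatedly

    # Index detections by (checklist, species) for constant-time lookup.
    detected: Dict[Tuple[str, str], int] = {}
    for checklist_id, sp, count in observations:
        key = (checklist_id, sp)
        detected[key] = detected.get(key, 0) + count

    # Every checklist gets a row for every species, detected or not.
    rows: List[Dict[str, object]] = []
    for checklist_id in checklist_ids:
        for sp in species:
            count = detected.get((checklist_id, sp), 0)
            rows.append({
                "checklist_id": checklist_id,
                "species": sp,
                "count": count,
                "species_observed": count > 0,
            })
    return rows


# Hypothetical example data: three checklists, one detection.
rows = zero_fill(
    ["S1", "S2", "S3"],
    [("S1", "Double-crested Cormorant", 4)],
    ["Double-crested Cormorant"],
)
```

Because the lookup is keyed per checklist, the work scales with checklists times species, so running it one region at a time mainly reduces peak memory rather than total work.
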

Author

trashbirdecology commented Dec 7, 2021 via email
