I am curious whether you have any benchmarks or estimates of the time and free memory required to zero-fill the entire EBD versus zero-filling, e.g., one state or province at a time. The documentation does suggest that the entire EBD will take multiple hours to process.
At what number of observations (measured approximately by the number of states/provinces/countries) do we get diminishing returns versus just zero-filling the entire EBD?
Thanks in advance if you have this info
For posterity, here are some timings for zero-filling Double-crested Cormorant (using Sep. 2021 data) in the US and Canada, and for writing the zero-filled data frame with vroom::vroom_write():
| Country | State/Prov | Process | Seconds elapsed |
|---|---|---|---|
| USA | Illinois | `auk_zerofill(collapse = TRUE)` | 1404 |
| USA | Illinois | `vroom::vroom_write(ebd_zf)` | 179 |
I've never looked into this and, to be honest, I wrote the zero-filling code a very long time ago, so I suspect it's quite inefficient. You may actually find it faster to implement it yourself using dplyr or data.table. A lot of the auk functions need some updating, but I'm hesitant to do so because I think the eBird dataset has reached such a massive size that a totally different approach to distributing and manipulating these data is likely needed. In that vein, if you're working with eBird data, you may want to explore the experimental package birddb that I've been working on with Carl Boettiger: https://github.com/cboettig/birddb
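For anyone who wants to try the do-it-yourself route, here is a minimal sketch of single-species zero-filling with dplyr. It is not how auk implements it. The toy data, the object names `checklists`, `observations`, and `zf`, and the assumption that only checklists with `all_species_reported == TRUE` can support an inferred absence are all illustrative; the column names are meant to mirror the EBD and sampling event data, so adjust them to whatever your import produces.

```r
library(dplyr)

# Hypothetical stand-in for the sampling event data: one row per checklist.
checklists <- tibble(
  checklist_id = c("S1", "S2", "S3", "S4"),
  all_species_reported = c(TRUE, TRUE, FALSE, TRUE)
)

# Hypothetical stand-in for EBD records of the single target species.
# Counts are kept as character because "X" marks presence without a count.
observations <- tibble(
  checklist_id = c("S1", "S3", "S4"),
  observation_count = c("3", "1", "X")
)

zf <- checklists %>%
  # Assumption: absence can only be inferred from complete checklists.
  filter(all_species_reported) %>%
  # Checklists with no matching record get NA for observation_count.
  left_join(observations, by = "checklist_id") %>%
  mutate(
    species_observed = !is.na(observation_count),
    # Turn the NAs introduced by the join into explicit zero counts.
    observation_count = if_else(species_observed, observation_count, "0")
  )

print(zf)
```

The join itself scales with the number of checklists rather than with the full EBD, so filtering both inputs to the region and dates of interest before joining is probably where most of the time savings would come from.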