why is using flox slower than not using flox (on a laptop) // one example from flox docs #363
Comments
Nice example. I believe this is #222. I ran the reduction and took a mental note of the timings for the "blocks" tasks while specifying the
So installing
I'll note that flox's real innovation is making more things possible (e.g. this post) that would just straight up fail otherwise. The default Xarray strategy does work well for a few chunking schemes (indeed this observation inspired "cohorts"), but it's hard to predict if you haven't deeply thought about groupby.
EDIT: I love that "cohorts" (the automatic choice) is 2x faster than "map-reduce".
@dcherian - thanks for these comments (and for all the helpful tools!)
I do really appreciate this important point, even if I currently lack the understanding to write a simple example that shows this for climatological aggregations over one dimension. I did try to push the size of the array further to reach a point where "not-flox" failed and "flox" completed, but in this simple case I couldn't seem to do that with the array size changes I was making. Given my real world problem is trying to apply climatological aggregations to 11TB arrays, "making more things possible" is the gold star for a cluster of given size and why re:
I'll try to apply some of your comments here . . .
Something else I'm clearly not understanding - I thought that current
in this case how does adding
OK - yes . . .
Yes, unclear syntax. See https://flox.readthedocs.io/en/latest/engines.html. Basically there are two levels to flox
By setting
Installing Setting
Nice. One of the "challenges" is that dask tends to improve with time, so this envelope keeps shifting (and sometimes regresses hehe).
I'll note that my major goal here is to get decent perf with 0 thinking :) Hence my excitement that we are automatically choosing
Clearly, I need to think more about how to set
You might try a daily climatology or an hourly climatology to see how things shape up.
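A rough sketch of what those two groupings could look like in xarray. The array here is a small made-up stand-in (two non-leap years of hourly data on a 4x4 grid, in memory), not the array from the notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in data: hourly samples for 2001-2002 on a tiny grid.
n_hours = 2 * 365 * 24
time_index = pd.date_range("2001-01-01", periods=n_hours, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
hourly_data = xr.DataArray(
    rng.random((n_hours, 4, 4)),
    dims=("time", "lat", "lon"),
    coords={"time": time_index},
    name="dummy",
)

# Daily climatology: one mean per day of year (many groups, few members each).
daily_clim = hourly_data.groupby("time.dayofyear").mean()

# Hourly climatology: one mean per hour of day (few groups, many members each).
hourly_clim = hourly_data.groupby("time.hour").mean()

print(daily_clim.sizes, hourly_clim.sizes)
```

For a dask-backed test, the same groupby calls apply after something like hourly_data.chunk({"time": 24 * 30}); the chunk size there is an arbitrary choice.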
Hi @Thomas-Moore-Creative, is there anything to follow up here?
Thanks to @dcherian and others who have been obsessively trying to make very common tasks like calculating climatological aggregations on climate data faster and easier.
I have some use cases on HPC where we are running climatological aggregations (and climatological aggregations composited over ENSO phases) on very large 4D dask arrays (11TB), and I have been digging into how best to employ flox.
BUT today our HPC centre is down for regular maintenance, so I tried to run some dummy examples on my laptop (Apple M2 silicon, 32GB RAM) using a 4 worker LocalCluster and this example from the documentation: How about other climatologies?
The one change I made was to replace ones with random, as this seemed a more realistic test. I have no evidence but wonder if ones would be something "easier" for xarray?
The dummy array ended up being:
To my surprise, not using flox was always much faster? I forced flox to try both map-reduce and cohorts.
RESULTS: (which were repeatable and run after clearing memory and restarting the cluster)
code is here via nbviewer
My goal was to generate an easy to run notebook where I could demonstrate to my colleagues the power of flox. Instead, I'm a bit less confident I understand how this works.
Questions:
Thanks!