Reasoning about scaling and performance #204
Comments
This is very useful! Can you say what you mean by weak and strong scaling?
I meant this (from Wikipedia):
By this I meant essentially:
My point being that, IIUC, cubed is promising in that it might have good weak scaling properties even up to very large datasets, and therefore weak scaling is more relevant than strong scaling for analyzing cubed's performance. In other words, as a user I'm more excited by the prospect of being able to analyze extremely large datasets in a reasonable amount of time than by analyzing medium-sized datasets in a very small amount of time (which is still cool but less important to me). If we had perfectly linear weak scaling, it would mean I could call [...]. The rest of my post was about me trying to reason through when and to what extent we might expect to see this linear weak scaling with cubed.
If this is at all useful then I can write it out in full sentences as a draft docs page perhaps?
Thanks for the explanation. That would be great!
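To make the distinction concrete, here is a minimal sketch of how the two kinds of scaling efficiency are usually computed from wall-clock timings. None of this is cubed API: the function names and the timings are hypothetical, made up purely for illustration.

```python
def strong_scaling_efficiency(t_one: float, t_n: float, n_workers: int) -> float:
    """Fixed total problem size, more workers.

    Ideal strong scaling means t_n == t_one / n_workers, giving 1.0.
    """
    return t_one / (n_workers * t_n)


def weak_scaling_efficiency(t_one: float, t_n: float) -> float:
    """Problem size grows in proportion to the number of workers.

    Ideal (linear) weak scaling means t_n == t_one, giving 1.0.
    """
    return t_one / t_n


# Hypothetical timings in seconds, not measurements of cubed.
# Strong scaling: same dataset, 1 worker vs 4 workers.
strong = strong_scaling_efficiency(t_one=100.0, t_n=25.0, n_workers=4)

# Weak scaling: 4x the data on 4x the workers.
weak = weak_scaling_efficiency(t_one=100.0, t_n=125.0)

print(f"strong scaling efficiency: {strong:.2f}")
print(f"weak scaling efficiency:   {weak:.2f}")
```

With perfectly linear weak scaling the second number stays at 1.0 however large the dataset gets, as long as the number of workers grows with it.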
I tried to write out some thoughts about cubed's performance and scaling. Perhaps this could form the basis of another page of the docs, similar to how dask has pages on best practices and understanding performance. Obviously a lot of this will change as more optimizations are introduced / if we discover other reasons why scaling does not behave as expected.
Aim of writing this out:
Types of scaling to explain:
Scaling of a single step
Theoretical scaling:
Realistic scaling considerations:
Scaling of a multi-step plan
allowed_mem is larger (is there a rule of thumb for this?) (see Creating cubed arrays from lazy xarray data #197 (comment))
Multiple pipelines
Executor-specific considerations