Replace parallelized with allowed #221
can you run

`xskillscore/core/deterministic.py` (outdated):

```diff
@@ -137,7 +137,7 @@ def pearson_r(a, b, dim=None, weights=None, skipna=False, keep_attrs=False):
         weights,
         input_core_dims=input_core_dims,
         kwargs={"axis": -1, "skipna": skipna},
-        dask="parallelized",
+        dask="allowed",
```
I am still in favor of having `dask` as a global keyword, see `xarray.set_config`.
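For illustration, a global option could mimic xarray's own options mechanism. A minimal sketch, assuming nothing about xskillscore's actual internals (the names `OPTIONS` and `set_options` here are hypothetical, not existing API):

```python
# Hypothetical sketch of a module-level option, in the style of xarray's
# set_options. Nothing here is existing xskillscore API.
OPTIONS = {"dask": "parallelized"}


class set_options:
    """Set package-wide options, usable as a context manager."""

    def __init__(self, **kwargs):
        self.old = {}
        for key, value in kwargs.items():
            if key not in OPTIONS:
                raise ValueError(f"unknown option {key!r}")
            self.old[key] = OPTIONS[key]
            OPTIONS[key] = value

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Restore the previous values on exit.
        OPTIONS.update(self.old)


# Metrics would then read OPTIONS["dask"] instead of hard-coding the mode:
with set_options(dask="allowed"):
    assert OPTIONS["dask"] == "allowed"
assert OPTIONS["dask"] == "parallelized"
```

Each metric's `apply_ufunc` call would then pass `dask=OPTIONS["dask"]` rather than a literal string.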
```shell
~/Coding/xskillscore/asv_bench$ asv continuous -f 1.1 upstream/master HEAD -b deterministic.Compute_small.time_xskillscore_metric_small
```
```
[ 70.83%] ··· ================================================ ========
                                     m
              ------------------------------------------------ --------
               <function rmse at 0x7fe9d0353730>                failed
               <function pearson_r at 0x7fe9d032c400>           failed
               <function mae at 0x7fe9d0353840>                 failed
               <function mse at 0x7fe9d03537b8>                 failed
               <function pearson_r_p_value at 0x7fe9d0353378>   failed
              ================================================ ========
```
Can do easy checks with `asv dev -b deterministic.Compute_small.time_xskillscore_metric_small`; must delete that encoding, it seems.
I couldn't get asv running.
Can you take a look at the issue page where I copied my error? It says my cache failed or something.
If I'm not mistaken, the first ASV benchmark was using my allowed branch, the second ASV benchmark was using the upstream master. Is this correct?
I think so, yes.
It means it's faster?
It only shows benchmarks with at least a 10% (factor 1.1) performance change. Here it reports “PERFORMANCE DECREASED”, maybe because of the failures, but from reading the table I would say that memory consumption decreased significantly, with no timing improvement.
Can we infer from this that all our functions should rather use `allowed`? Unsure.
I think so.

> https://stackoverflow.com/questions/51736172/whats-the-difference-between-dask-parallelized-and-dask-allowed-in-xarrays-app
>
> Can we infer from this that all our functions should rather use `allowed`? Seems so.
>
> Is this also true for the probabilistic functions?
Got environment errors. I will try again tomorrow.
I only get
I've been testing my branch on an HPC with a huge dataset of shape (2, 4, 336, 361, 3, 720, 1, 5, 26). No hard metrics, but it seems to be much faster, perhaps because `parallelized` has to distribute the data across cores and uses significantly more memory (especially since weighted skipna triggers eager computation).
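For context, the difference between the two modes can be seen with a toy `apply_ufunc` call. This is a minimal illustration rather than xskillscore code (it assumes xarray and dask are installed):

```python
import dask.array as da
import numpy as np
import xarray as xr

a = xr.DataArray(
    da.random.random((4, 100), chunks=(2, 100)), dims=("x", "time")
)

# dask="allowed": the function receives the dask array itself,
# so it must be dask-aware (here, dask arrays implement .mean()).
lazy_allowed = xr.apply_ufunc(
    lambda arr: arr.mean(axis=-1),
    a,
    input_core_dims=[["time"]],
    dask="allowed",
)

# dask="parallelized": xarray applies the function chunk-by-chunk,
# so plain-numpy functions work, at the cost of extra graph/scheduling
# overhead.
lazy_par = xr.apply_ufunc(
    lambda arr: np.mean(arr, axis=-1),
    a,
    input_core_dims=[["time"]],
    dask="parallelized",
    output_dtypes=[float],
)

print(np.allclose(lazy_allowed.compute(), lazy_par.compute()))  # True
```

With `allowed` the whole computation stays inside dask's own operations; with `parallelized` xarray wraps a numpy function and schedules it per chunk, which is where the extra distribution and memory cost can come from.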
Go ahead with this PR, it all sounds reasonable. Sorry for blocking this; I hoped to get a nice asv benchmark for it but failed.
Actually hold on. Let me do more testing; I think passing weights + skipna alongside `dask='allowed'` is causing it to error out.
Sorry, just getting to this now. So my understanding is
Yes, would be good to be sure that that works! I am also in favor of an

@aaronspring would be good to do the

@lukelbd has quite the extensive config system as well at
Resolving tests, and with the xarray discussion answer, I think we are good with this PR.
Had to revert some probabilistic metrics back to `parallelized` because properscoring probably has some non-numpy calls.
No idea why the errors are now all back. Maybe merge the no_compute PR first, then deal with this.
What's the status here? Didn't we all agree that `allowed` was better?
No idea what's wrong with the lint stuff.
Failing at
Moved the dask keyword input to a variable. Setting it to "parallelized", the tests run fine. Similar to probabilistic, it may not work with all functions.
Probably have to play whack-a-mole to see which functions work with "allowed", and then benchmark those that do against "parallelized".
Just making notes here.

Deterministic metrics that don't work with `allowed`:

Deterministic metrics that do work with `allowed` (and `parallelized`):

Probabilistic metrics that don't work with `allowed`:

Probabilistic metrics that do work with `allowed` (and not `parallelized`):

Contingency metrics that do work with `allowed` (and not `parallelized`):
Next step: benchmark the four deterministic metrics.
I thought RMSE did work?

The tests fail.
Weird. I would expect it to work since it's only like a line of numpy code
Yeah. Looking at the metrics above, I can't make sense of what works and what doesn't.
@raybellwaves, how does it fail? Do you get an error message that you can share? Using

I'm very late to the party here, but I do think this is a worthwhile exercise. There is added overhead when running
Thanks for chiming in @dougiesquire. I believe you can find the errors in the logs of the runs (https://github.com/xarray-contrib/xskillscore/pull/221/checks?check_run_id=1696533136) but there's a lot there. Best to fork Andrew's repo and dabble. Just change rmse (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/core/deterministic.py#L843) to "allowed", then run the test:
It fails on

I'll summarize below.
So it fails under the condition of has_nan, skipna and weight_bool, i.e. the data has NaNs, we skip them, and we also weight the calculation. Moving up the traceback:
It is failing where the weights array is populated with NaNs (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/core/np_deterministic.py#L46). What's interesting is there is a warning here:
Run the test that fails by hand (WIP):
Created #245
Thanks @raybellwaves. Taking a closer look. Interestingly, it matters whether the weights themselves are a dask array. That is, `xs.rmse(a_dask, b_dask, dim, weights_dask, skipna=True)` works fine, but `xs.rmse(a_dask, b_dask, dim, weights, skipna=True)` fails with the IndexError you saw above. Still investigating why this is.
Okay, so there seems to be an issue with boolean indexing a numpy array using a dask array. This happens in

A simple example of the issue:

```python
import numpy as np
import dask.array as da

data = np.random.rand(10, 10)

# This works
idx = data < 0.5
data[idx] = np.nan

# This produces an IndexError
idx_dask = da.from_array(idx)
data[idx_dask] = np.nan
```

This is probably the expected behaviour for dask/numpy. Certainly, indexing a dask array with numpy boolean indices also fails, though in this reverse case the error message is more interpretable.

We could add a catch into

Or I could open an issue with the dask devs and see what they say?
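If this behaviour is indeed expected, one possible workaround (just a sketch, not what the PR implements) is to materialize the dask boolean index back to numpy before using it on a numpy array:

```python
import numpy as np
import dask.array as da

# Deterministic data so the result is predictable: values 0.00 .. 0.99.
data = np.arange(100.0).reshape(10, 10) / 100
idx_dask = da.from_array(data < 0.5)

# data[idx_dask] = np.nan raises an IndexError; converting the dask
# boolean index back to numpy (which triggers its computation) avoids it.
data[np.asarray(idx_dask)] = np.nan
print(int(np.isnan(data).sum()))  # 50
```

`np.asarray` on a dask array computes it eagerly, so this trades laziness for correctness at the indexing step; a catch in the numpy backend could apply it only when the index actually is a dask array.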
I think
What about this PR? Any chances of merging? @raybellwaves @ahuang11
Feel free to move forward with it; make a new PR if you want, to facilitate that.
Just started taking a look at this now and I'm struggling a little to work out what is going on (it doesn't help that all the test logs have expired). As I've said above, I think the move to

What do folks think about closing this PR and me opening another issue listing all uses of
Yes, please do! @ahuang11 said he was happy for us to close and implement. I agree. I think we can expose the dask arg in all our functions. Currently a user (or us) can't change it at run time; it's hard-coded to `parallelized`. Once we expose it we can benchmark both methods and then choose the best default value. @aaronspring is pretty savvy at asv if any help is required with benchmarking. Could start with one metric: https://github.com/xarray-contrib/xskillscore/blob/main/xskillscore/core/deterministic.py#L800
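Exposing the arg could be as simple as threading a keyword through to `apply_ufunc`. A hedged sketch with a toy metric — this wrapper and its signature are illustrative, not the actual xskillscore API:

```python
import numpy as np
import xarray as xr


def _rmse(a, b, axis=-1):
    # Plain-numpy kernel, as used by the "parallelized" path.
    return np.sqrt(np.mean((a - b) ** 2, axis=axis))


def rmse(a, b, dim, dask="parallelized"):
    # Hypothetical: let callers pick the apply_ufunc dask mode,
    # defaulting to the currently hard-coded behaviour.
    return xr.apply_ufunc(
        _rmse,
        a,
        b,
        input_core_dims=[[dim], [dim]],
        kwargs={"axis": -1},
        dask=dask,
        output_dtypes=[float],
    )


a = xr.DataArray(np.zeros((2, 3)), dims=("x", "time"))
b = xr.DataArray(np.ones((2, 3)), dims=("x", "time"))
print(rmse(a, b, "time").values)  # [1. 1.]
```

A benchmark could then call the same function twice, once with `dask="allowed"` and once with `dask="parallelized"`, and compare timings and memory before picking the default.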
Closing and moving conversation to #315 |
See benchmarks in comments:
#207