-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
daskified unyt arrays #185
Conversation
Just a heads up that the tests are currently broken and need to be migrated to github actions. So far I haven’t summoned the will to figure out github actions. Any help on that front from anyone reading this is much appreciated. |
Looks like a few more minor fixes needed after that merge... should be straightforward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall great work ! here are a bunch of comments, suggestions and questions.
@@ -176,3 +176,37 @@ def use(self): | |||
|
|||
|
|||
_matplotlib = matplotlib_imports() | |||
|
|||
|
|||
class dask_imports(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note that I refactored the optional imports to be lazy in #250
Even though there should be no incompatibility with this current implementation, I note that importing dask eagerly has a very noticeable impact on import time (it takes longer than importing sympy). We can resolve that at any point before the next release but I suppose it'd be easier to just keep a releasable state at all times, so it'd be preferable to deal with #250 first
@chrishavlin I checked that merging this with #200 didn't cause any of their respective tests to fail so we can push forward with this PR now. |
oh, oops. looks like the minimum dask version for testing caused some problems with 3.8. Let me relax that a little... |
also, I realized that pickling won't actually work quite right... I spent some time trying thinking that it'd be trivial, but it's not. At this point I'd rather leave it to a followup (will open an issue in a moment) |
Ok, @neutrinoceros I think the current version addressed all your comments. And I've opened issues to track the long comment/note on dask reductions as you suggested as well as an issue to track the pickle issue. IMO the pickle issue isn't worth holding up a merge, but if anyone disagrees I'll of course defer. In any case, I'm planning to take a closer look in the next few days -- I'm hoping that it is actually an easy fix and I just wasn't seeing it in my initial attempt today... |
Oh wait -- I think the one unresolved thing here is whether #250 should go in before this? I'm fine if that's the case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the pickle issue isn't worth holding up a merge, but if anyone disagrees I'll of course defer. In any case, I'm planning to take a closer look in the next few days
I agree; we can disclose the issue in the next release notes if necessary.
I think the one unresolved thing here is whether #250 should go in before this?
Not necessarily, I can include the change in my PR too if this one goes in first.
I would merge immediately but I'm going to hold off for the day in case there's anything left to discuss that I missed, or in case you want to make another attempt to fix the pickling issue.
If nothing comes up in the next 24h I'll just merge :)
figured out the pickling! turns out I just needed to read the pickle docs again rather than try to do it from memory... will push up by the end of today. |
This PR introduces the
unyt_dask_array
class, which implements a subclass of standard dask arrays with units attached. Still a work in progress, but it is generally useable now!Basic usage (also shown here in a notebook) begins by using the
unyt_from_dask
function to create a newunyt_dask_array
instance from a dask array:The array can be manipulated as any other dask array:
If the return is an array, we get a
unyt_array
instead:Unit conversions:
The implementation relies primarily on decorators and a hidden
unyt_dask_array._unyt_array
to track unit conversions and has very minimal modifications to the existing unyt codebase. If a user is running a dask client, then all the above calculations will be executed by that client (see notebook), but the implementation here only needs the dask array subset (i.e.,pip install dask[array]
).Some remaining known issues:
_on_demand_imports
but haven't added it to the testing environment yet, so new tests will failNote on the issue with dask reductions:
If you do:
You get a
unyt_quantity
as expected:unyt_quantity(0.50002407, 'm')
But if you use the daskified equivalent of
np.min(ndarray)
:You get a plain float:
0.50002407
. This isn't much of an issue for simple functions likemin
, but many more complex functions are not implemented as attributes. Not yet sure what the best approach is here...Update (8/24) to the dask reductions: I've played around with many approaches focused around manually wrapping all of the dask reductions but have decided that the added complexity is not worth it. Instead, I added a user-facing function,
unyt.dask_array.reduce_with_units
that accepts a dask function handle, the unyt array and anyargs
andkwargs
for the dask function that internally wraps the dask function handle to track units.standalone package?
One final note: while I've been developing this to be incorporated into unyt, the fact that there are very minimal changes to the rest of the codebase means that this could be a standalone package. Happy to go that route if it seems more appropriate!