-
Notifications
You must be signed in to change notification settings - Fork 93
feat: add ak.enforce_type
#2365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
Additional details and impacted files
|
It's looking good so far! |
fcea2d2
to
d3adb09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read over the tests, the main public function API, and the coding style within ak.enforce_type
, and everything looks very nice! Let me know if there are particular points you want me to evaluate in detail.
@jpivarski fab! Thank you, it's a lot of code. I'm heading off until tomorrow; I plan to revisit the rules for records and tuples. Given that record fields can't be renamed with this function (or fields added/removed), record → record shouldn't need to compute permutations. Similarly, if we require that tuples can't be reordered), we don't need permutations there. I think removing these d.o.f should simplify things. |
So, the original implementation of this function permuted record contents, and required that the permutation had matching fields if both layout and type were tuples. I don't think re-ordering records makes sense for this function; that's a fundamentally different operation at which we can only guess by convertability. So, now this function respects fields/positions. |
@jpivarski the motivation for the "thunk" pattern was to separate convertibility tests from the actual conversion logic. However, the only practical need for this was for the branches that occured with unions and records. In the above commits & comments, you can see that I decided to make records and unions more strict; records cannot convert between tuple-record, and unions must now be strictly equal to the type in all "existing" contents, unless the total number of contents does not change (in which case, up-to one content can differ in order to be converted). Regarding these changes for unions, it makes reasoning about them far simpler; we only have to consider type equality, rather than type enforceability. As such, the need for a two-phase conversion is gone, and this logic is much easier to write as a direct recursive function. I found the need for a Now, an existing example of where we keep the separate "can I do X" and "do X" logic is the the content merging protocol, which is defined over
whereas in our case, if something isn't enforceable to So, I'm happy that this code no longer separates those two code pathways. N.B., to be explicit I'm not talking about "thunks" vs two separate functions, rather I'm referring to the need to have "can I do X" at all. |
@jpivarski this is converging on a final implementation. I'd benefit from the following:
|
8d94ae8
to
8b3bc04
Compare
Just for wording, a layout has a type, but it can't be equal to a type. It's not a big deal for the internal function, but would be for an external API. If an array has both I think we should not expose this as a public API unless it's needed for some application. Before we get to this one, dask-awkward and Coffea are reaching into L3 functions, and those are functions with demonstrated need to be L2.
A few:
Are they necessarily projected out? If the node above the IndexedArray and the node below the IndexedArray do not need to be changed (determined via types only, not values), how about if the IndexedArray is left as-is? Such a determination can only be made on the bottom-up return-pass of the recursion (i.e. after recursing down, when you're on the way back up again, you'll be able to tell that the IndexedArray's The reason that's valuable is because it minimally touches RecordArrays with many fields; maybe most of them are unaffected. The RecordArray may be within an IndexedArray (a common case).
Great!
Great!
Maybe fields of a tuple can be dropped if they're at the end (tuples get shorter). That might be expected, though I can't think of any use-cases in which it would matter. For record→record, it can be converted into a subset of fields, right? Also for record→record, can it be converted into a superset of fields if the new fields are option-type? That would be a useful feature.
Great!
What about
We already have
By "removing EmptyArray" (outside the context of a RecordArray field), do you mean "convert EmptyArray into any type"? That's what I would expect: just create a zero-length array from the Form and replace the EmptyArray with that. |
Records and tuples can drop fields, let's add this. I didn't notice that we can introduce none's for new contents - that would be a good feature too! I hadn't originally thought it would be sensible / possible. Currently we touch all indexed nodes if we walk into the layout, and we walk into all nodes to set parameters. We could perhaps early exit if a type matches, but that would be double walking the tree at each level. |
Double-walking is better than touching, since double-walking is metadata-only and touching means that a whole column will be loaded. Setting parameters shouldn't touch any buffers; that's a different part of the The usefulness of adding new record fields that are all
So I think this would come up quite a bit. |
OK @jpivarski this is now nearly done. I've summarised the rules in the docstring. Since your last review, I've:
If you get 5 minutes, I'd appreciate if you could play around with some conversions to ensure it works :P On my remaining todo list:
Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I heartily approve it for merging!
You said "nearly done," so I'll let you decide when to merge it.
I've now finished with this for now. There are no typetracer tests, but that is because the layout equality checks need to be extended to handle this. I'll do so in a follow up PR. |
Closes #1805, closes #2338 (?) by adding a function
ak.enforce_type
that converts an existing layout to a new high-level type. The following rules apply:EmptyArray
) can: