-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: control attrs better as described in issue #3277 #3344
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is both a nice API design and it's the first good use of weak references that I've seen. These are intended to be short-lived objects that need to refer back to the ak.Array
that made them, and creating a cyclic reference (putting the array into a list of things that won't be deleted until garbage collection) just for that doesn't make sense.
In fact, there's one more proxy like this: the ak.Array.mask
property makes a proxy like this, for applying a slice as ak.mask
instead of __getitem__
. The fact that masking creates cyclic references might be why some physics analyses are using more memory than others, in a way that would surely be surprising/unexpected.
Oh, I forgot to mention: I don't see anything here that enforces the attribute type or immutability. I like the mutability in this object, though: seeing how this can be used, I'm convinced that my_array.attrs["something"] = 123 is a good thing. About the type: this PR enforces the key type to be strings, but it doesn't enforce the value type to be anything in particular, right? Persistent attributes would eventually need to be JSON or pickleable to be saved in files, perhaps Parquet files or pickles, but the transient attributes don't need to be restricted. There might be good reason to let a transient attribute be an unpickleable lambda function. |
I can have a look at
No, the types have never been enforced, it's just type annotations. With and without this PR the types can be anything that a Mapping would accept (key: Hashable, value: Any). I don't know how the serialization of attrs works. Shouldn't the serialization step just try to serialize attrs, and if it can't drop a warning and save the arrays without attrs? (I'd expect something like this as a behavior) edit: from a brief look at the |
I mentioned immutability and controlling the type because that's what the issue (#3277) was saying, although we can certainly change our minds on that. Ah! "Controlling the type" meant ensuring that it's a dict. It doesn't say anything about other interpretations:
I'd rather not raise a warning when failing to serialize. This is a general problem with warnings in Python: user A of package B that depends on package C that depends on Awkward will see the warning and (1) not know what it's for, since it's something internal to B or C, and (2) not get a stack trace to report. It could be an exception or silently skip writing the attribute, but there are more problems with silently skipping. I just checked For persistent value types, if it's not serializable, I guess it raises an exception now, right? Yes, both before >>> a = ak.Array([1, 2, 3])
>>> a.attrs["yikes"] = lambda x: x
>>> pickle.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <function <lambda> at 0x752e88cd9a80>: attribute lookup <lambda> on __main__ failed and after >>> a = ak.Array([1, 2, 3])
>>> a.attrs["yikes"] = lambda x: x
>>> pickle.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <function <lambda> at 0x791809fd5a80>: attribute lookup <lambda> on __main__ failed this PR. I think everything above is in a good state and this PR is ready to merge. |
Yes, exactly, that's the policy right now. Maybe it does make sense to enforce at least the key type to be a string because otherwise this is a bad error: a = ak.Array([1, 2, 3])
a.attrs[1] = lambda x: x
pickle.dumps(a)
# > AttributeError: 'int' object has no attribute 'startswith' because the pickling step checks for transient keys first. I have an idea: What do you think about this? |
That makes sense to me. The current rule for identifying a transient is that it starts with an "@"; the new rule would be that it's a string and starts with an "@" (which was kind-of implicit in the current rule). We do want people to only make |
Co-authored-by: Angus Hollands <[email protected]>
This changes the behavior of
attrs
as described in #3277. This PR closes #3277.An example that works with this PR is:
Because of this change (and that we have now a
Attrs
class as proposed in #3277) I had to changeis
to==
in some of the tests.