Cannot set gc=False on Generic structs? #631

Closed
HansBrende opened this issue Jan 18, 2024 · 5 comments · Fixed by #635

Comments

@HansBrende

HansBrende commented Jan 18, 2024

Question

If I do

class MsgspecEntity(msgspec.Struct, Generic[P], gc=False):

I get the following error:

ValueError: Cannot set gc=False when inheriting from non-struct types

I suppose I can work around this by dynamically redefining the struct for each possible P type, to avoid using Generic, but is this expected? It would be easier if Generic were excluded from the above restriction.

@jcrist
Owner

jcrist commented Jan 21, 2024

Thanks for opening this! We were originally being a bit stricter than necessary here. The real limitation is that types with gc=False must be __slots__ classes, so any mixin type (like Generic) must also define __slots__ = (). With #635 you should be able to set gc=False on generic structs as well.

In [1]: from typing import Generic, TypeVar

In [2]: from msgspec import Struct

In [3]: P = TypeVar("P")

In [4]: class Demo(Struct, Generic[P], gc=False):
   ...:     x: P
   ...:     y: P
   ...: 

In [5]: d = Demo(1, 1)

In [6]: import gc

In [7]: gc.is_tracked(d)
Out[7]: False

Standard note - messing with the gc kwarg is considered "advanced usage", I trust you've read all the warnings in the docs before using it :).
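
To illustrate the __slots__ point in plain Python: the sketch below is not msgspec's internal check, only the underlying CPython behavior it depends on, and every class name in it is made up for the example. A mixin that declares __slots__ = () adds no per-instance __dict__, while a mixin without __slots__ quietly brings one back.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class SlottedMixin:
    # Hypothetical mixin that opts out of a per-instance __dict__.
    __slots__ = ()


class PlainMixin:
    # Hypothetical mixin without __slots__; its instances carry a __dict__.
    pass


class WithSlottedMixin(SlottedMixin):
    __slots__ = ("x",)


class WithPlainMixin(PlainMixin):
    __slots__ = ("x",)


class GenericBox(Generic[T]):
    # typing.Generic adds no instance __dict__, so it behaves like SlottedMixin.
    __slots__ = ("x",)


slotted_has_dict = hasattr(WithSlottedMixin(), "__dict__")
plain_has_dict = hasattr(WithPlainMixin(), "__dict__")
generic_has_dict = hasattr(GenericBox(), "__dict__")

print(slotted_has_dict, plain_has_dict, generic_has_dict)
```

Only the class that inherits from the plain mixin ends up with a __dict__, which is the kind of base that conflicts with the __slots__ requirement described above.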

@HansBrende
Author

HansBrende commented Jan 21, 2024

@jcrist thanks for the fix!

I have read the documentation on that, however, I'm confused on one point:

Why would any struct that participates in deserialization not be a good candidate for gc=False? As we know, when deserializing JSON to a normal dict, it is impossible that that JSON is self-referencing. I.e., you can't have a thing inside itself simply because that is impossible to represent as JSON! So any reference cycles for any of these objects participating in deserialization would by nature have to be created manually in the post_init or subsequent stages. So as long as I am not "adding a thing to itself" post-init, and these structs originate from JSON, I should be totally safe for gc=False.

Or am I missing something?

@jcrist
Owner

jcrist commented Jan 21, 2024

No, that's accurate. Custom types supported by dec_hook could result in cyclic behavior, but in general it's unlikely for the result of a decode call to have any cycles. But code constructing these objects outside of decode could still result in a cycle. The warnings are mostly to let users know "here be dragons" and to deter them from mucking with the gc unless a benchmark shows it matters. That you can properly reason about cyclic object structures and python's GC implementation means you are probably capable of judging whether disabling it on these has consequences for your code :).
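
A small stdlib-only sketch of the reasoning in this exchange, using json.loads and plain dicts/lists as a stand-in for decoded structs. The has_cycle helper is hypothetical and only written for this illustration. Freshly decoded JSON is a tree, and a cycle only appears once code links an object back to one of its ancestors.

```python
import json


def has_cycle(obj, _path=None):
    """Return True if a dict/list can reach itself through its own values."""
    if not isinstance(obj, (dict, list)):
        return False  # scalars hold no references to containers
    path = set() if _path is None else _path
    if id(obj) in path:
        return True  # we came back to a container we are still inside of
    path.add(id(obj))
    children = obj.values() if isinstance(obj, dict) else obj
    found = any(has_cycle(child, path) for child in children)
    path.discard(id(obj))
    return found


decoded = json.loads('{"id": 1, "tags": ["a", "b"], "child": {"id": 2}}')
decoded_is_cyclic = has_cycle(decoded)

# A cycle has to be created by hand after decoding.
decoded["child"]["parent"] = decoded
mutated_is_cyclic = has_cycle(decoded)

print(decoded_is_cyclic, mutated_is_cyclic)
```

With a type that is not tracked by the GC, a cycle like the second case can only be reclaimed if it is broken manually, which is the scenario the documentation warnings are about.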

@HansBrende
Author

@jcrist awesome! One thing I did notice during my benchmarks is that gc=False is somewhat undermined by the presence of UUID fields... for some reason python thinks it should track UUIDs even though they are immutable and only contain a couple underlying primitives. I tried to find something on how to "untrack" UUIDs... but was unsuccessful... so I ended up just disabling garbage collection altogether until all my objects are destroyed anyways by refcounting.
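
For reference, a minimal sketch of the "disable the GC around the work" approach using only the stdlib. The gc_paused helper name is made up, and json.loads stands in for whatever decode call is actually used.

```python
import gc
import json
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic GC inside the block, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


payload = '[{"id": 1}, {"id": 2}, {"id": 3}]'

with gc_paused():
    enabled_inside = gc.isenabled()
    records = json.loads(payload)  # stand-in for the real decode call

print(enabled_inside, len(records))
```

Reference counting still frees objects while the cyclic collector is paused, so only objects caught in reference cycles would wait until it is re-enabled.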

@jcrist
Owner

jcrist commented Jan 21, 2024

> python thinks it should track UUIDs even though they are immutable and only contain a couple underlying primitives

In CPython any type implemented in pure Python is a GC type. Since uuid.UUID objects aren't extension types (i.e. they're implemented in Python), they're automatically GC types. If uuid.UUID were implemented as an extension type then you're correct, it wouldn't need to be a GC type.
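
This is easy to check with gc.is_tracked from the stdlib. A quick sketch comparing a UUID with its string and integer forms:

```python
import gc
import uuid

u = uuid.uuid4()

uuid_tracked = gc.is_tracked(u)      # instance of a class written in Python
str_tracked = gc.is_tracked(str(u))  # str is a built-in atomic type
int_tracked = gc.is_tracked(u.int)   # int is a built-in atomic type

print(uuid_tracked, str_tracked, int_tracked)
```

The UUID instance is tracked while its str and int representations are not.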

One option, if you'd rather disable GC on the type instead of globally: if you never manipulate the UUIDs as UUIDs, you might try annotating those fields as str instead (possibly with a pattern regex for matching UUIDs if you're concerned about invalid ones getting in). Strings are immutable non-GC types. That said, for large payloads the overhead of turning the GC off and on per decode call should be minimal.
