Shared read-only state between objects with copy on write #93

Open
jimfulton opened this issue Sep 26, 2018 · 5 comments

@jimfulton
Member

There's a lot of interest in using ZODB with asynchronous frameworks, especially for applications that block on network requests to services. From a purely programming perspective, gevent makes this quite tractable, but the cost of maintaining many open ZODB connections, each with its own cache, is a major challenge. That cost could be mitigated if data could be shared among the caches.

One way to do this would be to have a shared cache of read-only state objects. Consider the extremely common case of persistent objects that store their data in dictionaries (leaving aside non-persistent subobjects, for the sake of discussion). Set-state (__setstate__) for such objects could simply use the state dict as the instance dictionary. The first attribute assignment on such an object would copy the state dict before modifying it. This would allow use of shared immutable state dicts, requiring no copying for read-only operations. Note that in this scenario, only state is shared, not persistent objects.

You could use slots, or secondary dictionaries for non-shared mutable state.
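
A minimal sketch of the copy-on-write scheme described above, in plain Python without the persistent package. The class name CowObject and the _shared slot are hypothetical, chosen only for illustration. A real implementation would also have to cooperate with Persistent's change tracking, which this sketch ignores.

```python
class CowObject:
    # '_shared' lives in a slot so the flag itself never touches the
    # (possibly shared) instance dictionary.
    __slots__ = ("_shared", "__dict__")

    def __setstate__(self, state):
        # Adopt the state dict directly: no copy for read-only use.
        object.__setattr__(self, "__dict__", state)
        object.__setattr__(self, "_shared", True)

    def _unshare(self):
        # Swap the shared dict for a private copy before the first write.
        if getattr(self, "_shared", False):
            object.__setattr__(self, "__dict__", dict(self.__dict__))
            object.__setattr__(self, "_shared", False)

    def __setattr__(self, name, value):
        self._unshare()
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        self._unshare()
        object.__delattr__(self, name)


# Two objects (think: the same oid loaded in two connections) share one dict.
shared_state = {"title": "report", "size": 3}
a = CowObject.__new__(CowObject)
b = CowObject.__new__(CowObject)
a.__setstate__(shared_state)
b.__setstate__(shared_state)

a.title = "draft"  # first write: 'a' now owns a private copy
```

After the write, a has its own dictionary while b still points at shared_state. Nothing in this sketch stops code from mutating the shared dict through obj.__dict__ directly, so the "immutable" part is by convention only.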

Similar schemes could be used for BTrees and Buckets, although we'd need to introduce new Python subobjects to represent shared state.
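
To illustrate what a shared-state subobject for a bucket might look like, here is a toy sorted mapping whose keys and values start out as shared tuples and are copied into private lists on the first write. CowBucket is a hypothetical name; this is not the layout used by the BTrees package, only a sketch of the idea.

```python
import bisect


class CowBucket:
    """Toy sorted mapping backed by shared, immutable (keys, values) tuples."""

    def __init__(self, shared_keys, shared_values):
        # Tuples can be shared safely because they cannot be mutated.
        self._keys = shared_keys
        self._values = shared_values
        self._shared = True

    def get(self, key, default=None):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def __setitem__(self, key, value):
        if self._shared:
            # First write: copy the shared tuples into private lists.
            self._keys = list(self._keys)
            self._values = list(self._values)
            self._shared = False
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)


keys, values = (1, 3, 5), ("a", "c", "e")
reader = CowBucket(keys, values)
writer = CowBucket(keys, values)
writer[4] = "d"  # triggers the copy in 'writer' only
```

The reader keeps referencing the original tuples, so read-only buckets cost no extra memory per connection in this model.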

To make this work, we'd likely want to create persistent subobjects that disallowed storing non-persistent mutable subobjects, which would have other benefits.

@jimfulton jimfulton added the idea label Sep 26, 2018
@jamadden
Member

This is somewhat similar to RelStorage's in-memory pickle state cache, which is shared by all Connections of a Storage, but operating on the unpickled data (and then of course copying it). I like the idea!

A challenge there is making such a shared cache effective with the different MVCC states that each Connection may be seeing. RelStorage has a complicated system of "checkpoints" it uses to accomplish this that works OK for short-lived transactions and Connections that don't drift too far apart from each other in terms of their MVCC state.

@jimfulton
Member Author

This cache would be keyed by oid + serial, so it would be orthogonal to MVCC. It would store Python objects, so there would be no additional deserialization overhead. Because the sharing would be at the object level, there would be memory savings, not just savings in loading objects.
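
A rough sketch of such a cache, assuming state is a plain dict. SharedStateCache, get_or_load, and fake_loader are hypothetical names, not ZODB APIs, and the sketch has no eviction policy. Because the key includes the serial, connections at different MVCC snapshots simply ask for different keys and never see each other's revisions.

```python
import threading
from types import MappingProxyType


class SharedStateCache:
    """Process-wide cache of read-only state, keyed by (oid, serial)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def get_or_load(self, oid, serial, loader):
        key = (oid, serial)
        with self._lock:
            state = self._states.get(key)
            if state is None:
                # Wrap in a read-only view so callers cannot mutate it.
                state = MappingProxyType(dict(loader(oid, serial)))
                self._states[key] = state
            return state


loads = []


def fake_loader(oid, serial):
    # Stand-in for unpickling a record from storage.
    loads.append((oid, serial))
    return {"revision": serial}


cache = SharedStateCache()
old_1 = cache.get_or_load(b"\x01", 10, fake_loader)
old_2 = cache.get_or_load(b"\x01", 10, fake_loader)  # same snapshot: shared
new_1 = cache.get_or_load(b"\x01", 11, fake_loader)  # newer snapshot
```

Here the loader runs once per (oid, serial) pair, and the two requests for serial 10 get the very same object back.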

@jimfulton
Member Author

If we could store non-dicts as __dict__, then we could use immutable dicts as shared state and trigger copy on failed setitem (or on noticing non-dicts), requiring no change to persistent state metadata.
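
The "copy on failed setitem" trigger can be sketched at the mapping level. CPython currently rejects assigning a non-dict (such as a mappingproxy) to __dict__, so this hypothetical LazyCopyState holder keeps the state in a slot instead. It is an illustration of the trigger, not a proposal for the actual mechanism.

```python
from types import MappingProxyType


class LazyCopyState:
    """Hypothetical holder: copies its state only when a write fails."""

    __slots__ = ("_state",)

    def __init__(self, state):
        self._state = state

    def __getitem__(self, name):
        return self._state[name]

    def __setitem__(self, name, value):
        try:
            self._state[name] = value
        except TypeError:
            # The read-only mapping refused the write: take a private
            # copy and retry. No separate "shared" flag is needed.
            self._state = dict(self._state)
            self._state[name] = value


shared = MappingProxyType({"title": "report"})
x = LazyCopyState(shared)
y = LazyCopyState(shared)
x["title"] = "draft"  # fails on the proxy, succeeds on the private copy
```

The type of the state object itself records whether it is shared, which is why no extra per-object metadata is required in this variant.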

@jamadden
Member

> This cache would be keyed by oid + serial, so it would be orthogonal to MVCC.

Ah, I see. It helps that the current laughably misnamed "pickle cache" knows which (oid, serial) values it's going to be requesting; the RelStorage case just has to deal with arbitrary requests over time.

@davisagli
Member

Nice idea!
