1. Expand the API to include metadata- and chunk-specific methods

The idea would be to add an analogous set of methods for getting/setting/listing metadata documents. Rather than the store always getting or setting metadata as strings of bytes (serialized JSON), we would start passing metadata objects to the store in the form of a dictionary. For example:

    async def metadata_get(self, key: str) -> dict[str, Any]:
        ...

This is perhaps the biggest change in my proposal, but there are some clear analogs to what Zarr-Python supports today. I’ll also note that this change is aligned with how the [spec](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#abstract-store-interface) envisioned stores evolving.
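To make the shape of this idea concrete, here is a minimal sketch of a store that keeps metadata documents (as dictionaries) apart from chunk bytes. Only `metadata_get` comes from the proposal; the class name `InMemoryStore` and the methods `metadata_set`, `metadata_list`, `chunk_get`, and `chunk_set` are hypothetical names chosen for illustration.

```python
import asyncio
from typing import Any


class InMemoryStore:
    """Hypothetical store keeping metadata documents and chunk bytes apart."""

    def __init__(self) -> None:
        self._metadata: dict[str, dict[str, Any]] = {}
        self._chunks: dict[str, bytes] = {}

    async def metadata_get(self, key: str) -> dict[str, Any]:
        # Return a shallow copy so callers cannot replace stored entries in place.
        return dict(self._metadata[key])

    async def metadata_set(self, key: str, value: dict[str, Any]) -> None:
        # The store receives a dictionary, not serialized JSON bytes.
        self._metadata[key] = dict(value)

    async def metadata_list(self) -> list[str]:
        return sorted(self._metadata)

    async def chunk_get(self, key: str) -> bytes:
        return self._chunks[key]

    async def chunk_set(self, key: str, value: bytes) -> None:
        self._chunks[key] = value


async def demo() -> tuple[dict[str, Any], list[str]]:
    store = InMemoryStore()
    await store.metadata_set("foo/zarr.json", {"zarr_format": 3, "node_type": "group"})
    await store.chunk_set("foo/c/0", b"\x00\x01")
    return await store.metadata_get("foo/zarr.json"), await store.metadata_list()
```

With this split, a store backed by a database could keep metadata in a queryable form while chunk bytes go elsewhere.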
2. Support additional metadata-only operations in the store.

Expanding on the ideas in (1), we could also ask the store to do more with metadata keys. The prime example here is generating the hierarchy:

    async def metadata_tree(self, root: str) -> dict[str, dict]:
        ...

The rationale for this method is that database-type stores will be able to produce the tree without iterating through the entire store. The proposed API here would be opt-in, with a default implementation in the base store.
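A default implementation in the base store might look roughly like the sketch below, which simply walks every metadata key and keeps those under `root`. The `BaseStore` class, its constructor, and the `metadata_list` helper are assumptions made for this example; only the `metadata_tree` signature is taken from the proposal.

```python
import asyncio
from typing import Any, AsyncIterator


class BaseStore:
    """Hypothetical base store; only `metadata_tree` is named in the proposal."""

    def __init__(self, docs: dict[str, dict[str, Any]]) -> None:
        self._docs = docs

    async def metadata_list(self) -> AsyncIterator[str]:
        # Hypothetical listing method: yields every metadata key.
        for key in sorted(self._docs):
            yield key

    async def metadata_get(self, key: str) -> dict[str, Any]:
        return self._docs[key]

    async def metadata_tree(self, root: str) -> dict[str, dict]:
        # Brute-force default: iterate over all keys and filter by prefix.
        prefix = root.rstrip("/") + "/" if root else ""
        tree: dict[str, dict] = {}
        async for key in self.metadata_list():
            if key.startswith(prefix):
                tree[key] = await self.metadata_get(key)
        return tree
```

A database-backed store could override `metadata_tree` with a single prefix query instead of this full scan.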
3. Require stores to be opened in read, write, or append mode.

Enforcing the mode of operation for a store will allow us to be far more protective against bad behaviors and less universally defensive. It will also open the door to store-native caching, which I am going to save for a separate proposal.

    @classmethod
    async def open(cls, path='', mode='r', **kwargs) -> Store:
        ...
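As a rough illustration of what mode enforcement could buy, the sketch below validates the mode once in `open` and then rejects writes on a read-only store. The `_VALID_MODES` tuple, the in-memory `_data` dict, and the choice of `PermissionError` are assumptions for the example, not part of the proposal.

```python
import asyncio
from typing import Optional


class Store:
    """Hypothetical store that records and enforces its access mode."""

    _VALID_MODES = ("r", "w", "a")  # assumed spelling of read / write / append

    def __init__(self, path: str, mode: str) -> None:
        self.path = path
        self.mode = mode
        self._data: dict[str, bytes] = {}

    @classmethod
    async def open(cls, path: str = "", mode: str = "r", **kwargs) -> "Store":
        # kwargs would carry store-specific options; they are unused in this sketch.
        if mode not in cls._VALID_MODES:
            raise ValueError(f"mode must be one of {cls._VALID_MODES}, got {mode!r}")
        return cls(path, mode)

    async def set(self, key: str, value: bytes) -> None:
        # Because the mode is known up front, the check is a single comparison.
        if self.mode == "r":
            raise PermissionError("store was opened read-only")
        self._data[key] = value

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)
```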
Beta Was this translation helpful? Give feedback.
-
4. Return async generators from all
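As an illustration of the async-generator pattern, here is a minimal sketch of a store method that yields keys lazily instead of returning a fully built list. `ListingStore` and `list_prefix` are hypothetical names used only for this example.

```python
import asyncio
from typing import AsyncIterator, Iterable


class ListingStore:
    """Hypothetical store whose listing method is an async generator."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = sorted(keys)

    async def list_prefix(self, prefix: str) -> AsyncIterator[str]:
        # Keys are yielded one at a time, so callers can stop early
        # without the store materializing the whole listing.
        for key in self._keys:
            if key.startswith(prefix):
                yield key


async def collect(prefix: str) -> list[str]:
    store = ListingStore(["a/zarr.json", "a/c/0", "b/zarr.json"])
    return [key async for key in store.list_prefix(prefix)]
```

Callers consume the result with `async for`, which suits stores that page through remote listings.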
5. Support partial reads/writes on single and multiple keys at once

The initial design did not allow for a partial read of a single object. Adding an optional byte_range argument to get would allow it:

    async def get(self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None) -> Optional[BytesLike]:
        ...

In fact, this could be the only required interface, and the bulk fetch method could be implemented by default as something like:

    async def get_partial_values(self, key_ranges: List[Tuple[str, Tuple[int, int]]]) -> List[bytes]:
        # One coroutine per (key, byte_range) request.
        awaitables = [self.get(key, byte_range=byte_range) for key, byte_range in key_ranges]
        # gather() takes awaitables as positional arguments and must itself be awaited.
        return await asyncio.gather(*awaitables)

Stores that are able to provide optimizations beyond this brute-force approach (e.g. coalescing) could override this method.
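To show what a coalescing override might look like, here is a minimal sketch that merges overlapping or touching ranges per key, fetches each merged span once, and then slices the original requests back out. It assumes `byte_range` means `(start, length)`, which the proposal does not spell out; `CoalescingStore` and the `reads` counter are hypothetical and exist only to make the effect visible.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class CoalescingStore:
    """Hypothetical in-memory store that merges overlapping range requests."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data
        self.reads = 0  # counts underlying fetches, for illustration only

    async def get(
        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
    ) -> Optional[bytes]:
        self.reads += 1
        value = self._data.get(key)
        if value is None or byte_range is None:
            return value
        start, length = byte_range  # assumed to mean (start, length)
        stop = None if length is None else start + length
        return value[start:stop]

    async def get_partial_values(
        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
    ) -> List[Optional[bytes]]:
        # Group the requested (start, stop) spans by key.
        spans = defaultdict(list)
        for key, (start, length) in key_ranges:
            spans[key].append((start, start + length))

        # Merge overlapping or touching spans so each region is read once.
        merged = {}
        for key, items in spans.items():
            out: List[Tuple[int, int]] = []
            for start, stop in sorted(items):
                if out and start <= out[-1][1]:
                    out[-1] = (out[-1][0], max(out[-1][1], stop))
                else:
                    out.append((start, stop))
            merged[key] = out

        # Fetch every merged span concurrently.
        jobs = [(key, span) for key, group in merged.items() for span in group]
        blobs = await asyncio.gather(
            *(self.get(key, (start, stop - start)) for key, (start, stop) in jobs)
        )
        fetched = dict(zip(jobs, blobs))

        # Slice each original request out of the merged span that contains it.
        results: List[Optional[bytes]] = []
        for key, (start, length) in key_ranges:
            for m_start, m_stop in merged[key]:
                if m_start <= start and start + length <= m_stop:
                    blob = fetched[(key, (m_start, m_stop))]
                    offset = start - m_start
                    results.append(None if blob is None else blob[offset:offset + length])
                    break
        return results
```

Whether merging pays off depends on the backend: it trades fewer requests for possibly larger reads.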
In #1583, we laid out a rough design for the Store API for Zarr-Python 3.0. This discussion proposes some changes to that design and includes a new Store ABC. After some iteration here, I plan to update the design doc (link below).
Background
In the next major version of this library, we’re leaving behind the generic mutable-mapping store interface in favor of a more opinionated store API. In the 3.0 design doc, we outlined a basic API based on the v3-spec. Key changes relative to the prior mutable mapping interface include:
getsize and rename
Today we have implemented this API for the MemoryStore and LocalStore, and @kylebarron has even tried it out with the object-store-python project (#1661).
Proposal
Now, after a few weeks of prototyping with this API, I am proposing some changes. I’m going to open each of these as separate threads below to help focus discussion.