A way to serialize/deserialize ndindex #86

telamonian · 2020-09-30T04:29:31Z

Hi all. This seems like a really neat project, and one I'll probably get a lot of use out of. Question: would a way to deserialize/serialize an ndindex be something that's within the scope of this project?

Here's my use case -> jupyterlab/jupyterlab-hdf5#4. Short version:

A user opens an hdf5 dataset in their browser, then specify which 1D or 2D slice of a hyperslab to view using numpy-style slice notation
The jupyterlab-hdf5 frontend ships said slice to its backend in the form of a string
On the backend, I rehydrate/deserialize/whatever the slice string and then hand it to h5py, which also uses numpy-style slices to specify hyperslabs

I'd be more than willing to pitch in and help with this in the form of code/PRs, but I'll probably need some help figuring out how to handle deserialization.

The text was updated successfully, but these errors were encountered:

asmeurer · 2020-09-30T08:15:35Z

What form of deserialization do you have in mind. Currently, the repr form of any ndindex expression always recreates it. I haven't tested it but I imagine pickle should work just fine as well.

telamonian · 2020-09-30T11:58:49Z

My starting point is a string (collected as user input), so pickle isn't going to help (also no way to pickle/unpickle on the browser side, which after all is running js and not python). repr might work. I'll check into that.

To give you a concrete example, if a user opens a 4D dataset, they would then enter a 2D slab specification like:

"13, 10:1000, 0, :"

which will then get passed to the python backend server via a GET request. I've come up with some hacky ways to parse such a string into a numpy-compatible tuple of slices. My best so far:

def parseSlice(sliceStr):
    return tuple((slice(*(int(i) if i else None for i in part.strip().split(':'))) if ':' in part else int(part.strip())) for part in sliceStr.split(','))

which you can then use like

>>> parseSlice("13, 10:1000, 0, :")

(13, slice(10, 1000, None), 0, slice(None, None, None))

but I can't imagine this is a robust approach. I think I would need an actual formal parser of some flavor.

asmeurer · 2020-09-30T19:05:01Z

I see. Parsing strings is not something I had considered, but it could be added. Probably the best way would be to do something akin to ast.literal_eval but extended to support what we need for indices. You'd also have to wrap the slice in some indexing object, but I've also thought that we should have something akin to numpy.s_ that converts __getitem__ into an ndindex object.

telamonian · 2020-09-30T20:08:24Z

something akin to ast.literal_eval

yes, this exactly! Now if I could just use eval on raw user input, all of my problems would be solved :)

So I think a starting point for the parser could be

# minimize parsed ops
from numpy import s_

ast.parse(f's_[{sliceStr}]')

where numpy.s_ could be replaced by ndindex.s_ (or whatever you end up calling it).

Of course then we have to figure out what to do with the resulting abstract syntax tree...

asmeurer · 2020-09-30T20:26:24Z

Yes, exactly. If you look at how ast.literal_eval works, it's pretty simple. https://github.com/python/cpython/blob/17b5be0c0a3f74141014e06a660f1b5ddb002fec/Lib/ast.py#L54. It just recursively walks the ast and checks for known OK nodes. I think we can just copy it, and add Slice nodes. Maybe we could also do some careful checking for Name nodes if we want to allow slice(1, 2) or array([1, 2]). Then we use ast.parse(f'dummy[{input_string}]').body[0].value.slice.

Support masks is harder, because usually people construct masks out of the array itself as an expression, like a[a > 0]. Do you need support for that?

telamonian · 2020-09-30T21:04:55Z

Support masks is harder, because usually people construct masks out of the array itself as an expression, like a[a > 0]. Do you need support for that?

For my purposes? No. In fact, for my application it would be best if anything other than a comma separated sequence of literal index or literal slice caused an exception when you try to parse it. But that's just my own narrow-ish use case.

In terms of the general purpose ndindex pkg, does it make sense to support parsing mask indexes? You're prob in a better position than I am to answer said general question.

telamonian · 2020-09-30T21:12:27Z

Here's a simplified parser implementation based on ast.parse that I hacked together:

import ast
from numpy import s_

def astParseSlice(sliceStr):
    for node in ast.walk(ast.parse(f's_[{sliceStr}]')):
        if isinstance(node, ast.ExtSlice):
            return (x for x in (iors.value.n if isinstance(iors, ast.Index) else slice(iors.lower.n if iors.lower is not None else None, iors.upper.n if iors.upper is not None else None, iors.step.n if iors.step is not None else None) if isinstance(iors, ast.Slice) else None for iors in ast.iter_child_nodes(node)) if x is not None)

Basic idea:

parses sliceStr
walks the nodes of the resulting tree (in no particular order)
stops at the first (only?) ExtSlice node and unpacks it into an iterator. Anything other than an Index or Slice node is ignored and then dropped from the final result

But for a real implementation, I agree that copying and modifying ast.literal_eval makes the most sense. The code for literal_eval is pleasantly concise.

edit

Looking into it further, my implementation above fails for a myriad of inputs, everything from "13, 14" to "8:9", and for interesting reasons. So yeah, we should definitely start from the existing literal_eval (not that that was really in question)

asmeurer · 2020-09-30T22:10:55Z

ndindex is presently designed around the idea of "raw" indices, that don't have any knowledge of the array being indexed. So a boolean mask like a > 0 would be represented in ndindex as just the boolean array, without any knowledge of a. So I would omit support for it for now.

asmeurer · 2020-09-30T22:49:58Z

I think we can copy literal_eval and

Make it use ast.parse(f'dummy[{input_string}]').body[0].value.slice.
Remove ifs for nodes that we know we won't use, like the set and dict.
Add elif isinstance(node, Slice) and elif isinstance(node, ExtSlice). I don't think we need to worry about the depth because we won't allow Subscript which would appear for any nested indexing.
Modify it so it returns ndindex types instead of built-in types.

The function doesn't need to worry about input validation because the ndindex classes already do that.

telamonian · 2020-09-30T23:41:44Z

Nice, that makes it quite clear how to accomplish this. I'll take a stab at a PR

scopatz · 2020-10-05T16:48:29Z

Thanks @telamonian!

telamonian · 2020-10-08T12:58:28Z

Sorry, I got a little distracted by some debugging stuff I promised to add to the "how to develop a jupyter extension" jupytercon tutorial. I'll probably make the PR for this over the weekend

Edit: I've started work

telamonian · 2020-10-12T00:22:07Z

Okay, so I ran into problems with the implementation, and after doing some digging it turns out that the ast.literal_eval function changed a bit in 3.8, and the way that slices are represented in the AST changed significantly in 3.9:

From https://docs.python.org/3/library/ast.html#node-classes:

Changed in version 3.8: Class ast.Constant is now used for all constants.

Changed in version 3.9: Simple indices are represented by their value, extended slices are represented as tuples.

Deprecated since version 3.8: Old classes ast.Num, ast.Str, ast.Bytes, ast.NameConstant and ast.Ellipsis are still available, but they will be removed in future Python releases. In the meantime, instantiating them will return an instance of a different class.

Deprecated since version 3.9: Old classes ast.Index and ast.ExtSlice are still available, but they will be removed in future Python releases. In the meantime, instantiating them will return an instance of a different class.

So, eg the section for handling ast.Num nodes was removed from ast.literal_eval in 3.8. However, the Python installed in my dev env is 3.7. This meant that the the version of ast.literal_eval that I copied-pasted from cpython master was choking on the AST generated by my dev instance of Python (3.7 AST uses ast.Num extensively)

I can definitely come up with something that will work for 3.7, but making a string->index parser that will work with all of 3.6-3.9 will take some extra work. I'll look into it further

telamonian · 2020-10-12T02:48:12Z

turns out to have been much simpler than I initially thought to implement a literal_eval_index function that works in both cpy37 and cpy38. There's a PR up now with a working prototype.

One catch: cpy39 will probably require some more work and explicit handling, since it makes changes to the grammar of ast.Slice. First I'll need to set up pyenv (or something) so I can install cpy39 without snapping my dev env in half.

@asmeurer Currently the PR contains a literal_eval_index.py file with a bare literal_eval_index function that will output either a single bare index or a tuple (dependent on input). I'll need your input on how to actually integrate my literal_eval_index with the existing ndindex package

telamonian · 2020-10-12T11:36:39Z

good news: it turns out the code that I already wrote essentially gets support for the new grammar in cpy39 "for free". ast in cpy39 replaced all uses of ast.Index and ast.ExtSlice with one of: ast.Constant, ast.Slice, or an ast.Tuple containing one or both. So the support I added for ast.Slice (along with the existing support for ast.Constant and ast.Tuple from the original ast.literal_eval) already covers the cpy39 use case.

@asmeurer I could still use some pointers about how to integrate literal_eval_index with the rest of the project

scopatz added the Nice-to-have label Oct 7, 2020

telamonian linked a pull request Oct 12, 2020 that will close this issue

Add parse_index function for parsing NDIndex from string #91

Open

ericdatakelly assigned asmeurer Apr 5, 2021

ericdatakelly added this to the April 2021 milestone Apr 12, 2021

ericdatakelly modified the milestones: April 2021, May 2021 May 10, 2021

ericdatakelly modified the milestones: May 2021, June 2021 Jun 7, 2021

ericdatakelly removed this from the June 2021 milestone Oct 6, 2021

asmeurer removed the Skip report label Nov 16, 2023

asmeurer mentioned this issue Oct 17, 2024

Feature request: str <--> ndindex conversion #196

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A way to serialize/deserialize ndindex #86

A way to serialize/deserialize ndindex #86

telamonian commented Sep 30, 2020 •

edited

Loading

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020 •

edited

Loading

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020 •

edited

Loading

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020

telamonian commented Sep 30, 2020 •

edited

Loading

asmeurer commented Sep 30, 2020

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020

scopatz commented Oct 5, 2020

telamonian commented Oct 8, 2020 •

edited

Loading

telamonian commented Oct 12, 2020

telamonian commented Oct 12, 2020

telamonian commented Oct 12, 2020

A way to serialize/deserialize ndindex #86

A way to serialize/deserialize ndindex #86

Comments

telamonian commented Sep 30, 2020 • edited Loading

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020 • edited Loading

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020 • edited Loading

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020

telamonian commented Sep 30, 2020 • edited Loading

edit

asmeurer commented Sep 30, 2020

asmeurer commented Sep 30, 2020

telamonian commented Sep 30, 2020

scopatz commented Oct 5, 2020

telamonian commented Oct 8, 2020 • edited Loading

telamonian commented Oct 12, 2020

telamonian commented Oct 12, 2020

telamonian commented Oct 12, 2020

telamonian commented Sep 30, 2020 •

edited

Loading

telamonian commented Sep 30, 2020 •

edited

Loading

telamonian commented Sep 30, 2020 •

edited

Loading

telamonian commented Sep 30, 2020 •

edited

Loading

telamonian commented Oct 8, 2020 •

edited

Loading