Save arbitrary Python objects to netCDF #1415

Open
lewisacidic opened this issue May 20, 2017 · 5 comments

Comments

@lewisacidic
Contributor

I am looking to transition from pandas to xarray, and the only feature that I am really missing is the ability to seamlessly save arrays of Python objects to HDF5 (or netCDF). This might be an issue for the backend netCDF4 libraries instead, but I thought I would post it here first to see what the opinions were about this functionality.

For context, pandas allows this by using PyTables' ObjectAtom to serialize the object using pickle, then saving it as a variable-length bytes data type. It is already possible to do this using netCDF4 by applying np.fromstring(pickle.dumps(obj), dtype=np.uint8) to each object in the array and saving the results using a uint8 VLType. Retrieval is then simply pickle.loads(arr.tostring()) for each element.
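The per-element round trip described above can be sketched with NumPy alone. This is only an illustration under a few assumptions: the helper names `encode_objects` and `decode_objects` are hypothetical, the code uses `np.frombuffer` and `tobytes()` (the current spellings of `np.fromstring` and `tostring()`), and the step of writing the resulting arrays to a netCDF4 uint8 VLType variable is left out.

```python
import pickle

import numpy as np


def encode_objects(objs):
    """Pickle each object into its own variable-length uint8 array.

    Hypothetical helper: returns a 1-D object array whose elements are
    uint8 arrays, which is the shape of data a uint8 VLType would hold.
    """
    encoded = np.empty(len(objs), dtype=object)
    for i, obj in enumerate(objs):
        encoded[i] = np.frombuffer(pickle.dumps(obj), dtype=np.uint8)
    return encoded


def decode_objects(encoded):
    """Invert encode_objects. Only unpickle data from a trusted source."""
    return [pickle.loads(arr.tobytes()) for arr in encoded]


original = [{"a": 1}, ("x", 2.5), None]
restored = decode_objects(encode_objects(original))
```

Each element ends up with a different length, which is why a variable-length type is needed on the file side.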

I know that pickle can be a security problem, that it can cause a problem if you try to save a numerical array that accidentally has dtype=object (pandas gives a warning), and that this is probably quite slow (I think pandas pickles a list containing all the objects for speed), but it would be incredibly convenient.
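One way to guard against the accidental dtype=object case is to check whether an object array holds nothing but numbers before pickling it. The sketch below is an assumption about how such a check could look, not how pandas implements its warning; `is_numeric_object_array` and `warn_if_numeric` are hypothetical names.

```python
import numbers
import warnings

import numpy as np


def is_numeric_object_array(arr):
    """Return True if arr has dtype=object but contains only numbers."""
    return (
        arr.dtype == object
        and arr.size > 0
        and all(isinstance(x, numbers.Number) for x in arr.flat)
    )


def warn_if_numeric(arr):
    """Warn when pickling would hide what is really numeric data."""
    if is_numeric_object_array(arr):
        warnings.warn(
            "array has dtype=object but only numeric values; "
            "consider casting it to a numeric dtype before saving",
            UserWarning,
            stacklevel=2,
        )


accidental = np.array([1, 2.5, 3], dtype=object)
genuine = np.array([1, "a", None], dtype=object)
```

Here `accidental` would trigger the warning, while `genuine` would be pickled without comment.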

@shoyer
Member

shoyer commented May 20, 2017

I would be OK with this if it required explicitly setting a keyword argument, e.g., ds.to_netcdf(..., allow_pickle=True) and xarray.open_dataset(..., allow_pickle=True). This could be hooked into xarray's existing coding/decoding layer in a relatively straightforward fashion: see ensure_dtype_not_object for where this is caught in the current code. (We would also need something at a lower level in the netCDF4-specific reader/writer to handle uint8 VLType.)
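The opt-in behaviour could look roughly like the sketch below. This is not xarray's coding layer and does not reproduce ensure_dtype_not_object; `encode_object_array` is a hypothetical stand-alone function that only illustrates refusing object arrays unless `allow_pickle=True` is passed.

```python
import pickle

import numpy as np


def encode_object_array(values, allow_pickle=False):
    """Encode an object array as pickled uint8 arrays, only when opted in.

    Hypothetical sketch: non-object arrays pass through unchanged.
    """
    values = np.asarray(values)
    if values.dtype != object:
        return values
    if not allow_pickle:
        raise ValueError(
            "cannot serialize dtype=object without allow_pickle=True"
        )
    encoded = np.empty(values.shape, dtype=object)
    for idx in np.ndindex(values.shape):
        # One variable-length uint8 array per element.
        encoded[idx] = np.frombuffer(
            pickle.dumps(values[idx]), dtype=np.uint8
        )
    return encoded


objs = np.array([{"a": 1}, {"b": 2}], dtype=object)
encoded = encode_object_array(objs, allow_pickle=True)
```

Calling `encode_object_array(objs)` without the keyword raises `ValueError`, so pickling never happens silently. A matching reader would need the same flag before calling `pickle.loads`.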

@lewisacidic
Contributor Author

I would certainly be interested in giving this a try, although I'm not exactly sure what would go where yet. It seems like this might possibly be something that would be more appropriate in the netCDF4-python library - should I start an issue over there?

@shoyer
Member

shoyer commented May 20, 2017

Sure, there's no harm in asking. My guess is that this isn't a good fit, but I'm not entirely sure.

@lewisacidic
Contributor Author

Yeah, looking at it, it's probably not a thing for them. I thought something like:

# implement something like
# strs = nc.createVariable('strs', str, ('strs_dim',))
objs = nc.createVariable('objs', object, ('objs_dim',))

But I see that the str datatype is a netCDF spec type.

@stale

stale bot commented Apr 21, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Apr 21, 2019
@shoyer shoyer closed this as completed Apr 21, 2019
@shoyer shoyer reopened this Apr 21, 2019
@stale stale bot removed the stale label Apr 21, 2019