Getting a nasty crash with some data which seems to involve dimensions #3
Diving a little deeper, I think this may be due to a failure in how the dimensions are handled, before there is a problem with the dimension scale. With a simple file:

```python
import h5netcdf
import pyfive
import h5py

p5 = False   # use pyfive to look at the file
h5 = True    # use h5py to look at the file
# (with both False, h5netcdf is used to look at the file)
doit = True  # look at the dimensions

if p5:
    ds = pyfive.File('delme.nc', 'r')
elif h5:
    ds = h5py.File('delme.nc', 'r')
else:
    decode_vlen_strings = {'decode_vlen_strings': True}
    print(h5netcdf.__file__)
    ds = h5netcdf.File('delme.nc', 'r', backend='pyfive', **decode_vlen_strings)

var = ds['time']
var = ds['lat']

if doit:
    print('Now do dimensions')
    if p5 or h5:
        print(var.dims)
    else:
        print(var.dimensions)
```

We then see that
My working assumption is that, because of this, the dimension scale is not properly handled from an h5netcdf point of view, and we might need to implement the REFERENCE_LIST ...
From https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html (don't know if it's useful):

> **Attributes**
>
> Attributes in HDF5 and netCDF-4 correspond very closely. Each attribute in an HDF5 file is represented as an attribute in the netCDF-4 file, with the exception of the attributes below, which are hidden by the netCDF-4 API.
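That hiding behaviour is easy to picture in miniature. The attribute names below are the ones I believe the netCDF-C file-format specification lists as hidden; treat the exact set as an assumption to verify against the spec, and the `raw` dict as a hypothetical stand-in for what h5py reports on a coordinate variable:

```python
# Assumed set of attributes the netCDF-4 API hides (check against the spec).
HIDDEN_NC4_ATTRS = {
    "CLASS", "NAME", "DIMENSION_LIST", "REFERENCE_LIST",
    "_Netcdf4Coordinates", "_Netcdf4Dimid", "_nc3_strict",
}

def visible_attrs(attrs):
    """Return only the attributes a netCDF-4 reader would expose."""
    return {k: v for k, v in attrs.items() if k not in HIDDEN_NC4_ATTRS}

# Hypothetical attributes as h5py might report them for a coordinate variable:
raw = {
    "CLASS": b"DIMENSION_SCALE",
    "NAME": b"time",
    "REFERENCE_LIST": "<object references>",
    "units": b"days since 2000-01-01",
}
print(visible_attrs(raw))  # only 'units' survives
```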
I think at this point we need a tutorial on dimension scales, the nearest equivalent being the series of blog articles by John Caron which start here. I'm gonna excerpt some key bits, but the real oil is there. (The links in the original were broken; I have added some that point either to what I hope is the original or to something similar.)
John goes on to detail exactly how HDF5 dimension scales work, and then how NetCDF4 uses them. A key part of the latter document seems to make it clear that NetCDF4 does NOT need the values of the REFERENCE_LIST, so implementing them should not be necessary. So the issue at hand would appear NOT to be related to our failure to implement REFERENCE_LISTs in pyfive (or at least it should not, so now we have to work out exactly what h5py is up to).
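John's point can be sketched in miniature: a reader can recover which scale attaches to which variable axis purely from each variable's DIMENSION_LIST (the forward pointers), never consulting the scale's REFERENCE_LIST (which only points the other way). A toy sketch with plain dicts and strings standing in for HDF5 objects and references, not h5netcdf's actual API:

```python
def attached_dimensions(variables):
    """Map each data variable to the dimension-scale names on its axes,
    using only DIMENSION_LIST; REFERENCE_LIST is never consulted."""
    return {
        name: tuple(attrs.get("DIMENSION_LIST", ()))
        for name, attrs in variables.items()
        if attrs.get("CLASS") != b"DIMENSION_SCALE"  # skip the scales themselves
    }

# Illustrative file layout: strings stand in for HDF5 object references.
variables = {
    "time": {"CLASS": b"DIMENSION_SCALE"},  # its REFERENCE_LIST is irrelevant here
    "lat":  {"CLASS": b"DIMENSION_SCALE"},
    "t2m":  {"DIMENSION_LIST": ["time", "lat"]},
}
print(attached_dimensions(variables))  # {'t2m': ('time', 'lat')}
```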
Ok, so first, let's address these annoying warnings while we wait for a better test for the actual crash. The warning arises at

```python
if v.attrs.get("CLASS") == b"DIMENSION_SCALE":
```

but only when it evaluates as true (because that's the only time there is a REFERENCE_LIST datatype message in the dataset messages). Diving further into the code we see that this is

```python
@property
def attrs(self):
    """Return variable attributes."""
    return Attributes(
        self._h5ds.attrs, self._root._check_valid_netcdf_dtype, self._root._h5py
    )
```

and the Attributes class can be found in attrs.py (line 20). So we just want to suppress this warning when it is accessed as part of this pathway.
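Suppressing a warning on one call pathway only (rather than globally) can be done with the standard library's `warnings.catch_warnings`. A self-contained sketch, where `noisy_read` is a dummy standing in for whatever attribute access currently emits the warning:

```python
import warnings

def noisy_read():
    # Stand-in for the attribute access that currently emits the warning.
    warnings.warn("REFERENCE_LIST not implemented", UserWarning)
    return {"CLASS": b"DIMENSION_SCALE"}

def read_attrs_quietly(reader):
    """Run reader() with UserWarnings silenced on this pathway only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        return reader()

attrs = read_attrs_quietly(noisy_read)  # no warning reaches the user
print(attrs["CLASS"])
```

Because the filter is installed inside a `catch_warnings` context, it is restored on exit, so warnings emitted on any other pathway are untouched.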
Ok, and as to the actual error, I should have done some RTFM, where we find, under "Datasets with missing dimension scales":

```python
# mimic netCDF-behaviour for non-netcdf files
f = h5netcdf.File('mydata.h5', mode='r', phony_dims='sort')
```

Note that this iterates once over the whole group hierarchy, which has effects on performance; alternatively:

```python
f = h5netcdf.File('mydata.h5', mode='r', phony_dims='access')
```

(The keyword default setting is …)
And, lo and behold, if we set phony_dims, indeed it runs fine. So now the question is: why did this file not have compliant dimension scales, given it was written by netcdf4-python?
This seems relevant, from the docs:

> NetCDF-4 allows some interoperability with HDF5.
>
> **Reading and Editing NetCDF-4 Files with HDF5**
>
> The HDF5 files produced by netCDF-4 are perfectly respectable HDF5 files, and can be read by any HDF5 application. NetCDF-4 relies on several new features of HDF5, including dimension scales. The HDF5 dimension scales feature adds a bunch of attributes to the HDF5 file to keep track of the dimension information. It is not just wrong, but wrong-headed, to modify these attributes except with the HDF5 dimension scale API. If you do so, then you will deserve what you get, which will be a mess. Additionally, netCDF stores some extra information for dimensions without dimension scale information. (That is, a dimension without an associated coordinate variable.) So HDF5 users should not write data to a netCDF-4 file which extends any unlimited dimension, or change any of the extra attributes used by netCDF to track dimension information. Also there are some types allowed in HDF5, but not allowed in netCDF-4 (for example the time type). Using any such type in a netCDF-4 file will cause the file to become unreadable to netCDF-4. So don't do it. NetCDF-4 ignores all HDF5 references. Can't make head nor tail of them. Also netCDF-4 assumes a strictly hierarchical group structure. No looping, you weirdo! Attributes can be added (they must be one of the netCDF-4 types), modified, or even deleted, in HDF5.
>
> **Reading and Editing HDF5 Files with NetCDF-4**
>
> Assuming a HDF5 file is written in accordance with the netCDF-4 rules (i.e. no strange types, no looping groups), and assuming that every dataset has a dimension scale attached to each dimension, the netCDF-4 API can be used to read and edit the file, quite easily. In HDF5 (version 1.8.0 and later), dimension scales are (generally) 1D datasets that hold dimension data. A multi-dimensional dataset can then attach a dimension scale to any or all of its dimensions. For example, a user might have 1D dimension scales for lat and lon, and a 2D dataset which has lat attached to the first dimension, and lon to the second. If dimension scales are not used, then netCDF-4 can still edit the file, and will invent anonymous dimensions for each variable shape. This is done by iterating through the space of each dataset. As each space size is encountered, a phony dimension of that size is checked for. If it does not exist, a new phony dimension is created for that size. In this way, a HDF5 file with datasets that are using shared dimensions can be seen properly in netCDF-4. (There is no shared dimension in HDF5, but data users will frequently write many datasets with the same shape, and intend these to be shared dimensions.) Starting with version 4.7.3, if a dataset is encountered which uses the same size for two or more of its dataspace lengths, then a new phony dimension will be created for each. That is, a dataset with size [100][100] will result in two phony dimensions, each of size 100.
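The invention rule in that last paragraph (reuse an existing phony dimension of a given size, except that repeated sizes within a single dataset each get their own) can be sketched in pure Python. This is an illustration of the described behaviour, not netCDF-C's actual code, and the `phony_dim_N` naming is borrowed from what such tools typically report:

```python
def invent_phony_dims(shapes):
    """Assign phony dimension names to dataset shapes, netCDF-4 style.

    shapes: dict mapping dataset name -> shape tuple.
    Returns (dims, assignments): the invented dimensions and, per dataset,
    the tuple of phony dimension names attached to its axes.
    """
    dims = {}         # phony dim name -> size
    assignments = {}
    for name, shape in shapes.items():
        used = []     # phony dims already consumed by THIS dataset
        for size in shape:
            # Reuse an existing phony dim of this size, unless this dataset
            # already used it (repeated sizes get separate phony dims).
            candidate = next(
                (d for d, s in dims.items() if s == size and d not in used),
                None,
            )
            if candidate is None:
                candidate = f"phony_dim_{len(dims)}"
                dims[candidate] = size
            used.append(candidate)
        assignments[name] = tuple(used)
    return dims, assignments

# A [100][100] dataset yields two phony dims; a [100] dataset shares the first.
dims, assigned = invent_phony_dims({"square": (100, 100), "vec": (100,)})
print(dims)      # {'phony_dim_0': 100, 'phony_dim_1': 100}
print(assigned)  # {'square': ('phony_dim_0', 'phony_dim_1'), 'vec': ('phony_dim_0',)}
```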
There is additional information on dimension scales here.
Given this is all solved with a keyword, it seems that it's not really an issue ... for now.
What happened:

… raises the following error instead of printing the dimensions.