Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading data groups ipx.Read.load can fail on certain files in edge cases #576

Open
jrenrut opened this issue Aug 21, 2024 · 7 comments · May be fixed by #578
Open

Loading data groups ipx.Read.load can fail on certain files in edge cases #576

jrenrut opened this issue Aug 21, 2024 · 7 comments · May be fixed by #578

Comments

@jrenrut
Copy link

jrenrut commented Aug 21, 2024

Summary

I've found an edge case where ipx.Read fails to load a certain file missing a requested variable group.

Description

Background

I used ipx.Query to find data for a limited spatial extend for the entire year of 2023. I then ordered the granules:

query.order_vars.append(['h_li', 'latitude', 'longitude'])
query.subsetparams(Coverage=query.order_vars.wanted)
query.order_granules()
query.download_granules('./data')

I isolate the specific file that's missing one of my desired variable group, 'land_ice_segments'. For a good file:

h5 = h5py.File(r"data\processed_ATL06_20230108204519_02951802_006_02.h5")
gt1l = hf["gt1l"]
gt1l.keys()
>>> <KeysViewHDF5 ['land_ice_segments', 'residual_histogram', 'segment_quality']

For the misbehaving file:

h5 = h5py.File(r"data\processed_ATL06_20230111085249_03331806_006_02.h5")
gt1l = hf["gt1l"]
gt1l.keys()
>>> <KeysViewHDF5 ['residual_histogram', 'segment_quality']

Error

When I go to load my variables, I run into an OSError:

data_path = Path('./data').resolve()
data_files = tuple(data_path.glob('*2023*.h5'))
reader = ipx.Read([str(df) for df in data_files])
reader.vars.append(var_list=['h_li', "latitude", "longitude"])
ds = reader.load()
>>> OSError: [Errno group not found: land_ice_segments] 'land_ice_segments'

The error originates in Read.load() on this line:

all_dss.append(
    self._build_single_file_dataset(file, groups_list)
)  # wanted_groups, vgrp.keys()))

In Read._build_single_file_dataset() on this line:

ds = self._read_single_grp(file, grp_path=wanted_groups_list[0])

Finally the error is thrown here in Read._read_single_grp(), calling xarray.open_dataset):

return xr.open_dataset(
    file,
    group=grp_path,
    engine="h5netcdf",
    backend_kwargs={"phony_dims": "access"},
)

Proposed Solution

Place the call of Read._build_single_file_dataset() in a try/except:

try:
    all_dss.append(
        self._build_single_file_dataset(file, groups_list)
    )  # wanted_groups, vgrp.keys()))
except OSError:
    warnings.warn(
        f"{file} is excluded from this dataset because it"
        "did not contain all wanted variable groups.",
        stacklevel=2,
    )
@JessicaS11
Copy link
Member

Thanks for the great bug report! Out of curiosity - any idea why that granule doesn't contain the desired variable group? In cases like this we like to make sure that it's an edge case that should be handled within icepyx rather than an issue with the data itself.

@jrenrut
Copy link
Author

jrenrut commented Aug 22, 2024

This is my first time interacting with ICESat-2 data, so unfortunately I can't really speak to that but I was wondering about this also. Was there maybe no valid data in this granule, and does that result in the variable group being removed rather than left empty? Not sure.

@mikala-nsidc
Copy link

mikala-nsidc commented Aug 22, 2024

@jrenrut Could you share the bounding box that you used for your spatial query? I'll try to mimic the spatial and variable subsetting that you did to see why that group might have been missing from the processed file.

@jrenrut
Copy link
Author

jrenrut commented Aug 22, 2024

@mikala-nsidc Sure. I couldn't upload a .kml so I left it out originally, but it's easy enough to copy/paste:

<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark>

  <Polygon>
<outerBoundaryIs>
  <LinearRing><coordinates>-108.7448855153659,42.52137511207607
-108.95629660653259,42.478405856164756
-109.42207743882035,42.67607422948224
-109.69309543424377,42.9411752885619
-110.02073789421323,43.19898442578465
-110.00903619011682,43.3097777679274
-109.90707513545527,43.38036843185466
-109.66266574886284,43.491721566096665
-109.42482929506421,43.33300738018947
-109.0565174766684,42.93568510669404
-108.88773084723528,42.74920648517508
-108.7448855153659,42.52137511207607</coordinates></LinearRing></outerBoundaryIs></Polygon></Placemark></Document></kml>

@jrenrut
Copy link
Author

jrenrut commented Aug 22, 2024

and then:

shape_file = Path('data/wind.kml')
query = ipx.Query(
    product='ATL06',
    spatial_extent=str(shape_file),  ## icepyx doesn't like Path objects
    date_range=[start_date, end_date]
)

@mikala-nsidc
Copy link

mikala-nsidc commented Aug 22, 2024

Screenshot 2024-08-22 at 4 34 13 PM

So, there aren't data inside your polygon for that granule. But I'm not sure why you got a processed file back at all. I repeated your workflow in Earthdata Search and was able to replicate your results. I also got a file without any land ice segments in it. I would expect the subsetter processing to fail in this case, and give you an error that there are no data within the subset boundaries, and I'm not sure why it didn't fail. We'll have to investigate a bit on our end (NSIDC). I'll let @JessicaS11 weigh in about how/whether to handle a case like this within icepyx.

@JessicaS11
Copy link
Member

We'll have to investigate a bit on our end (NSIDC). I'll let @JessicaS11 weigh in about how/whether to handle a case like this within icepyx.

Thanks for looking into this @mikala-nsidc! Obviously it would be best to handle the cause of this error at the source (why the subsetter is returning any data). However, it would also be great to have some sort of error handling built in to keep read-in workflows from failing... follow the conversation on this aspect over in the PR (#578).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants