Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change conditional array slice from 1st index to most True values #227

Merged
merged 6 commits into from
Jan 24, 2024

Conversation

danielfromearth
Copy link
Collaborator

@danielfromearth danielfromearth commented Jan 22, 2024

Github Issue: #226

Description

This changes the trimming of the conditional array when subsetting both spatially and temporally, so that valid values in the time variable (for TEMPO data) are not inadvertently lost.

Overview of work done

The logic has been changed. Before, the first index of the missing dimension was selected naively. Now, the conditional array is iterated over to find the row/column that has the most True values and that row/column is selected.

Overview of verification done

Tested this manually with TEMPO data locally. Ran and passed all of the automated tests for the subset module.

Overview of integration done

None.

PR checklist:

  • Linted
  • Updated unit tests
  • Updated changelog
  • Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request

@danielfromearth danielfromearth added the bug Something isn't working label Jan 22, 2024
@danielfromearth
Copy link
Collaborator Author

Hi @jamesfwood, looks like there is an issue with dependency versions that Snyk wants addressed. Can you help with addressing this?

@jamesfwood
Copy link
Collaborator

jamesfwood commented Jan 23, 2024

Hi @jamesfwood, looks like there is an issue with dependency versions that Snyk wants addressed. Can you help with addressing this?

Hi @danielfromearth, if you pull in the latest develop into your branch and repush it should fix the issue. It updated pillow version.

If this happens again in the future, you can try to do a poetry update and that will update the dependency versions to fix security issues.

@danielfromearth danielfromearth self-assigned this Jan 23, 2024
@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Jan 24, 2024

Status update: one of the automated tests (tests/test_subset.py::test_cf_decode_times_sndr) just started failing after updating the poetry dependencies. It appears to fail when trying to access values of the xarray.DataArray for a single variable (__local_solar_time) in the SNDR data. The error raised is OverflowError: Python int too large to convert to C long. The variable's attributes are:

{'valid_range': array([ 0., 24.], dtype=float32),
 'long_name': 'local apparent solar time',
 '_FillValue': 9.96921e+36,
 'coverage_content_type': 'referenceInformation',
 'description': 'local apparent solar time in hours from midnight'}

Also note, this error is raised for the granule, SNDR.AQUA.AIRS.20140110T0305.m06.g031.L2_CLIMCAPS_RET.std.v02_39.G.210131015806.nc, but not this granule:
SNDR.J1.CRIMSS.20210224T0100.m06.g011.L2_CLIMCAPS_RET.std.v02_28.G.210331064430.nc.

@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Jan 24, 2024

A further update on this failing test: Looks like the /local_solar_time variable — in the tests/data/SNDR/ granule that is raising the OverflowError — contains only/all null values...

snapshot_sndr_aqua_g031

This seems like it might be a problem with this data file, rather than the subsetter. @jamesfwood, @nlenssen2013, @sliu008, thoughts?

@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Jan 24, 2024

Update:
Well, the additional fix I just committed — which extends the xarray.open_dataset()'s mask_and_scale application to more time variables (and including float32 as well as float64) — seems to have worked, as it replaces null values with scaled fill values to avoid the Overflow error. So, perhaps we can ignore the null time values in this test data file for now. Nevertheless, the SNDR team may still want to confirm whether those null values in the local_solar_time array are expected.

@danielfromearth danielfromearth merged commit 32df4dd into develop Jan 24, 2024
2 checks passed
@danielfromearth danielfromearth deleted the feature/issue-226-null-time-values branch January 24, 2024 22:42
@danielfromearth
Copy link
Collaborator Author

Thanks @sliu008 for reviewing this too!

@jamesfwood jamesfwood mentioned this pull request Jan 25, 2024
@danielfromearth danielfromearth restored the feature/issue-226-null-time-values branch January 25, 2024 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

null time values in TEMPO results when spatial+temporal subsetting
3 participants