Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nodata retrieval logic to ensure compatibility with rioxarray #192

Merged

Conversation

lbferreira
Copy link
Contributor

Currently, if we set nodata using rioxarray, odc-geo fails to retrieve the nodata value.
This PR changes nodata retrieval logic to ensure compatibility with nodata values set using rioxarray.

Before the proposed changes, we have the problem shown in the screenshot below. After the proposed changes, no errors were observed.
Before fix
Screenshot 2024-12-09 144556
After fix
Screenshot 2024-12-09 144855

@Kirill888
Copy link
Member

@lbferreira thanks for this, but I'd prefer a different fix here, instead of nodata == () check it really should be nodata is (), this will deal with "nodata is set to a single-element array issue". I dislike if k in A: do things with A[k] pattern as it's not thread-safe.

I also think rioxarray should make sure to convert nodata value to a python float and not use numpy.float64.

Another nitpick: when adding commits please use following template:

Summary text no more than 76 chars (important)

expanded explanation if needed. Ideally formatted to not have long lines

Having commit message as a long single line text makes it much harder to review changelog.

@Kirill888
Copy link
Member

Kirill888 commented Dec 10, 2024

In your specific case you can work-around the current issue by using float("nan") instead of np.nan. Like:

da.odc.nodata = float("nan")
# OR
da.rio.write_nodata(float("nan"), encoded=False, inplace=True)

@lbferreira can you please report what version of numpy and rioxarray is this and what is dtype of da array in your example , I can't seem to be able to reproduce the issue locally.

EDIT: it's a numpy 2 issue triggered by use of np.nan instead of float("nan").

@lbferreira
Copy link
Contributor Author

lbferreira commented Dec 10, 2024

@Kirill888 Sorry for the long commit message.
I am using the following lib versions:
odc-geo: 0.4.8
numpy: 2.2.0
rioxarray: 0.18.1

Using float("nan") hasn`t solved the problem. Actually, any value that I set as nodata using rioxarray (int or float) is not correctly retrieved by odc-geo. I am providing a full example below. However, replacing the operator "==" with "is" solves the problem, as you have mentioned. Thus, I will make this change. Please let me know if it looks good to you.

Full example

import xarray as xr
import numpy as np
import rioxarray
import odc.geo.xr

# Create a fake dataarray
da = xr.DataArray(
    np.random.rand(10, 10).astype(np.float32),
    coords={"x": np.arange(10), "y": np.arange(10)},
    dims=["y", "x"],
)

da.rio.write_nodata(float('nan'), encoded=False, inplace=True)
print(da.rio.nodata)

print(da.odc.nodata)

Output
Screenshot 2024-12-10 084203

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.44%. Comparing base (796a99c) to head (f1a90dd).
Report is 2 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #192   +/-   ##
========================================
  Coverage    95.44%   95.44%           
========================================
  Files           31       31           
  Lines         5534     5535    +1     
========================================
+ Hits          5282     5283    +1     
  Misses         252      252           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Kirill888
Copy link
Member

@Kirill888 Sorry for the long commit message.

no worries, we can fix it. To replace last three commits with one please do:

git reset --soft HEAD~3

Then you can make new commit instead of the last three with whatever tool you use. You can then push it to github with:

git push --force-with-lease

to replace last three commits with one.

@rosepearson
Copy link

Glad to see this is being addressed! Causing me problems as well!

@Kirill888
Copy link
Member

@rosepearson this is numpy>=2 issue, it will take some time for numeric Python ecosystem to stabilize around new numpy I reckon. I would recommend adding numpy<2 constraint to your environments for now.

@lbferreira lbferreira force-pushed the rio_nodata_compatibility_fix branch from ac2ff81 to ab46b06 Compare December 11, 2024 14:28
@lbferreira
Copy link
Contributor Author

@Kirill888 All done!

@lbferreira lbferreira changed the title Refactor nodata retrieval logic in ODCExtensionDa class to ensure com… Fix nodata retrieval logic to ensure compatibility with rioxarray Dec 11, 2024
@lbferreira
Copy link
Contributor Author

@Kirill888 I have noted that after this fix, when you have errors, the stack trace comes with a warning... Although it is only a warning, it`s a little annoying. Do you have any suggestions to handle this? In the example below a forced an error, just to show the warning related to nodata...
Screenshot 2024-12-11 105534

@lbferreira lbferreira requested a review from Kirill888 December 11, 2024 17:09
@Kirill888
Copy link
Member

@lbferreira thanks, yep, that's a fair warning. We are using () as a sentinel value, so we rely on the fact that () is (). Let's replace () with numpy._NoValue instead.

@lbferreira lbferreira force-pushed the rio_nodata_compatibility_fix branch from ab46b06 to fa4a84c Compare December 12, 2024 14:29
@lbferreira
Copy link
Contributor Author

@Kirill888 Now everything is working fine without warnings.

@lbferreira
Copy link
Contributor Author

lbferreira commented Dec 13, 2024

@Kirill888 mypy and pylint tests are not passing because we are accessing a protected class/variable. Any thoughts?

@Kirill888 Kirill888 force-pushed the rio_nodata_compatibility_fix branch from e1a33cf to f1a90dd Compare December 15, 2024 23:21
@Kirill888 Kirill888 merged commit 62b7c5e into opendatacube:develop Dec 16, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants