Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Xarray Dataset Support #1490
base: main
Are you sure you want to change the base?
Xarray Dataset Support #1490
Changes from 7 commits
bf1cd30
9cb8efe
064cad6
28766a4
5ffd160
1459b12
8f93e07
dc4cc8c
c58972f
6d0dc39
0b7e0b5
27999f3
ff49477
1e774fb
a06ab61
702dfef
67954d7
5c99a2f
48cdbc8
f025a84
f723198
5d65a7f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to determine the minimum version that works before merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never thought about the possibility of the filename containing the bounds/res/crs. Have you seen this in the wild? If so, we could add a check for this in the regex and extract it from the filename like we do for time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this was just a dummy naming scheme for test data. Maybe we should explicitly create test data for some of the different data cases like CMIP, ERA5, MODIS and others and write test cases for those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that once we have subclasses for each of those datasets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend renaming this file. Otherwise the following become very different things:
Maybe call it
rioxr.py
? Or just throw it ingeo.py
with the other base classes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used at the moment and could probably be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was running into this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather ignore that warning in pyproject.toml than add a fake import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a standard attribute name or just something chosen by the authors of your particular NetCDF files? If necessary we can always add a class attribute that lists the names of layers to look for things like time from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have encountered different naming schemes for spatial coordinate
x and y
orlat and lon
but for the time dimension it was always calledtime
. However, theds.rio.clip_box()
function in the__getitem__
method expects the spatial coordinates to be named x and y otherwise it complains and tells you to rename them to x and y.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will they always be float32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose there could be cases where you also have integers but I would expect most datasets to have float values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to keep it dynamic if we can't predict 100% of the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does rioxarray automatically reproject to the right CRS if files are in multiple or if the user chooses a different CRS than the default (or if IntersectionDataset changes it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Point, I think it does not automatically reproject. So far I have only used climate data which doesn't explicitly encode or use a CRS, I should check with some MODIS files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I am just doing this with the climate data, where there is no explicit CRS and
ds.rio.reproject()
also only works for 2D and 3D arrays, whereas the CMIP data I have has more dimensions so I getrioxarray.exceptions.TooManyDimensions: Only 2D and 3D data arrays supported. Data variable: tos
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, trying it with MODIS files which are
.hdf
files, I can only open them withrioxarray.open_rasterio()
and notxr.open_dataset(engine="rasterio")
so maybe one base class is too ambitious and ugly to support climate and satellite data at once.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should support both images and masks. See the
is_image
attribute inRasterDataset
for how we do this there. You can also copy thedtype
property to automatically choose what dtype to cast to.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this works correctly for multiple overlapping files. We shouldn't be stacking, we should be merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, maybe have to rethink the base class thing and have one for xarray climate type data that is not using crs explicitly (
class XarrayDataset(GeoDataset)
) and one that is intended for crs depending data sources like MODIS and more similar toRasterDataset
and usingrioxarray
so the current namingclass RioXarrayDataset(GeoDataset)
but with the required functionality to handle overlapping files etc.