Single vs. multi CRS datasets #2
Comments
I think a single CRS is the simpler conceptual data model and worth enforcing. While it may be possible to represent coordinates in multiple CRSs in a single object, I've yet to see a compelling use case compared to the alternative of reprojecting coordinates to a new CRS during or after loading. For vectors, the number of geometries doesn't change between CRSs (I think), but even so, libraries like geopandas operate under the assumption of a single 'active CRS' at a time. For rectilinear rasters, the number of coordinates will likely change between CRSs (under the assumption of maintaining the same dx, dy resolution), so I don't see that fitting well with the Xarray data model.
I've never actually come across a netCDF file that does this! Would be interested to know if others have? |
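To make the raster point above concrete, here is a minimal sketch of why the number of grid rows changes when the same latitude band is resampled at a fixed cell size in two different projections. It uses textbook spherical formulas for plate carrée and Mercator rather than a real CRS library, and the radius, latitude band, cell size, and function names are all illustrative assumptions.

```python
import math

# Assumption: spherical Earth radius in metres (illustrative, not a real datum).
R = 6_371_000.0


def plate_carree_y(lat_deg):
    """Northing in metres for a spherical plate carree projection."""
    return R * math.radians(lat_deg)


def mercator_y(lat_deg):
    """Northing in metres for a spherical Mercator projection."""
    return R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))


def n_cells(lo, hi, step):
    """Number of cells of size `step` needed to cover the interval [lo, hi]."""
    return math.ceil((hi - lo) / step)


# Hypothetical latitude band and cell size, chosen only for illustration.
lat_min, lat_max = 50.0, 60.0
dy = 10_000.0  # same cell height (metres) in both projections

rows_plate_carree = n_cells(plate_carree_y(lat_min), plate_carree_y(lat_max), dy)
rows_mercator = n_cells(mercator_y(lat_min), mercator_y(lat_max), dy)

print("rows in plate carree:", rows_plate_carree)
print("rows in Mercator:", rows_mercator)
```

Because the two grids have different shapes, they cannot share one pair of `y`/`x` dimensions, which is the friction with a rectilinear raster model described above. The CF example discussed below avoids this by keeping a single grid and attaching 2D lat/lon coordinates to it.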
Hmm, I thought example 5.13 in the CF-conventions (version 1.11) would look like the following when loaded into an xarray Dataset?
>>> ds
<xarray.Dataset>
Dimensions: (x: 18, y: 36)
Coordinates:
* x (x) float64 ...
* y (y) float64 ...
lat (y, x) float64 ...
lon (y, x) float64 ...
crs_wgs84 int64 0
* crs_osgb int64 0
Data variables:
temp (y, x) float64 ...
Indexes:
x PandasIndex
y PandasIndex
crs_osgb CRSIndex (crs=EPSG:27700)
This assumes a single-CRS model (using the projected CRS as the "active" one). Now assuming a multi-CRS model and a geographic Xarray index set for the lat/lon coordinates, it may look like:
>>> ds
<xarray.Dataset>
Dimensions: (x: 18, y: 36)
Coordinates:
* x (x) float64 ...
* y (y) float64 ...
* lat (y, x) float64 ...
* lon (y, x) float64 ...
* crs_wgs84 int64 0
* crs_osgb int64 0
Data variables:
temp (y, x) float64 ...
Indexes:
x PandasIndex
y PandasIndex
┌ lat GeographyIndex
└ lon
crs_wgs84 CRSIndex (crs=EPSG:4326)
crs_osgb CRSIndex (crs=EPSG:27700)
(note: I plan to refactor [...])
I find the latter example pretty illustrative and useful, actually (i.e., it allows selecting data based on either lat/lon or x/y coordinates). On a related note, I'm wondering if the reason why [...] I could imagine similar examples of vector data cubes where we can select data based on either planar or spherical geometries.
I agree that single-CRS is a simpler conceptual data model, and perhaps we could imagine the concept of an "active" CRS for the example above? That said, as far as I understand, the concept of an "active" geometry column (and its CRS) in GeoPandas is rather specific to the dataframe model (often based on one index), which is different from the Xarray data model.
The code below (API and behavior) looks pretty clear to me but sadly wouldn't be possible to write if we enforce single-CRS.
>>> ds.proj("crs_wgs84").crs
<Geographic 2D CRS: EPSG:4326>
...
>>> ds.proj("crs_osgb").crs
<Projected CRS: EPSG:27700>
...
>>> ds.proj.crs
ValueError: multiple CRSs found
>>> ds_latlon = ds.drop_vars(["x", "y", "crs_osgb"])
>>> ds_latlon.proj.crs
<Geographic 2D CRS: EPSG:4326>
...
>>> # ... continue using `ds_latlon` in CRS-aware operations without any extra step required
I'm sure a multi-CRS model would introduce some extra complexity for other things (re-projection API?), but I wonder if we can keep it under control. |
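The lookup behavior in the session above (explicit lookup by coordinate name, an error when the CRS is ambiguous, and an implicit result once only one CRS remains) can be sketched without any geospatial dependency. This is a toy model only: `ToyDataset`, `crs()`, and `drop_crs()` are hypothetical names invented for illustration and are not the xproj or Xarray API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyDataset:
    # Hypothetical stand-in for a dataset: CRS coordinate name -> CRS identifier.
    crs_coords: Dict[str, str] = field(default_factory=dict)

    def crs(self, coord_name: Optional[str] = None) -> str:
        """Return the CRS of `coord_name`, or the only CRS if unambiguous."""
        if coord_name is not None:
            return self.crs_coords[coord_name]
        if not self.crs_coords:
            raise ValueError("no CRS found")
        if len(self.crs_coords) > 1:
            raise ValueError("multiple CRSs found")
        return next(iter(self.crs_coords.values()))

    def drop_crs(self, *names: str) -> "ToyDataset":
        """Return a copy without the named CRS coordinates."""
        kept = {k: v for k, v in self.crs_coords.items() if k not in names}
        return ToyDataset(kept)


ds = ToyDataset({"crs_wgs84": "EPSG:4326", "crs_osgb": "EPSG:27700"})

# Explicit lookup always works, even with several CRSs attached.
explicit = ds.crs("crs_osgb")

# Implicit lookup is rejected while the dataset is ambiguous.
try:
    ds.crs()
    ambiguous = False
except ValueError:
    ambiguous = True

# Dropping one CRS coordinate makes the implicit lookup well defined again.
ds_latlon = ds.drop_crs("crs_osgb")
implicit = ds_latlon.crs()
```

Under this design, single-CRS datasets pay no extra cost, and the ambiguity is only surfaced when a caller asks for "the" CRS of a multi-CRS object.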
Interesting! Thanks for clarifying with the example - I was coming at this from the more narrow view of rasters being represented by an affine and 1D coordinates. I also didn't look closely at the CF example and thought it was for storing 2 different data arrays in a single file (e.g. as different HDF groups of different sizes 18x36 and 648x648) 🤦...
This does seem neat, and I'm not opposed to leaving the door open and running with multi-CRS! Of course, if the data is not stored w/ multiCRS to begin with, there is the alternative approach of re-projecting the values/geometry used for selection. It would be good to identify some existing datasets that are stored like the example above - maybe climate model output or something in the xvec or xdggs realms? |
Yes I agree the CF examples are a little bit confusing. It is mentioned that example 5.13 results from examples 5.11 and 5.12 combined together but in example 5.12 we have this:
I've relied to the dimensions of coordinates |
Interesting note I've found in geoxarray/geoxarray#21 (comment):
So in short GDAL's NetCDF driver doesn't seem to support multiple CRSs. I didn't find any reference about that in GDAL's documentation and repository, though. |
Brief summary on the current state of things:
I'm still unsure about which model In either case,
|
I would not say it is too common, but there is certainly a pattern where you keep multiple geometry columns in a GeoDataFrame, each representing the same geometries but in a different CRS. I've been doing that to do analysis in a projected CRS, while for visualisation with lonboard or folium you need EPSG:4326 or EPSG:3857 and don't want to repeatedly reproject. Not sure how common this would be in the vector data cube world, but I can imagine a similar pattern there. |
This has been implemented in #10. |
I relaxed single-CRS enforcement in #18. We can still change our minds later. |
This is a big question that may generate lots of discussion: should we allow only one CRS per `xarray.Dataset` / `xarray.DataArray`, or should we support multiple CRSs?
Why support multiple CRSs?
Xproj relies on scalar Xarray coordinates with a `CRSIndex`. This is inspired by the CF-conventions, and this kind of coordinate is already used in tools like rioxarray and odc-geo. AFAIK there doesn't seem to be any restriction in the CF-conventions on the number of grid mapping coordinate variables (see CF-conventions 1.10, section 5.6, although I might have overlooked it?). (EDIT: example 5.13 has both lat-lon geographic and x-y projected coordinate systems in the same dataset.) There's no real technical barrier either to supporting multi-CRS with the current Xarray data model (coordinates + custom indexes).
Other Xarray extensions like xvec technically support multiple CRSs (`xvec` currently encapsulates the CRS into the geometry coordinate / index). Although I’m not sure if multi-CRS vector data cubes exist and/or make sense, in theory there will be some friction in adopting `xproj` if the latter only works with single-CRS (breaking changes).
Single-CRS is easy to enforce in 3rd-party extensions. #1 provides a convenient API to work with single-CRS datasets or dataarrays, while still supporting the multi-CRS case.
Why might this be a bad idea?
Supporting multi-CRS datasets possibly opens a can of worms?
This is potentially confusing. I cannot imagine a DataArray representing a single raster (or a mosaic / stack of rasters) having multiple CRSs defined. I haven't checked but I guess that `rioxarray` and `odc-geo` won't work with multi-CRS (EDIT: I checked, and neither library supports multiple grid mapping attribute values). Although here again, single-CRS may be enforced in those libraries, and it's no big deal if `xproj` provides a user-friendly, single-CRS API (#1)?
Any thoughts on this?