-
Notifications
You must be signed in to change notification settings - Fork 229
Allow passing region to GMTBackendEntrypoint.open_dataset #3932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Support passing in a region as a Sequence [xmin, xmax, ymin, ymax] or ISO country code to `xarray.open_dataset` when using `engine="gmt"`.
Remove duplicated code calling GMT read, since `xr.load_dataarray(engine="gmt")` now works with region argument.
Ensure that passing engine='gmt' to xarray.open_dataarray works for opening GeoTIFF | ||
images. | ||
Ensure that passing engine='gmt' to xarray.open_dataarray works to open a GeoTIFF | ||
image. | ||
""" | ||
with xr.open_dataarray("@earth_day_01d", engine="gmt", raster_kind="image") as da: | ||
assert da.sizes == {"band": 3, "y": 180, "x": 360} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coordinate names are y
/x
when region=None
, but lat
/lon
when region is not None
at L90 below. Need to fix this inconsistency.
pygmt/tests/test_xarray_accessor.py
Outdated
# The source grid file is undefined for tiled grids. | ||
assert grid.encoding.get("source") is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep grid.encoding["source"]
as undefined/None for tiled grids (xref #3673 (comment))? Or select the first tile (e.g. S90E000.earth_relief_05m_p.nc
)? May need to update this test depending on what we decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or select the first tile (e.g.
S90E000.earth_relief_05m_p.nc
)?
Sounds good.
pygmt/xarray/backend.py
Outdated
source: str | list = which(fname=filename_or_obj) | ||
raster.encoding["source"] = ( | ||
source[0] if isinstance(source, list) else source | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need the _ = raster.gmt
line at L124 to load GMTDataArray accessor info, since lib.virtualfile_to_raster
already calls self.read_virtualfile(vfname, kind=kind).contents.to_xarray()
which sets the registration and gtype based on the header?
Lines 197 to 201 in 74de7d8
# Set GMT accessors. | |
# Must put at the end, otherwise info gets lost after certain grid operations. | |
grid.gmt.registration = header.registration | |
grid.gmt.gtype = header.gtype | |
return grid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point. please try remove it and see if everything works fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one extra test failure at https://github.com/GenericMappingTools/pygmt/actions/runs/14743641047/job/41386753150#step:10:1049:
_____________ [doctest] pygmt.xarray.accessor.GMTDataArrayAccessor _____________
026 Examples
027 --------
028 For GMT's built-in remote datasets, these GMT-specific properties are automatically
029 determined and you can access them as follows:
030
031 >>> from pygmt.datasets import load_earth_relief
032 >>> # Use the global Earth relief grid with 1 degree spacing
033 >>> grid = load_earth_relief(resolution="01d", registration="pixel")
034 >>> # See if grid uses Gridline or Pixel registration
035 >>> grid.gmt.registration
Expected:
<GridRegistration.PIXEL: 1>
Got:
1
I think this might be because the _GMT_GRID_HEADER.registration property returns an int
instead of an enum?
pygmt/pygmt/datatypes/header.py
Lines 81 to 82 in d29303b
# Grid registration, 0 for gridline and 1 for pixel | |
("registration", ctp.c_uint32), |
and we overrode grid.gmt.registration
with 1
instead of <GridRegistration.PIXEL: 1>
. Should be a quick fix we can do in a separate PR. Edit: no, the GMTDataArrayAccessor registration property should always return an enum, not an int, something else in this PR seems to be affecting this doctest...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the following changes should fix the issue:
pygmt/pygmt/xarray/accessor.py
Lines 139 to 142 in 33fadc3
with contextlib.suppress(ValueError): | |
self._registration, self._gtype = map( # type: ignore[assignment] | |
int, grdinfo(_source, per_column="n").split()[-2:] | |
) |
if (_source := self._obj.encoding.get("source")) and Path(_source).exists():
with contextlib.suppress(ValueError):
registration, gtype = map( # type: ignore[assignment]
int, grdinfo(_source, per_column="n").split()[-2:]
)
self._registration = GridRegistration(registration)
self._gtype = GridType(gtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit 07b2802, and also we can remove the # type: ignore[assignment]
mypy skip 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should declare it an uncaught bug of the previous implementation. In our tests, we have checks like
assert grid.gmt.registration == GridRegistration.GRIDLINE
It's not enough, since the assertion is true when grid.gmt.registration
is GridRegistration.GRIDLINE
or 0. I think we should improve the existing tests and ensure that .registration
and .gtype
are in enums, not int, and this should be done in a separate PR.
The test below shows the current, inconsistent behavior:
>>> from pygmt.datasets import load_earth_relief
>>> grid = load_earth_relief(resolution="01d", registration="pixel")
>>> grid.gmt.registration
<GridRegistration.PIXEL: 1>
>>> type(grid.gmt.registration)
<enum 'GridRegistration'>
>>> grid2 = grid[0:10, 0:10]
>>> grid2.gmt.registration
1
>>> type(grid2.gmt.registration)
int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like enum comparison should be done using is
instead of ==
according to https://docs.python.org/3/howto/enum.html#comparisons (see also https://stackoverflow.com/questions/25858497/should-enum-instances-be-compared-by-identity-or-equality).
assert grid.gmt.registration is enums.GridRegistration.PIXEL # OK
assert 1 is enums.GridRegistration.PIXEL # AssertionError<>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
Wish there was a ruff lint for this (xref astral-sh/ruff#11617), but until then, will need to fix it manually.
Edit: PR at #3942
GMTDataArrayAccessor info should already be loaded by calling`virtualfile_to_raster` which calls `self.read_virtualfile(vfname, kind=kind).contents.to_xarray()` that sets registration and gtype from the header.
url = "https://pygmt.org/dev/api/generated/pygmt.GMTBackendEntrypoint.html" | ||
|
||
@kwargs_to_strings(region="sequence") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been think if we should avoid using the @kwargs_to_strings
decorator in new functions/methods, and instead write a new function like seqjoin
which does exactly the same thing.
raster.encoding["source"] = ( | ||
source[0] if isinstance(source, list) else source | ||
) | ||
_ = raster.gmt # Load GMTDataArray accessor information | ||
return raster.to_dataset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's likely that the accessor information will be lost when converting via to_dataset
.
@@ -581,22 +579,9 @@ def _load_remote_dataset( | |||
raise GMTInvalidInput(msg) | |||
|
|||
fname = f"@{prefix}_{resolution}_{reg}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a lot of error messages like:
Error: h [ERROR]: Tile @S90W180.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90W150.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90W120.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90W090.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90W060.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90W030.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90E000.earth_age_01m_g.nc not found!
Error: h [ERROR]: Tile @S90E030.earth_age_01m_g.nc not found!
This is because, in the GMT backend, we use something like which("@earth_age_01m_g")
to get the file path, which doesn't work well for tiled grids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we used to do this:
# Full path to the grid if not tiled grids.
source = which(fname, download="a") if not resinfo.tiled else None
# Manually add source to xarray.DataArray encoding to make the GMT accessors work.
if source:
grid.encoding["source"] = source
i.e. only add the source for non-tiled grids, so that the accessor's which
call doesn't report this error. I'm thinking if it's possible to either 1) silence the which
call (does verbose="q"
work?), or 2) add some heuristic/logic to determine whether the source is a tiled grid before calling which
in GMTBackendEntrypoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking if it's possible to either 1) silence the
which
call (doesverbose="q"
work?), or 2) add some heuristic/logic to determine whether the source is a tiled grid before callingwhich
inGMTBackendEntrypoint
I think either works. Perhaps verbose="q"
is easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit 5557b33.
Edit: Also just realized that verbose="q"
was suggested before in #524 (comment).
Slicing a tiled grid retains the original source still, but doing math operations like addition cause source to be lost and fallback to default registration/gtype.
pygmt/tests/test_xarray_accessor.py
Outdated
# For a sliced grid, ensure we don't fallback to the default registration (gridline) | ||
# and gtype (cartesian), because the source grid file should still exist. | ||
sliced_grid = grid[1:3, 1:3] | ||
assert sliced_grid.gmt.registration is GridRegistration.GRIDLINE | ||
assert sliced_grid.gmt.gtype is GridType.CARTESIAN | ||
|
||
# Still possible to manually set registration and gtype. | ||
sliced_grid.gmt.registration = GridRegistration.PIXEL | ||
sliced_grid.gmt.gtype = GridType.GEOGRAPHIC | ||
assert sliced_grid.encoding["source"].endswith("S90E000.earth_relief_05m_p.nc") | ||
assert sliced_grid.gmt.registration is GridRegistration.PIXEL | ||
assert sliced_grid.gmt.gtype is GridType.GEOGRAPHIC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to triple check this part. Do we preserve the source encoding now even after slicing?!! I.e. is #524 fixed? Or is there just some caching going on.
However, doing math operations (e.g. addition (+)) still removes the source.
Try a different tile to see if it passes on CI
from pygmt.exceptions import GMTInvalidInput | ||
from pygmt.helpers import build_arg_list, kwargs_to_strings | ||
from pygmt.src import which | ||
from pygmt.helpers import kwargs_to_strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At line 503, @kwargs_to_strings(region="sequence")
can be removed, since the region
parameter, either a sequence or a string, can be directly passed to xr.load_dataarray
.
Description of proposed changes
Support passing in a region as a Sequence [xmin, xmax, ymin, ymax] or ISO country code to
xarray.open_dataset
when usingengine="gmt"
.Usage:
This PR also refactors the internals of
_load_remote_dataset
to usexr.load_dataarray(engine="gmt", ...)
instead of low-level calls to clib functions.Extends #3919, adapted from #3673
Preview:
Reminders
make format
andmake check
to make sure the code follows the style guide.doc/api/index.rst
.Slash Commands
You can write slash commands (
/command
) in the first line of a comment to performspecific operations. Supported slash command is:
/format
: automatically format and lint the code