Allow passing region to GMTBackendEntrypoint.open_dataset #3932

weiji14 · 2025-04-29T09:32:59Z

Description of proposed changes

Support passing in a region as a Sequence [xmin, xmax, ymin, ymax] or ISO country code to xarray.open_dataset when using engine="gmt".

Usage:

import numpy.testing as npt
import xarray as xr

da = xr.open_dataarray("@static_earth_relief.nc", engine="gmt", raster_kind="grid", region=[-52, -48, -18, -12])
assert da.sizes == {"lat": 6, "lon": 4}
npt.assert_allclose(da.lat, [-17.5, -16.5, -15.5, -14.5, -13.5, -12.5])
npt.assert_allclose(da.lon, [-51.5, -50.5, -49.5, -48.5])

This PR also refactors the internals of _load_remote_dataset to use xr.load_dataarray(engine="gmt", ...) instead of low-level calls to clib functions.

Extends #3919, adapted from #3673

Preview:

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. The supported slash command is:

/format: automatically format and lint the code

weiji14 · 2025-04-29T09:34:23Z

pygmt/tests/test_xarray_backend.py

-    Ensure that passing engine='gmt' to xarray.open_dataarray works for opening GeoTIFF
-    images.
+    Ensure that passing engine='gmt' to xarray.open_dataarray works to open a GeoTIFF
+    image.
    """
    with xr.open_dataarray("@earth_day_01d", engine="gmt", raster_kind="image") as da:
        assert da.sizes == {"band": 3, "y": 180, "x": 360}


Coordinate names are y/x when region=None, but lat/lon when region is not None at L90 below. Need to fix this inconsistency.

weiji14 · 2025-04-29T10:15:37Z

pygmt/tests/test_xarray_accessor.py

+    # The source grid file is undefined for tiled grids.
    assert grid.encoding.get("source") is None


Should we keep grid.encoding["source"] as undefined/None for tiled grids (xref #3673 (comment))? Or select the first tile (e.g. S90E000.earth_relief_05m_p.nc)? May need to update this test depending on what we decide.

Or select the first tile (e.g. S90E000.earth_relief_05m_p.nc)?

Sounds good.
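A minimal sketch of the "select the first tile" option, assuming the lookup returns either a single path or a list of tile paths. The helper name `first_tile` is hypothetical and not part of PyGMT; sorting is included because the tile order returned on CI and locally may differ.

```python
from __future__ import annotations


def first_tile(source: str | list[str]) -> str:
    """Pick one path to record as the ``source`` encoding (hypothetical helper)."""
    if isinstance(source, list):
        # Sort first so the chosen tile does not depend on the order
        # in which the tile paths happen to be returned.
        return sorted(source)[0]
    return source


tiles = ["S90W180.earth_relief_05m_p.nc", "S90E000.earth_relief_05m_p.nc"]
print(first_tile(tiles))  # S90E000.earth_relief_05m_p.nc
print(first_tile("earth_relief_01d_g.grd"))  # earth_relief_01d_g.grd
```

An empty list is not handled here; a real implementation would need to decide what to do in that case.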

weiji14 · 2025-04-29T23:33:04Z

pygmt/xarray/backend.py

+                source: str | list = which(fname=filename_or_obj)
                raster.encoding["source"] = (
                    source[0] if isinstance(source, list) else source
                )


Do we actually need the _ = raster.gmt line at L124 to load GMTDataArray accessor info, since lib.virtualfile_to_raster already calls self.read_virtualfile(vfname, kind=kind).contents.to_xarray() which sets the registration and gtype based on the header?

pygmt/pygmt/datatypes/grid.py

Lines 197 to 201 in 74de7d8

    # Set GMT accessors.
    # Must put at the end, otherwise info gets lost after certain grid operations.
    grid.gmt.registration = header.registration
    grid.gmt.gtype = header.gtype
    return grid

That's a good point. Please try removing it and see if everything works fine.

There is one extra test failure at https://github.com/GenericMappingTools/pygmt/actions/runs/14743641047/job/41386753150#step:10:1049:

    _____________ [doctest] pygmt.xarray.accessor.GMTDataArrayAccessor _____________
    026     Examples
    027     --------
    028     For GMT's built-in remote datasets, these GMT-specific properties are automatically
    029     determined and you can access them as follows:
    030
    031     >>> from pygmt.datasets import load_earth_relief
    032     >>> # Use the global Earth relief grid with 1 degree spacing
    033     >>> grid = load_earth_relief(resolution="01d", registration="pixel")
    034     >>> # See if grid uses Gridline or Pixel registration
    035     >>> grid.gmt.registration
    Expected:
        <GridRegistration.PIXEL: 1>
    Got:
        1

I think this might be because the _GMT_GRID_HEADER.registration property returns an int instead of an enum?

pygmt/pygmt/datatypes/header.py

Lines 81 to 82 in d29303b

    # Grid registration, 0 for gridline and 1 for pixel
    ("registration", ctp.c_uint32),

and we overrode grid.gmt.registration with 1 instead of <GridRegistration.PIXEL: 1>. ~~Should be a quick fix we can do in a separate PR.~~ Edit: no, the GMTDataArrayAccessor registration property should always return an enum, not an int, something else in this PR seems to be affecting this doctest...

Making the following changes should fix the issue:

pygmt/pygmt/xarray/accessor.py

Lines 139 to 142 in 33fadc3

    with contextlib.suppress(ValueError):
        self._registration, self._gtype = map(  # type: ignore[assignment]
            int, grdinfo(_source, per_column="n").split()[-2:]
        )

Change to:

    if (_source := self._obj.encoding.get("source")) and Path(_source).exists():
        with contextlib.suppress(ValueError):
            registration, gtype = map(  # type: ignore[assignment]
                int, grdinfo(_source, per_column="n").split()[-2:]
            )
            self._registration = GridRegistration(registration)
            self._gtype = GridType(gtype)

Done in commit 07b2802, and also we can remove the # type: ignore[assignment] mypy skip 🎉
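The conversion step in that fix can be tried in isolation. The sketch below uses stand-in `IntEnum` classes that mirror the members named in this thread rather than importing PyGMT, and the sample string is a made-up stand-in for numeric `grdinfo` output; only its last two fields (registration and gtype) matter. The function name `parse_registration_gtype` is hypothetical.

```python
from enum import IntEnum


class GridRegistration(IntEnum):  # stand-in for the PyGMT enum
    GRIDLINE = 0
    PIXEL = 1


class GridType(IntEnum):  # stand-in for the PyGMT enum
    CARTESIAN = 0
    GEOGRAPHIC = 1


def parse_registration_gtype(info: str) -> tuple[GridRegistration, GridType]:
    """Convert the last two whitespace-separated fields to enum members."""
    registration, gtype = map(int, info.split()[-2:])
    # Wrapping the ints in the enum classes is the actual fix: callers now
    # receive enum members instead of bare integers.
    return GridRegistration(registration), GridType(gtype)


# Hypothetical output line; the leading fields are placeholders.
sample = "-180 180 -90 90 -8000 6000 1 1 360 180 1 1"
registration, gtype = parse_registration_gtype(sample)
print(repr(registration))  # <GridRegistration.PIXEL: 1>
print(repr(gtype))  # <GridType.GEOGRAPHIC: 1>
```

An out-of-range value such as `GridRegistration(5)` raises `ValueError`, which the surrounding `contextlib.suppress(ValueError)` in the suggested change would swallow.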

I think we should declare it an uncaught bug of the previous implementation. In our tests, we have checks like

assert grid.gmt.registration == GridRegistration.GRIDLINE

It's not enough, since the assertion is true when grid.gmt.registration is GridRegistration.GRIDLINE or 0. I think we should improve the existing tests and ensure that .registration and .gtype are in enums, not int, and this should be done in a separate PR.
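To make the gap concrete, here is a small self-contained sketch using a stand-in `IntEnum` (not imported from PyGMT). It shows that an equality check cannot tell an enum member from a plain integer, while an identity or `isinstance` check can.

```python
from enum import IntEnum


class GridRegistration(IntEnum):  # stand-in for the PyGMT enum
    GRIDLINE = 0
    PIXEL = 1


as_enum = GridRegistration.GRIDLINE
as_int = 0  # what the accessor could return before the fix

# Equality passes for both values, so it does not catch the bug.
assert as_enum == GridRegistration.GRIDLINE
assert as_int == GridRegistration.GRIDLINE

# Identity only passes for the real enum member.
assert as_enum is GridRegistration.GRIDLINE
assert as_int is not GridRegistration.GRIDLINE

# An explicit type check is another way to pin the return type down.
assert isinstance(as_enum, GridRegistration)
assert not isinstance(as_int, GridRegistration)
```

Either `is` or an `isinstance` assertion would have caught the regression; which one the tests should use is a style choice.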

The test below shows the current, inconsistent behavior:

    >>> from pygmt.datasets import load_earth_relief
    >>> grid = load_earth_relief(resolution="01d", registration="pixel")
    >>> grid.gmt.registration
    <GridRegistration.PIXEL: 1>
    >>> type(grid.gmt.registration)
    <enum 'GridRegistration'>
    >>> grid2 = grid[0:10, 0:10]
    >>> grid2.gmt.registration
    1
    >>> type(grid2.gmt.registration)
    int

It seems like enum comparison should be done using is instead of == according to https://docs.python.org/3/howto/enum.html#comparisons (see also https://stackoverflow.com/questions/25858497/should-enum-instances-be-compared-by-identity-or-equality).

    assert grid.gmt.registration is enums.GridRegistration.PIXEL  # OK
    assert 1 is enums.GridRegistration.PIXEL  # AssertionError
    <>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?

Wish there was a ruff lint for this (xref astral-sh/ruff#11617), but until then, will need to fix it manually.

Edit: PR at #3942

seisman · 2025-05-05T12:06:53Z

pygmt/xarray/backend.py

    url = "https://pygmt.org/dev/api/generated/pygmt.GMTBackendEntrypoint.html"

+    @kwargs_to_strings(region="sequence")


I've been thinking about whether we should avoid using the @kwargs_to_strings decorator in new functions/methods, and instead write a new function like seqjoin which does exactly the same thing.

Probably best to open a separate issue/PR for this.
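For illustration only, a plain function along these lines could look like the sketch below. The name `join_region` is hypothetical and this is not the actual seqjoin; it only assumes the GMT convention that a region sequence is written as xmin/xmax/ymin/ymax, and that a string such as an ISO country code is passed through unchanged.

```python
from __future__ import annotations

from collections.abc import Sequence


def join_region(region: str | Sequence[float]) -> str:
    """Join a region sequence with slashes; leave strings untouched."""
    if isinstance(region, str):
        # Already a string, e.g. an ISO country code.
        return region
    return "/".join(str(value) for value in region)


print(join_region([-52, -48, -18, -12]))  # -52/-48/-18/-12
print(join_region("BR"))  # BR
```

Compared with a decorator, an explicit call like this keeps the conversion visible at the point where the argument is used.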

seisman · 2025-05-05T15:22:03Z

pygmt/xarray/backend.py

                raster.encoding["source"] = (
                    source[0] if isinstance(source, list) else source
                )
-                _ = raster.gmt  # Load GMTDataArray accessor information
                return raster.to_dataset()


Actually, it's likely that the accessor information will be lost when converting via to_dataset.

seisman · 2025-05-06T23:47:58Z

pygmt/datasets/load_remote_dataset.py

@@ -581,22 +579,9 @@ def _load_remote_dataset(
        raise GMTInvalidInput(msg)

    fname = f"@{prefix}_{resolution}_{reg}"


I see a lot of error messages like:

    Error: h [ERROR]: Tile @S90W180.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90W150.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90W120.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90W090.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90W060.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90W030.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90E000.earth_age_01m_g.nc not found!
    Error: h [ERROR]: Tile @S90E030.earth_age_01m_g.nc not found!

This is because, in the GMT backend, we use something like which("@earth_age_01m_g") to get the file path, which doesn't work well for tiled grids.

Yeah, we used to do this:

    # Full path to the grid if not tiled grids.
    source = which(fname, download="a") if not resinfo.tiled else None
    # Manually add source to xarray.DataArray encoding to make the GMT accessors work.
    if source:
        grid.encoding["source"] = source

i.e. only add the source for non-tiled grids, so that the accessor's which call doesn't report this error. I'm thinking if it's possible to either 1) silence the which call (does verbose="q" work?), or 2) add some heuristic/logic to determine whether the source is a tiled grid before calling which in GMTBackendEntrypoint

I'm thinking if it's possible to either 1) silence the which call (does verbose="q" work?), or 2) add some heuristic/logic to determine whether the source is a tiled grid before calling which in GMTBackendEntrypoint

I think either works. Perhaps verbose="q" is easier?

Done in commit 5557b33.

Edit: Also just realized that verbose="q" was suggested before in #524 (comment).

weiji14 · 2025-05-09T09:11:39Z

pygmt/tests/test_xarray_accessor.py

+    # For a sliced grid, ensure we don't fallback to the default registration (gridline)
+    # and gtype (cartesian), because the source grid file should still exist.
    sliced_grid = grid[1:3, 1:3]
-    assert sliced_grid.gmt.registration is GridRegistration.GRIDLINE
-    assert sliced_grid.gmt.gtype is GridType.CARTESIAN
-
-    # Still possible to manually set registration and gtype.
-    sliced_grid.gmt.registration = GridRegistration.PIXEL
-    sliced_grid.gmt.gtype = GridType.GEOGRAPHIC
+    assert sliced_grid.encoding["source"].endswith("S90E000.earth_relief_05m_p.nc")
    assert sliced_grid.gmt.registration is GridRegistration.PIXEL
    assert sliced_grid.gmt.gtype is GridType.GEOGRAPHIC


Need to triple check this part. Do we preserve the source encoding now even after slicing?!! I.e. is #524 fixed? Or is there just some caching going on.

However, doing math operations (e.g. addition (+)) still removes the source.

seisman · 2025-05-10T05:15:16Z

pygmt/tests/test_xarray_accessor.py

+    # For a sliced grid, ensure we don't fallback to the default registration (gridline)
+    # and gtype (cartesian), because the source grid file should still exist.
    sliced_grid = grid[1:3, 1:3]
-    assert sliced_grid.gmt.registration is GridRegistration.GRIDLINE
-    assert sliced_grid.gmt.gtype is GridType.CARTESIAN
-
-    # Still possible to manually set registration and gtype.
-    sliced_grid.gmt.registration = GridRegistration.PIXEL
-    sliced_grid.gmt.gtype = GridType.GEOGRAPHIC
+    assert sliced_grid.encoding["source"].endswith("S90E000.earth_relief_05m_p.nc")
    assert sliced_grid.gmt.registration is GridRegistration.PIXEL
    assert sliced_grid.gmt.gtype is GridType.GEOGRAPHIC


Do we preserve the source encoding now even after slicing?!!

I've tried both xarray v2025.03 and v0.15 (in 2020). It seems the source encoding is always kept.

    In [1]: import xarray as xr

    In [2]: grid = xr.load_dataarray("earth_relief_01d_g.grd")

    In [3]: grid.encoding
    Out[3]: {'zlib': True, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': True, 'complevel': 9, 'fletcher32': False, 'contiguous': False, 'chunksizes': (181, 181), 'source': '/home/seisman/.gmt/server/earth/earth_relief/earth_relief_01d_g.grd', 'original_shape': (181, 361), 'dtype': dtype('int16'), '_FillValue': -32768, 'scale_factor': 0.5}

    In [4]: grid2 = grid[0:10, 0:10]

    In [5]: grid2.encoding
    Out[5]: {'zlib': True, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': True, 'complevel': 9, 'fletcher32': False, 'contiguous': False, 'chunksizes': (181, 181), 'source': '/home/seisman/.gmt/server/earth/earth_relief/earth_relief_01d_g.grd', 'original_shape': (181, 361), 'dtype': dtype('int16'), '_FillValue': -32768, 'scale_factor': 0.5}

I.e. is #524 fixed?

I think #524 is a different issue. The main point in #524 is that, for slice operations, the accessor information is lost, so we need to call grdinfo on the source grid to determine gtype and registration. In previous versions (before GMT_GRID was implemented), we used temporary files a lot, so the source file might not exist. I guess we need to revisit #524 and see if it can be closed.

seisman · 2025-05-10T11:40:32Z

Should we keep grid.encoding["source"] as undefined/None for tiled grids (xref #3673 (comment))? Or select the first tile (e.g. S90E000.earth_relief_05m_p.nc)? May need to update this test depending on what we decide.

@weiji14 Perhaps we should cherry-pick commits related to these changes into a separate PR, with a title like "Store the first tile as source encoding for tiled grids" and linking to issue #524.

seisman

Looks good to me.

Allow passing region to GMTBackendEntrypoint.open_dataset

12c2662

Support passing in a region as a Sequence [xmin, xmax, ymin, ymax] or ISO country code to `xarray.open_dataset` when using `engine="gmt"`.

weiji14 added the enhancement Improving an existing feature label Apr 29, 2025

weiji14 added this to the 0.16.0 milestone Apr 29, 2025

weiji14 self-assigned this Apr 29, 2025

Refactor _load_remote_dataset internals to use xr.load_dataarray

3114987

Remove duplicated code calling GMT read, since `xr.load_dataarray(engine="gmt")` now works with region argument.

weiji14 commented Apr 29, 2025

weiji14 added 3 commits April 29, 2025 21:55

Update TypeError regex for test_xarray_backend_gmt_read_invalid_kind

72abcaf

Merge branch 'main' into gmtbackendentrypoint/region

3a04239

Source file for tiled grids is undefined previously, but not anymore?

1a5837a

weiji14 commented Apr 29, 2025

Set type-hint for source variable

fe6bd44

weiji14 commented Apr 29, 2025

Don't need to re-load GMTDataArray accessor info in GMTBackendEntrypoint

6dec9ad

GMTDataArrayAccessor info should already be loaded by calling `virtualfile_to_raster` which calls `self.read_virtualfile(vfname, kind=kind).contents.to_xarray()` that sets registration and gtype from the header.

seisman reviewed May 5, 2025

weiji14 added 2 commits May 7, 2025 11:03

Merge branch 'main' into gmtbackendentrypoint/region

c2c010c

Set registration and gtype properly as enums on init

07b2802

seisman reviewed May 6, 2025

Silence Error: h [ERROR]: Tile @*.earth_*.nc not found errors

5557b33

weiji14 mentioned this pull request May 9, 2025

Set registration and gtype properly as enums on gmt accessor init #3942

Merged

6 tasks

seisman closed this in #3942 May 9, 2025

weiji14 reopened this May 9, 2025

weiji14 added 3 commits May 9, 2025 15:32

Merge branch 'main' into gmtbackendentrypoint/region

0006f2e

Add doctest for load_dataarray with region argument

deb2df0

Refactor test_xarray_accessor_grid_source_file_not_exist

221b60c

Slicing a tiled grid retains the original source still, but doing math operations like addition cause source to be lost and fallback to default registration/gtype.

weiji14 commented May 9, 2025

Difference in returned tile order on CI vs local?

11436ad

Try a different tile to see if it passes on CI

seisman reviewed May 9, 2025

pygmt/datasets/load_remote_dataset.py

seisman reviewed May 10, 2025

seisman mentioned this pull request May 10, 2025

GMTDataArrayAccessor doesn't work for temporary files that have been sliced #524

Closed

seisman added the needs review This PR has higher priority and needs review. label May 12, 2025

weiji14 added a commit that referenced this pull request May 14, 2025

Store first tile as source encoding for tiled grids

58dc53f

So that GMT accessor info works with tiled grids too. Adapted from #3932.

weiji14 mentioned this pull request May 14, 2025

Store first tile as source encoding for tiled grids #3950

Merged

6 tasks

Merge branch 'main' into gmtbackendentrypoint/region

f8af963

weiji14 force-pushed the gmtbackendentrypoint/region branch from 573909c to f8af963 Compare May 14, 2025 20:47

weiji14 and others added 4 commits May 15, 2025 08:49

Reduce diff from merge conflict

ba73095

Remove @kwargs_to_strings from _load_remote_dataset

36fb6e0

Co-authored-by: Dongdong Tian <[email protected]>

Docstring updates

ae785c7

Co-authored-by: Dongdong Tian <[email protected]>

format

c176154

seisman reviewed May 15, 2025

pygmt/xarray/backend.py

Sort list of source files alphabetically

b488b10

Co-authored-by: Dongdong Tian <[email protected]>

seisman approved these changes May 15, 2025

weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels May 15, 2025

weiji14 marked this pull request as ready for review May 15, 2025 01:24

weiji14 removed the final review call This PR requires final review and approval from a second reviewer label May 16, 2025

weiji14 merged commit a917dad into main May 16, 2025
29 of 30 checks passed

weiji14 deleted the gmtbackendentrypoint/region branch May 16, 2025 02:13

seisman mentioned this pull request Jun 14, 2025

DOC: Add gallery example for using EPSG codes #3973

Draft

6 tasks
