Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROPOSAL: pygeos & rtree engine for geometry based selection #34

Open
snowman2 opened this issue Jan 6, 2021 · 5 comments
Open

PROPOSAL: pygeos & rtree engine for geometry based selection #34

snowman2 opened this issue Jan 6, 2021 · 5 comments

Comments

@snowman2
Copy link
Member

snowman2 commented Jan 6, 2021

I am wondering if adding support for these R-Tree based selectors would be of interest:

It would allow for selecting the points with a geometry as input.

@snowman2
Copy link
Member Author

snowman2 commented Jan 6, 2021

Closing for #12

@snowman2 snowman2 closed this as completed Jan 6, 2021
@benbovy
Copy link
Member

benbovy commented Jan 8, 2021

I am wondering if adding support for these R-Tree based selectors would be of interest

Yes this would be nice additions! We could add adapters for those R-Tree indexes either in xoak or in 3rd party packages. Adding it in xoak would be a reasonable option for now (dependencies are optional for all adapters implemented in xoak). I'm not sure how the number of adapters would grow in the future, though.

It would allow for selecting the points with a geometry as input.

This would be great too. That would require some work, though, as we need to extend the API of the index adapters and the xoak xarray accessors in order to support this. Open questions, I guess, are which "conventions" to use for representing those geometries within the xarray data model, and what are the advantages of using xarray instead of, e.g., geopandas for this use case.

(I'm re-opening this issue as I think it's worth discussing geometry based selection here - the topic #12 is too broad)

@benbovy benbovy reopened this Jan 8, 2021
@snowman2
Copy link
Member Author

snowman2 commented Jan 8, 2021

I guess, are which "conventions" to use for representing those geometries within the xarray data model, and what are the advantages of using xarray instead of, e.g., geopandas for this use case.

This stems from: corteva/rioxarray#202

Currently in rioxarray you can clip the data using a list of geometries:
https://corteva.github.io/rioxarray/stable/examples/clip_geom.html

However, this only works for regular grids (i.e. 1 dimensional x and y coordinates that are equally spaced)

Coordinates:
  * y            (y) float64 ...
  * x            (x) float64 ...

The idea of this would be to enable clipping with 2 dimensional coordinates (usually latitude and longitude) with irregular grids:

  * latitude            (latitude, longitude) float64 ...
  * longitude           (latitude, longitude) float64 ... 

With the pygeos.STRTree, you can query for the coordinates that intersect the input geometries and get the indexes back to generate a mask you could use to clip/select data.

>>> import pygeos
>>> tree = pygeos.STRtree(pygeos.creation,points(xds.longitude, xds.latitude))
>>> tree.query_bulk([pygeos.box(2, 2, 4, 4), pygeos.box(5, 5, 6, 6)]).tolist()

In this scenario, there isn't really a need/benefit to use geopandas.

@benbovy
Copy link
Member

benbovy commented Jan 11, 2021

I see, clipping irregular data with a list of geometries is a nice example that xoak should support directly or indirectly.

xoak currently exposes a .sel method that is similar to xarray's .sel. Currently only point-wise indexing (i.e., using xarray objects as indexers) is possible, but range or rectangular selection (using slices) like in the simple examples given in https://corteva.github.io/rioxarray/stable/examples/clip_geom.html should be supported later. xoak's xarray accessor API could be extended with extra methods for more advanced but still common operations like clip with compound or complex geometries (although it might be too specific to fall in xoak's scope).

In addition, there's ongoing discussion/work in #32 to expose xarray-friendly wrappers for the query methods of the adapted index objects. The challenge here would be to support the variety of the queries supported by each index (e.g., pygeos.STRTree.query_bulk, sklearn.neighbors.KDTree.query_radius, etc.) through a common API.

BTW, I started writing bindings for the s2geometry library here that may provide similar capabilities than pygeos.STRTree for spherical coordinates.

@benbovy
Copy link
Member

benbovy commented Jan 11, 2021

The challenge here would be to support the variety of the queries supported by each index (e.g., pygeos.STRTree.query_bulk, sklearn.neighbors.KDTree.query_radius, etc.) through a common API.

By common API, I mean a small set of query methods, one for each kind of query, rather than a single query method for all kinds of queries. Still, for the same kind of query there might be subtle differences from one index object to another.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants