Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Roadmap for Shapely 2.0 #1

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Apr 22, 2020

This is an initial RFC with a more detailed reasoning and proposal for the changes coming to Shapely 2.0.

You can use the "rich diff" under files changes to see a rendered version, or view the current file.

Feedback very welcome!

@brendan-ward
Copy link

@jorisvandenbossche thanks for pulling this all together here!

I'm not a shapely dev, but I have used shapely by way of geopandas fairly heavily in the past, and now use pygeos heavily for most of my geo projects - so I definitely don't want to be taken to represent the shapely community.

👍 on avoiding splitting the community where possible.

I don't think I saw it addressed directly, is the proposal here to integrate pygeos as a dependency of shapely, or migrate the internals to shapely?

I'd have to think about it further to explore the implications, but my hunch is that adding vectorized operations into their corresponding modules and avoiding API duplication definitely seems like the right approach. A separate vectorized / geos module doesn't feel like the right solution here.

API changes:

👍 on removing mutability.

👍 on removing adapter classes.

👍 on removing iterability of geometries; .geoms is the correct solution.

👍 on adding numpy dependency

Spatial indexing:
I think migrating to pygeos STRtree is the way to go, but I'm likely biased there. 😄

👎 on duplicating trees.

Providing a parameter to optionally return the geometries is definitely something to explore and could easily be implemented on top of the pygeos STRtree implementation. Whether indexes or geometries are the right response really depends on specific usage of the query / query bulk functionality: are you doing a spatial join, or are you following up with another spatial operation? My hunch is that returning geometries should be False by default due to larger overhead for that. Alternatively, having multiple API functions that return different values based on their name might be useful here tree.query_index vs tree.query_geometry; they can use the same underlying tree code, so this would have relatively little code duplication and yet not hide the differences behind a parameter.

There is still some discussion around indexes vs geometry objects depending on the spatial operation involved, which might be good to consider here as well from an API consistency standpoint. Meaning, for a given operation that could return indexes or geometries or both, how to select those via consistent parameters.

Prepared geometries:
I think doing the current thing under the hood (always using prepared geometries for predicates) is the right thing to do, and 👍 on deprecating PreparedGeometry. It reduces the API surface area and yields much better performance by default.

I think we should be aware there may be future cases where prepared geometries are suboptimal, and we should address those once we have clear test cases.

Other considerations:
How Cython use in pygeos factors into this migration.

Creating developer guides: given the increased complexity around exposing functionality from GEOS, having consistent patterns to follow will be important. Knowing when to tackle something at the C level, Cython, or Python will be helpful for contributors.

GEOS version support:
Not directly addressed here, but formalizing support for GEOS versions and deprecation schedule would be good to handle under Shapely 2.0. Plus a consistent pattern for how new functionality in newer versions of GEOS are handled.

@abey79
Copy link

abey79 commented Apr 23, 2020

Chiming in mainly to thank @jorisvandenbossche and others for the efforts made on Shapely's future, which I have been following out of keen curiosity/interest since shapely/shapely#782 was opened.

I'm certainly not a big player (so don't spend too much time listening to me 😄), but I use Shapely on a regular basis as hobbyist in the field of generative art. From all the proposed changes, I will mainly be affected by the following two:

  1. I'm regularly using iterability of Multi* objects, with lots of occurrences of for ls in mls: and similar. If 1.8 deprecation makes it easy for me to spot them all, I will happily migrate them all to .geoms, so 👍
  2. I'm very much looking forward to the possibilities and performance improvement offered by vectorised operations, which, to me, largely justify the migration effort (including point 1 above).

The other proposed changes will be mostly transparent to me, so 👍

@martinfleis
Copy link

Just a quick note re iterability of Multi-geometry. I understand the motivation and see the benefits of the proposed change, but unlike for the other proposals, this will likely affect a lot of projects.

What is the expected time frame for these steps? How much time will be between shapely 1.8 with deprecation warning and shapely 2.0? I am afraid that it might cause a lot of disruptions if there's not much time to adapt downstream projects.

To be clear, I do support the idea, but it is proposing to deprecate widely used API, so it should be done with care.

I have no issue with any of the other changes. It is exciting to see all this development. Good job @jorisvandenbossche!

@jorisvandenbossche
Copy link
Member Author

Thanks all for the feedback!

I don't think I saw it addressed directly, is the proposal here to integrate pygeos as a dependency of shapely, or migrate the internals to shapely?

I should probably make it clearer, but the idea is to move the code completely into Shapely (I think that in theory, it could also be done as keeping pygeos as a dependency, but I am not sure there are much advantages in doing so, as they would be tightly coupled)

[on spatial indexing] Providing a parameter to optionally return the geometries is definitely something to explore and could easily be implemented on top of the pygeos STRtree implementation.

Yes, I think that should indeed be easy to do. Then it becomes mainly a question of choosing defaults (keeping the current default, or deprecating that default to have the behaviour of pygeos in Shapely 2.0) or distinct APIs for both.

[regarding deprecating iteration] What is the expected time frame for these steps? How much time will be between shapely 1.8 with deprecation warning and shapely 2.0? I am afraid that it might cause a lot of disruptions if there's not much time to adapt downstream projects.

That's a good point. There will not necessarily be much time between Shapely 1.8 and Shapely 2.0 (my guess is that, initially, both will be developed somewhat simultaneously). In practice, it will probably take some more time to get Shapely 2.0 stable, but I hope not too much.
That might indeed be somewhat annoying for downstream packages needing time to update (ideally they would pin to < 2.0 for the time being, but we know that this doesn't happen in practice ...).

@abey79
Copy link

abey79 commented Apr 27, 2020

I've been thinking further about "multi" geometries and there are a couple of features I think I would like, which may or may not have an architectural/UX impact relevant to this thread:

  1. Back and forth conversion between single MultiX and array of X.
  2. Normalisation of a mixed array containing both X, MultiX and EmptyGeometry to a (possibly longer) array of X.

The use case I have in mind for (2) is operations such as difference on LineString arrays, which can lead to a mix of empty geometry, LineStrings, and MultiLineStrings. In such cases, a normalised array of LineStrings only (with empty geometries discarded and MultiLineStrings expended) is likely what most users would need. Does PyGEOS offer something like that?

Feel free to ignore if this is irrelevant to the discussion, I'll come back with an issue after the migration is completed.

@hobu
Copy link

hobu commented Apr 27, 2020

I'm not quite sure how you're going to square PyGEOS and Shapely here, but I wanted to chime in to point out that the immutability of geometries in Shapely is a feature not a bug. Early versions of Shapely had immutable geometries and it led to all kinds of trouble. GEOS' API has gotten richer in the interim, but I think you are going to find some bugaboos. There's a reason why GEOS' own internals do a lot of coordinate copying.

Rather than calling this "Shapely", which is likely to confuse just about everyone involved, why not couch this as "PyGEOS with Shapely API Compatibility"? That would be something that people can measure.

Can they drop-in replace Shapely with this? If so, great. If not, are you forking Shapely here? Is the Shapely github organization something distinct than the main tree that has existed at https://github.com/Toblerity/Shapely for almost a decade? If so, please don't call it Shapely, you're just going to confuse everyone.

I read through shapely/shapely#782 and it is clear this is a joint effort. I was confused by the Shapely github organization. My point about immutability still stands though 😄

@jorisvandenbossche

This comment has been minimized.

@jorisvandenbossche
Copy link
Member Author

@hobu just missed your updated comment, glad that is clarified! ;)

@jorisvandenbossche
Copy link
Member Author

Early versions of Shapely had immutable geometries and it led to all kinds of trouble. GEOS' API has gotten richer in the interim, but I think you are going to find some bugaboos.

Do you remember what kind of problems people ran into?

@hobu
Copy link

hobu commented Apr 27, 2020

Do you remember what kind of problems people ran into?

Ownership lineage was not so clear in the earlier days of the GEOS API. I believe this has gotten better, but there still might be some issues. The issue is changing coordinates underneath a structure (say adding new points to a ring) might cause GEOS to copy anyway because it might need to update internal structures (say prepared geometry stuff that might be invalidated by the coordinate changes).

As a GEOS C API user, there's no way to be communicated that information, and the GEOS PSC, in our infinite duncery, "deprecated" the C++ API which could give API consumers control of their own fate. Maybe the situation is better now, but be prepared to paint yourself in and out of corners if you reach for the mutable data structures brass ring. Whether or not that feature is worth it is up to you, but it is likely going to generate significant bug and support load tail for a while once it lands.

I would be careful to point out to users that a future Shapely that supports "mutable" objects doesn't necessarily mean "no copying" of coordinates, whether across the GEOS API boundary or internally within GEOS itself. The latter has had a lot of improvements made by @dbaston in 3.8+, but there are still places.

Mutable data structures is a bit at odds with the larger effort of seeking vectorized operations, and breaks the implicit contract existing Shapely users might have with the library. I think the RFCs main thrust is to do vectorized ops first and mutability is a nice-to-have. Is that correct? If so, maybe the focus of RFC #1 should be vectorized operations and table the mutability challenge for a later date.

@jorisvandenbossche
Copy link
Member Author

@hobu reading your comment now, there might be another misunderstanding: this proposes to remove mutability from Shapely. I suppose I should update the "Mutability" title as that is a bit ambiguous about what it is actually proposing.

@hobu
Copy link

hobu commented Apr 27, 2020

reading your comment now, there might be another misunderstanding: this proposes to remove mutability from Shapely.

gah, I should just stay out of all of this 😄

@jorisvandenbossche
Copy link
Member Author

Well, at least your explanation reinforces the idea that it is good to make the geometries in Shapely immutable! :)

@jorisvandenbossche
Copy link
Member Author

I've been thinking further about "multi" geometries and there are a couple of features I think I would like

@abey79 I think those are interesting feature requests, but they can probably be discussed as separate enhancements, as they don't directly are dependent on the changes proposed here. Would you like to open issues for them in the main Shapely repo?

@abey79
Copy link

abey79 commented Apr 27, 2020

That's what I thought @jorisvandenbossche, thanks. Just wanted to make sure. I'll open an issue in due time.

@adriangb
Copy link

Quick comment I'd like to bring up from another thread (link): the way pygeos wraps the ufuncs with python functions currently means we lose a lot of useful numpy ufunc functionality. It would be nice if we had a way to fully expose the ufuncs when they are integrated here, while also having good docs, etc.

@caspervdw
Copy link

Thanks a lot of writing this up in an RFC. I agree on the direction it is taking and in general it is really great to see that my(our) project is a candidate for fusing with shapely.

I had some remarks:
Immutability
I'd like to stress that the immutability is necessary for the vectorization.

Incorporation
I would say, leave functions where they are (to not unnecessarily break the API). But I'd like all the GEOS operations to be imported into the root namespace.

Empty geometry handling
There are indeed a lot of edge cases, and it could be that shapely solved them differently than pygeos. My personal opinion is to follow PostGIS, the de facto standard on this area. If that means breaking either shapely or pygeos' API, we should do it.

STRTree
I agree with @brendan-ward : one tree class with two query functions.

Prepared geometries
I like to discuss this some more. I am not convinced of deviating from the GEOS C API and Shapely API; for a low level library, we should at least offer the user control over what is happening.

array interface / ctypes
The __array_interface__ suffers from the same issues as __iter__: they don't play nicely with ufuncs. So we should deprecate them.

@sgillies
Copy link
Contributor

@jorisvandenbossche thank you for typing this up and answering questions. @caspervdw thank you for the pioneering work on PyGEOS. It's clear that we have a consensus for going forward and I suggest that we fill in the "TO FILL IN" parts of the RFC and get started.

STRTree: I think we could borrow from the BTrees package https://pythonhosted.org/BTrees/ and offer two variants: one with integer values and one with object values. I'm ambivalent about the exact API. I have a hunch that the STRTree class is used much less often than other Shapely classes and that the impact of changing the API will be much less.

About where to put new features, functions vs methods: it occurs to me that the same situation exists in Numpy. Do any of us know whether that project would approach the numpy.mean() vs numpy.ndarray,mean() relationship differently if they started over?

Prepared geometries: I tend towards keeping this explicit. My sense is that we'll sometimes want to prepare complicated scalar objects but won't need to prepare the elements of a vector.

On the subject of empty geometries within arrays, we have at least one option that PostGIS does not: masked arrays. Are masked array operations going to be within the scope of Shapely 2.0?

@brendan-ward I love the ideal of formalizing a GEOS version support and deprecation calendar.

* Use the vectorized operations for the implementation of the scalar operations,
preserving the existing Shapely scalar API.
* Make some API changes that are partly required and also desirable (e.g.
regarding mutability of geometries, ctypes interface, etc).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 👍 👍 👍 👍

@jorisvandenbossche
Copy link
Member Author

Thanks for the feedback @sgillies and @caspervdw !

Already a few responses:

Empty geometry handling
I need to look a bit into what the actual differences might be. As far as I know, it might be mostly limited to the construction in the actual geometry class constructors (eg Polygon() resulting in an empty GeometryCollection instead of empty Polygon, see shapely/shapely#742), and not in other operations.
Personally I would say to follow GEOS as much as possible here (so to not have special cases for empty geometries in the ufuncs), except for WKB, where I would indeed follow PostGIS regarding empty points (although ideally this is something that could maybe go in GEOS itself, if all of PostGIS, Shapely and sf would implement the same workaround)

I will try to update this section in the RFC with more details.

On the subject of empty geometries within arrays, we have at least one option that PostGIS does not: masked arrays. Are masked array operations going to be within the scope of Shapely 2.0?

Empty geometries are at the moment simply represented by their GEOS geometry object. But you might be thinking of "missing" geometries as well? (for which a masked array indeed could make sense).
At the moment, in PyGEOS, we allow None to be in the numpy array of geometries, and the ufuncs know how to handle this (eg skip it and propagate a None to the result as well). This gives a similar feature as NULL in PostGIS.

STRtree
It indeed should be possible to provide both APIs (objects an integers, it should already be easy to do with the current PyGEOS tree, since we store a reference to the original geometries on the tree object).
Then it becomes a matter of choosing a way to expose this (different class vs different method vs keyword parameter in a single method)

Prepared geometries

[Casper] I like to discuss this some more. I am not convinced of deviating from the GEOS C API and Shapely API; for a low level library, we should at least offer the user control over what is happening.

[Sean] Prepared geometries: I tend towards keeping this explicit. My sense is that we'll sometimes want to prepare complicated scalar objects but won't need to prepare the elements of a vector.

It's certainly something to further investigate / experiment with. At least, I think there might still be (potentially) good reasons to try to hide this from the (average) user, both for performance as for user friendly API. But for a start, I should revive my experiments with it at pygeos/pygeos#92, to see if I can come with some "evidence" for this (because if not, keeping PreparedGeometries explicit as it is now in Shapely is of course certainly fine).

@dbaston
Copy link

dbaston commented May 20, 2020

Not quite twice as big.

Size of geom::Point is 88
Size of geom::LineString is 48
Size of geom::LinearRing is 48
Size of geom::Polygon is 72
Size of geom::GeometryCollection is 64
Size of geom::MultiPoint is 64
Size of geom::MultiLineString is 64
Size of geom::MultiPolygon is 64

Size of geom::prep::PreparedPoint is 40
Size of geom::prep::PreparedLineString is 72
Size of geom::prep::PreparedPolygon is 88

These sizes are additive, so a point with a prepared point is 88 + 40. (Point is the largest geometry type because its size includes its own coordinates.)

If you actually use the prepared geometry in a predicate, they'll get fairly large (by building an STRtree of all line segments, for example.) So I can see situations where it would be problematic to create them and keep them around for all eternity.

Another option is to have the basic geometry type create/destroy a prepared geometry whenever a predicate is called. (This is what https://github.com/r-spatial/sf does.) It sounds really inefficient but should be a performance win in almost all cases.

@jorisvandenbossche
Copy link
Member Author

Update on empty geometries

I added a section on empty geometries to the RFC text, please take a look.
Specifically, I propose to:

  • Change the behaviour of the Python constructors (e.g. Polygon() to return
    POLYGON EMPTY instead of GEOMETRYCOLLECTION EMPTY).
  • Deprecate the shapely.geometry.base.EmptyGeometry class.
  • Change the shapely.wkb writer to use the "POINT (NaN NaN)" WKB to represent
    "POINT EMPTY".

Some additional comments about this topic:

[@mwtoews] However, I don't think we need to go as far as PostGIS has and provide typed empty support for dimension above 2 (e.g. MULTIPOLGYON Z EMPTY gets ridiculous).

I fully agree with your comment #1 (comment) (which I think should be reflected in the RFC update, at least regarding the Python constructors).
Regarding those empty geometries in higher dimensions, I did some checks (by experimental testing with Shapely and PyGEOS): it seems that GEOS drops extra dimensions when creating or reading empty geometries with more than 2 dimenions. So that only gives a single empty geometry type for each of the basic geometry types.
(now, I didn't check in the code if GEOS actually supports, it could also be a bug in the has_z implementation for empties with >2 dimensions).

[@sgillies] I'd love to see something better than "do what GEOS does", because it's ultimately following JTS.

I understand your hesitation regarding "just do what GEOS does". But, when it comes to spatial operations (eg an intersection resulting in a POLYGON EMPTY, and not the generic GEOMETRYCOLLECTION EMPTY), we don't really have much of a choice, I think? (unless we would start checking any GEOS operation that can potentially result in empty geometries, but I don't think we would want to do that).
And given that such type-specific empty geometries are created by GEOS in operations, it seems OK to also create those in our constructors (like Polygon()).

In the end, Shapely is using GEOS much more directly as compared to eg PostGIS (since we actually store GEOS objects), so we are more tied to what does or not does.

Copy link
Member

@mwtoews mwtoews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only a minor nit pick, otherwise the empty geometry section is nicely described.

rfc/0001-shapely-2.0-roadmap.md Outdated Show resolved Hide resolved

Access to the coordinates of a geometry as a numpy array is still available
through the `coords` attribute (`np.array(line.coords)` will still give the same
array as `np.array(line)` does now).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mwtoews
Copy link
Member

mwtoews commented Jun 10, 2020

Should SRID property use and handling be discussed in this RFC? Shapely does not fully-support SRIDs (shapely/shapely#132), except for a few things like shapely.wkb.dumps (shapely/shapely#593). PyGEOS has set operators for srid, in-line with immutable geometry handling discussed here.

@sgillies
Copy link
Contributor

@mwtoews the GEOS SRID, is in my opinion, nothing but leakage of concern from PostGIS. If I read https://trac.osgeo.org/geos/ticket/896 correctly, at least one of the core developers of GEOS agrees.

I'm a big 👎 on shapely ever trying to support operations and predicates between geometries in different reference systems, and so SRID isn't useful at the geometry level.

@hobu
Copy link

hobu commented Jun 10, 2020

I'm a big 👎 on shapely ever trying to support operations and predicates between geometries in different reference systems, and so SRID isn't useful at the geometry level.

I also think SRID in GEOS is leakage from PG, and furthermore, it by itself isn't sufficient in a lot of cases anyway. Coordinates are orthogonal to reference system, and users should be managing that external to Shapely or GEOS.

@mwtoews
Copy link
Member

mwtoews commented Jun 11, 2020

The odd thing is that PostGIS doesn't even use GEOS for WK[TB] support, so the probable reason why SRIDs are part of GEOS is because of OGC 06-103r4 compliance (linked above).

Frankly, I'd be fine if SRIDs were deprecated and removed from GEOS in a future release, as a PostGIS user could manipulate data originated from GEOS/Shapely (e.g. SELECT ST_SetSRID(geom, 4326) etc.). I generally agree that SRID is unnecessary geometry information, as this metadata is best described elsewhere.

This still leaves the question of how to integrate PyGEOS's support of SRIDs. I'd suggest to leave the get_srid and set_srid functions as they are, but perhaps move them to a utility module, since they are only appropriate for interfacing with client software that expect them (i.e. PG).

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Jun 12, 2020

I don't think this topic of SRIDs necessarily needs specific attention in the RFC, unless others prefer a summary added, then I can do that.

In the end, the handling of SRID in operations is fully outside of Shapely's control, except for potentially providing functionality to get/set SRID and include it in WKB/WKT IO.
So IMO Shapely can just continue doing that: not care about SRIDs in any operation, but expose the functionality only for get/set and IO.

Deprecating SRIDs or not is a topic for GEOS to discuss, but as long as some functions related to it are exposed in the GEOS C API, I would prefer to keep exposing this in Shapely as well. For example, in GeoPandas we currently actually use this to convert geometries to WKB with SRID included, as that is what PostGIS requires (if your PostGIS table is spatially referenced, then PostGIS will error on inserting any geometry without an SRID set).

@sgillies
Copy link
Contributor

Sounds good to me @jorisvandenbossche , let's omit it from the RFC.

@jorisvandenbossche
Copy link
Member Author

A short status update here: I started the work for this in Shapely, starting with the deprecations for 1.8 (shapely/shapely#932). It was a bit stagnant the last weeks, though, and now I am offline until the end of the month. But for August, I am planning to take this up again and at a higher pace (both finishing the deprecations and starting to actually integrate pygeos into shapely). And contributions to those work items are of course always welcome as well!

In the meantime, @caspervdw did also some more work in PyGEOS to release the GIL in the vectorized operations (important for parallelizing).

For the actual RFC, I think the outstanding items are the SRTtree interface (how to enable returning indices instead of geometries -> I think we agree on the principle, it's a question of choosing the API, eg a keyword to trigger it), and the prepared geometries.

@chaitan94
Copy link

Sounds good to me @jorisvandenbossche, let's omit it from the RFC.

Sorry to bump this up again, but I think it would be great to have SRID supported as well in Shapely 2.0. Nothing fancy, just exposing the get/set from what current GEOS API already offers. This would make it much easier when working with PostGIS, as it has been pointed out before.

But first, let me give some more context. I am a member of the MobilityDB team, which is a database much similar to PostGIS. In fact it extends PostGIS to add support for spatiotemporal operations, while relying as much as possible on PostGIS for pure spatial operations. I am currently working on a project called PyMEOS which is related to MobilityDB like how Shapely is related to PostGIS. Again, ideally, I would like to rely on Shapely as much as possible for all pure spatial operations. The only reason we haven't been doing this yet is because of this lack of get/set API for SRIDs in shapely.

So it would be great if we can know the final stance over this. And if Shapely plans to add support for this, and if I can be of any help for it, I can try to contribute what I can as well. Would love to know thoughts on this 😄

@sgillies
Copy link
Contributor

@chaitan94 do you want it enough to write and submit a different RFC? I haven't seen anything written here since June 10 that has persuaded me that it's worth doing, but some very good writing with compelling use cases might change my mind.

@jorisvandenbossche
Copy link
Member Author

@chaitan94 I think you misinterpreted my comment above (#1 (comment)): the reason that I proposed to omit it from the RFC is not because nothing related to SRID will be included in Shapely, but just because there is nothing special about it worth mentioning in the RFC (or inherently different from what shapely does does). We will simply expose few of the things that GEOS itself provides around SRID: support for including it in WKB IO, and if it depends on me also basic getting/setting it. But nothing more than that, so eg spatial operations won't support it (or preserve it), that's something to discuss in GEOS itself and outside of the scope of this RFC.
(I mentioned PostGIS myself as a usecase for this WKB IO support, and which is a feature that is already present in Shapely right now)

And to be clear: PyGEOS already has this (there is a get_srid/get_srid, and a keyword in to_wkb to include it)

@chaitan94
Copy link

@sgillies I am not sure if it is a significant enough change to demand its own RFC, but sure, I can do that if it helps.

@jorisvandenbossche Yes, I did understand that it was just omitted from here. Nevertheless, I just wanted to have a bit more clarity on whether the current Shapely 2.0 implementation would have this support for SRID or not. I felt that since GEOS and PyGEOS already do have support for it already, it should be straightforward to expose it at Shapely level. If you also think that a separate RFC would help too, maybe I can do that.

@jorisvandenbossche
Copy link
Member Author

For the same reason as I omitted it from this RFC, I personally don't think a separate RFC is needed if it's only about getting/setting for WKB IO.
This limited functionality is already in PyGEOS (and also in Shapely, although a bit hidden, but we use it in GeoPandas right now), and when PyGEOS gets integrated, I was planning to simply keep it and thus the functionality will keep existing in Shapely.

Now, if that understanding is not fully how others understood the above discussion, we should certainly further discuss it (but again, it is not about Shapely doing special effort to support SRIDs in spatial operations, it's only about exposing the limited API of GEOS to get/set it for WKB IO. @sgillies you're on board with that?)

Comment on lines +105 to +106
* `ctypes` is not (or no longer) the recommended way to wrap a complex C/C++
library. It adds overhead and possible runtime issues.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from the peanut gallery: do you have some references on this? I assume that the current consensus is to either do a Python extension type or to use CFFI, but it would be great to have some reading material on the topic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrojuanlu sorry for the verylate reply. I don't have a direct reference at hand that explains this (more based on general experience and from talking to other people), but we certainly had several reports with installation issues related to this.

@jorisvandenbossche
Copy link
Member Author

To give an update on this: over the last months, I have been slowly refactoring Shapely to start using the pygeos implementation (in the shapely-2.0 branch, see shapely/shapely#962 for some track record). The Geometry classes are now subclasses of the C extension type defined in PyGEOS, and large parts of the rest of the library no longer uses the ctypes implementation. There are a few remaining parts to do (strtree, shapely.ops).

One aspect of the above discussion that was not yet finalized: the API for the STRtree. I think there is agreement on having the ability to return integer indices instead of actual geometry objects from the query methods. But several ideas on how to expose this to the user have been mentioned above: separate classes, or a single class with separate methods, or keeping the same methods but with a keyword to switch between both.

We need to make some choice here, and I made a PR in Shapely to add a keyword: shapely/shapely#1064. Feedback on that is very welcome!

@astrojuanlu
Copy link

Wondering if this is ready for merge? :)

@mwtoews
Copy link
Member

mwtoews commented Oct 17, 2023

Wondering if this is ready for merge? :)

Yes it should, @sgillies could you merge this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.