-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Roadmap for Shapely 2.0 #1
base: master
Are you sure you want to change the base?
RFC: Roadmap for Shapely 2.0 #1
Conversation
@jorisvandenbossche thanks for pulling this all together here! I'm not a shapely dev, but I have used shapely by way of geopandas fairly heavily in the past, and now use pygeos heavily for most of my geo projects - so I definitely don't want to be taken to represent the shapely community. 👍 on avoiding splitting the community where possible. I don't think I saw it addressed directly, is the proposal here to integrate pygeos as a dependency of shapely, or migrate the internals to shapely? I'd have to think about it further to explore the implications, but my hunch is that adding vectorized operations into their corresponding modules and avoiding API duplication definitely seems like the right approach. A separate vectorized / geos module doesn't feel like the right solution here. API changes: 👍 on removing mutability. 👍 on removing adapter classes. 👍 on removing iterability of geometries; 👍 on adding numpy dependency Spatial indexing: 👎 on duplicating trees. Providing a parameter to optionally return the geometries is definitely something to explore and could easily be implemented on top of the pygeos STRtree implementation. Whether indexes or geometries are the right response really depends on specific usage of the There is still some discussion around indexes vs geometry objects depending on the spatial operation involved, which might be good to consider here as well from an API consistency standpoint. Meaning, for a given operation that could return indexes or geometries or both, how to select those via consistent parameters. Prepared geometries: I think we should be aware there may be future cases where prepared geometries are suboptimal, and we should address those once we have clear test cases. Other considerations: Creating developer guides: given the increased complexity around exposing functionality from GEOS, having consistent patterns to follow will be important. Knowing when to tackle something at the C level, Cython, or Python will be helpful for contributors. GEOS version support: |
Chiming in mainly to thank @jorisvandenbossche and others for the efforts made on Shapely's future, which I have been following out of keen curiosity/interest since shapely/shapely#782 was opened. I'm certainly not a big player (so don't spend too much time listening to me 😄), but I use Shapely on a regular basis as hobbyist in the field of generative art. From all the proposed changes, I will mainly be affected by the following two:
The other proposed changes will be mostly transparent to me, so 👍 |
Just a quick note re iterability of Multi-geometry. I understand the motivation and see the benefits of the proposed change, but unlike for the other proposals, this will likely affect a lot of projects. What is the expected time frame for these steps? How much time will be between shapely 1.8 with deprecation warning and shapely 2.0? I am afraid that it might cause a lot of disruptions if there's not much time to adapt downstream projects. To be clear, I do support the idea, but it is proposing to deprecate widely used API, so it should be done with care. I have no issue with any of the other changes. It is exciting to see all this development. Good job @jorisvandenbossche! |
Thanks all for the feedback!
I should probably make it clearer, but the idea is to move the code completely into Shapely (I think that in theory, it could also be done as keeping pygeos as a dependency, but I am not sure there are much advantages in doing so, as they would be tightly coupled)
Yes, I think that should indeed be easy to do. Then it becomes mainly a question of choosing defaults (keeping the current default, or deprecating that default to have the behaviour of pygeos in Shapely 2.0) or distinct APIs for both.
That's a good point. There will not necessarily be much time between Shapely 1.8 and Shapely 2.0 (my guess is that, initially, both will be developed somewhat simultaneously). In practice, it will probably take some more time to get Shapely 2.0 stable, but I hope not too much. |
I've been thinking further about "multi" geometries and there are a couple of features I think I would like, which may or may not have an architectural/UX impact relevant to this thread:
The use case I have in mind for (2) is operations such as difference on LineString arrays, which can lead to a mix of empty geometry, LineStrings, and MultiLineStrings. In such cases, a normalised array of LineStrings only (with empty geometries discarded and MultiLineStrings expended) is likely what most users would need. Does PyGEOS offer something like that? Feel free to ignore if this is irrelevant to the discussion, I'll come back with an issue after the migration is completed. |
I'm not quite sure how you're going to square PyGEOS and Shapely here, but I wanted to chime in to point out that the immutability of geometries in Shapely is a feature not a bug. Early versions of Shapely had immutable geometries and it led to all kinds of trouble. GEOS' API has gotten richer in the interim, but I think you are going to find some bugaboos. There's a reason why GEOS' own internals do a lot of coordinate copying.
I read through shapely/shapely#782 and it is clear this is a joint effort. I was confused by the Shapely github organization. My point about immutability still stands though 😄 |
This comment has been minimized.
This comment has been minimized.
@hobu just missed your updated comment, glad that is clarified! ;) |
Do you remember what kind of problems people ran into? |
Ownership lineage was not so clear in the earlier days of the GEOS API. I believe this has gotten better, but there still might be some issues. The issue is changing coordinates underneath a structure (say adding new points to a ring) might cause GEOS to copy anyway because it might need to update internal structures (say prepared geometry stuff that might be invalidated by the coordinate changes). As a GEOS C API user, there's no way to be communicated that information, and the GEOS PSC, in our infinite duncery, "deprecated" the C++ API which could give API consumers control of their own fate. Maybe the situation is better now, but be prepared to paint yourself in and out of corners if you reach for the mutable data structures brass ring. Whether or not that feature is worth it is up to you, but it is likely going to generate significant bug and support load tail for a while once it lands. I would be careful to point out to users that a future Shapely that supports "mutable" objects doesn't necessarily mean "no copying" of coordinates, whether across the GEOS API boundary or internally within GEOS itself. The latter has had a lot of improvements made by @dbaston in 3.8+, but there are still places. Mutable data structures is a bit at odds with the larger effort of seeking vectorized operations, and breaks the implicit contract existing Shapely users might have with the library. I think the RFCs main thrust is to do vectorized ops first and mutability is a nice-to-have. Is that correct? If so, maybe the focus of RFC #1 should be vectorized operations and table the mutability challenge for a later date. |
@hobu reading your comment now, there might be another misunderstanding: this proposes to remove mutability from Shapely. I suppose I should update the "Mutability" title as that is a bit ambiguous about what it is actually proposing. |
gah, I should just stay out of all of this 😄 |
Well, at least your explanation reinforces the idea that it is good to make the geometries in Shapely immutable! :) |
@abey79 I think those are interesting feature requests, but they can probably be discussed as separate enhancements, as they don't directly are dependent on the changes proposed here. Would you like to open issues for them in the main Shapely repo? |
That's what I thought @jorisvandenbossche, thanks. Just wanted to make sure. I'll open an issue in due time. |
Quick comment I'd like to bring up from another thread (link): the way pygeos wraps the ufuncs with python functions currently means we lose a lot of useful numpy ufunc functionality. It would be nice if we had a way to fully expose the ufuncs when they are integrated here, while also having good docs, etc. |
Thanks a lot of writing this up in an RFC. I agree on the direction it is taking and in general it is really great to see that my(our) project is a candidate for fusing with shapely. I had some remarks: Incorporation Empty geometry handling STRTree Prepared geometries array interface / ctypes |
@jorisvandenbossche thank you for typing this up and answering questions. @caspervdw thank you for the pioneering work on PyGEOS. It's clear that we have a consensus for going forward and I suggest that we fill in the "TO FILL IN" parts of the RFC and get started. STRTree: I think we could borrow from the BTrees package https://pythonhosted.org/BTrees/ and offer two variants: one with integer values and one with object values. I'm ambivalent about the exact API. I have a hunch that the STRTree class is used much less often than other Shapely classes and that the impact of changing the API will be much less. About where to put new features, functions vs methods: it occurs to me that the same situation exists in Numpy. Do any of us know whether that project would approach the Prepared geometries: I tend towards keeping this explicit. My sense is that we'll sometimes want to prepare complicated scalar objects but won't need to prepare the elements of a vector. On the subject of empty geometries within arrays, we have at least one option that PostGIS does not: masked arrays. Are masked array operations going to be within the scope of Shapely 2.0? @brendan-ward I love the ideal of formalizing a GEOS version support and deprecation calendar. |
* Use the vectorized operations for the implementation of the scalar operations, | ||
preserving the existing Shapely scalar API. | ||
* Make some API changes that are partly required and also desirable (e.g. | ||
regarding mutability of geometries, ctypes interface, etc). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 👍 👍 👍 👍
Co-authored-by: Sean Gillies <[email protected]>
Thanks for the feedback @sgillies and @caspervdw ! Already a few responses: Empty geometry handling I will try to update this section in the RFC with more details.
Empty geometries are at the moment simply represented by their GEOS geometry object. But you might be thinking of "missing" geometries as well? (for which a masked array indeed could make sense). STRtree Prepared geometries
It's certainly something to further investigate / experiment with. At least, I think there might still be (potentially) good reasons to try to hide this from the (average) user, both for performance as for user friendly API. But for a start, I should revive my experiments with it at pygeos/pygeos#92, to see if I can come with some "evidence" for this (because if not, keeping PreparedGeometries explicit as it is now in Shapely is of course certainly fine). |
Not quite twice as big.
These sizes are additive, so a point with a prepared point is 88 + 40. (Point is the largest geometry type because its size includes its own coordinates.) If you actually use the prepared geometry in a predicate, they'll get fairly large (by building an STRtree of all line segments, for example.) So I can see situations where it would be problematic to create them and keep them around for all eternity. Another option is to have the basic geometry type create/destroy a prepared geometry whenever a predicate is called. (This is what https://github.com/r-spatial/sf does.) It sounds really inefficient but should be a performance win in almost all cases. |
Update on empty geometriesI added a section on empty geometries to the RFC text, please take a look.
Some additional comments about this topic:
I fully agree with your comment #1 (comment) (which I think should be reflected in the RFC update, at least regarding the Python constructors).
I understand your hesitation regarding "just do what GEOS does". But, when it comes to spatial operations (eg an intersection resulting in a POLYGON EMPTY, and not the generic GEOMETRYCOLLECTION EMPTY), we don't really have much of a choice, I think? (unless we would start checking any GEOS operation that can potentially result in empty geometries, but I don't think we would want to do that). In the end, Shapely is using GEOS much more directly as compared to eg PostGIS (since we actually store GEOS objects), so we are more tied to what does or not does. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only a minor nit pick, otherwise the empty geometry section is nicely described.
|
||
Access to the coordinates of a geometry as a numpy array is still available | ||
through the `coords` attribute (`np.array(line.coords)` will still give the same | ||
array as `np.array(line)` does now). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Should SRID property use and handling be discussed in this RFC? Shapely does not fully-support SRIDs (shapely/shapely#132), except for a few things like |
@mwtoews the GEOS SRID, is in my opinion, nothing but leakage of concern from PostGIS. If I read https://trac.osgeo.org/geos/ticket/896 correctly, at least one of the core developers of GEOS agrees. I'm a big 👎 on shapely ever trying to support operations and predicates between geometries in different reference systems, and so SRID isn't useful at the geometry level. |
I also think SRID in GEOS is leakage from PG, and furthermore, it by itself isn't sufficient in a lot of cases anyway. Coordinates are orthogonal to reference system, and users should be managing that external to Shapely or GEOS. |
The odd thing is that PostGIS doesn't even use GEOS for WK[TB] support, so the probable reason why SRIDs are part of GEOS is because of OGC 06-103r4 compliance (linked above). Frankly, I'd be fine if SRIDs were deprecated and removed from GEOS in a future release, as a PostGIS user could manipulate data originated from GEOS/Shapely (e.g. This still leaves the question of how to integrate PyGEOS's support of SRIDs. I'd suggest to leave the |
I don't think this topic of SRIDs necessarily needs specific attention in the RFC, unless others prefer a summary added, then I can do that. In the end, the handling of SRID in operations is fully outside of Shapely's control, except for potentially providing functionality to get/set SRID and include it in WKB/WKT IO. Deprecating SRIDs or not is a topic for GEOS to discuss, but as long as some functions related to it are exposed in the GEOS C API, I would prefer to keep exposing this in Shapely as well. For example, in GeoPandas we currently actually use this to convert geometries to WKB with SRID included, as that is what PostGIS requires (if your PostGIS table is spatially referenced, then PostGIS will error on inserting any geometry without an SRID set). |
Sounds good to me @jorisvandenbossche , let's omit it from the RFC. |
A short status update here: I started the work for this in Shapely, starting with the deprecations for 1.8 (shapely/shapely#932). It was a bit stagnant the last weeks, though, and now I am offline until the end of the month. But for August, I am planning to take this up again and at a higher pace (both finishing the deprecations and starting to actually integrate pygeos into shapely). And contributions to those work items are of course always welcome as well! In the meantime, @caspervdw did also some more work in PyGEOS to release the GIL in the vectorized operations (important for parallelizing). For the actual RFC, I think the outstanding items are the SRTtree interface (how to enable returning indices instead of geometries -> I think we agree on the principle, it's a question of choosing the API, eg a keyword to trigger it), and the prepared geometries. |
Sorry to bump this up again, but I think it would be great to have SRID supported as well in Shapely 2.0. Nothing fancy, just exposing the get/set from what current GEOS API already offers. This would make it much easier when working with PostGIS, as it has been pointed out before. But first, let me give some more context. I am a member of the MobilityDB team, which is a database much similar to PostGIS. In fact it extends PostGIS to add support for spatiotemporal operations, while relying as much as possible on PostGIS for pure spatial operations. I am currently working on a project called PyMEOS which is related to MobilityDB like how Shapely is related to PostGIS. Again, ideally, I would like to rely on Shapely as much as possible for all pure spatial operations. The only reason we haven't been doing this yet is because of this lack of get/set API for SRIDs in shapely. So it would be great if we can know the final stance over this. And if Shapely plans to add support for this, and if I can be of any help for it, I can try to contribute what I can as well. Would love to know thoughts on this 😄 |
@chaitan94 do you want it enough to write and submit a different RFC? I haven't seen anything written here since June 10 that has persuaded me that it's worth doing, but some very good writing with compelling use cases might change my mind. |
@chaitan94 I think you misinterpreted my comment above (#1 (comment)): the reason that I proposed to omit it from the RFC is not because nothing related to SRID will be included in Shapely, but just because there is nothing special about it worth mentioning in the RFC (or inherently different from what shapely does does). We will simply expose few of the things that GEOS itself provides around SRID: support for including it in WKB IO, and if it depends on me also basic getting/setting it. But nothing more than that, so eg spatial operations won't support it (or preserve it), that's something to discuss in GEOS itself and outside of the scope of this RFC. And to be clear: PyGEOS already has this (there is a |
@sgillies I am not sure if it is a significant enough change to demand its own RFC, but sure, I can do that if it helps. @jorisvandenbossche Yes, I did understand that it was just omitted from here. Nevertheless, I just wanted to have a bit more clarity on whether the current Shapely 2.0 implementation would have this support for SRID or not. I felt that since GEOS and PyGEOS already do have support for it already, it should be straightforward to expose it at Shapely level. If you also think that a separate RFC would help too, maybe I can do that. |
For the same reason as I omitted it from this RFC, I personally don't think a separate RFC is needed if it's only about getting/setting for WKB IO. Now, if that understanding is not fully how others understood the above discussion, we should certainly further discuss it (but again, it is not about Shapely doing special effort to support SRIDs in spatial operations, it's only about exposing the limited API of GEOS to get/set it for WKB IO. @sgillies you're on board with that?) |
* `ctypes` is not (or no longer) the recommended way to wrap a complex C/C++ | ||
library. It adds overhead and possible runtime issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from the peanut gallery: do you have some references on this? I assume that the current consensus is to either do a Python extension type or to use CFFI, but it would be great to have some reading material on the topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astrojuanlu sorry for the verylate reply. I don't have a direct reference at hand that explains this (more based on general experience and from talking to other people), but we certainly had several reports with installation issues related to this.
To give an update on this: over the last months, I have been slowly refactoring Shapely to start using the pygeos implementation (in the One aspect of the above discussion that was not yet finalized: the API for the We need to make some choice here, and I made a PR in Shapely to add a keyword: shapely/shapely#1064. Feedback on that is very welcome! |
Wondering if this is ready for merge? :) |
Yes it should, @sgillies could you merge this? |
This is an initial RFC with a more detailed reasoning and proposal for the changes coming to Shapely 2.0.
You can use the "rich diff" under files changes to see a rendered version, or view the current file.
Feedback very welcome!