Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import via buffer protocol #57

Closed
wants to merge 14 commits into from
Closed

Conversation

kylebarron
Copy link
Owner

@kylebarron kylebarron commented Aug 21, 2024

@H-Plus-Time I think this is a bit cleaner than what you have in #56

python/src/rtree.rs Outdated Show resolved Hide resolved
python/src/rtree.rs Outdated Show resolved Hide resolved
@H-Plus-Time
Copy link
Contributor

Much nicer - yeah, I was very wary of transmute, this looks like a better alternative 😄 .

With the GIL lock, that pretty much only applies at buffer protocol import time, yeah?

Vectorized search would be a nice thing to have (unless it's obviated by tree to tree intersection candidates), and I'd imagine GIL locking would be a pain there.

@kylebarron
Copy link
Owner Author

kylebarron commented Aug 22, 2024

With the GIL lock, that pretty much only applies at buffer protocol import time, yeah?

In theory, I think you're supposed to hold the GIL for the entire lifetime of the slice. But IMO I think it would be ok to make clear in the documentation that the input must not be mutated.

Vectorized search would be a nice thing to have

You mean searching a tree for lots of boxes at once, not just a single box? We could provide a naive implementation that does the loop in Rust. I suppose that's a separate concept from tree to tree intersection.

In case you're looking for a place to contribute and/or it's something you want to use 😄 , I implemented tree to tree intersection myself (as in, not something I ported from flatbush.js) and it appears to have a bug that I haven't had time to look into. See #42 and #51. Though it might not be trivial to fix

@kylebarron
Copy link
Owner Author

I think I tried to generalize too soon and it became more work than I expected 😢 . Probably not worth trying to support non-float box types yet

@kylebarron
Copy link
Owner Author

Also I was thinking of trying to support both numpy and arrow input, but that seems a little hard. In general I'm trying to move towards arrow-first, but it seems wrong to not support arrow here.

@kylebarron
Copy link
Owner Author

It would probably be simplest to separate out the constructors for numpy and arrow. The consumer can deal with that. So we'd have

RTree.from_interleaved
RTree.from_separated
RTree.from_arrow

where the from_arrow version will look for a geoarrow box array.

@H-Plus-Time
Copy link
Contributor

I think I tried to generalize too soon and it became more work than I expected 😢 . Probably not worth trying to support non-float box types yet

Yup, that's midnight-brain, I know it well 😬

@H-Plus-Time
Copy link
Contributor

It would probably be simplest to separate out the constructors for numpy and arrow. The consumer can deal with that. So we'd have

RTree.from_interleaved
RTree.from_separated
RTree.from_arrow

where the from_arrow version will look for a geoarrow box array.

Yeah, may as well - the python tendency toward methods that take basically anything (as in lists of dicts OR dicts of lists OR random type X) and flag out how it's interpreted... Is maybe not a great thing to perpetuate.

@kylebarron
Copy link
Owner Author

Often my midnight brain is very productive 😛

Yeah, may as well - the python tendency toward methods that take basically anything (as in lists of dicts OR dicts of lists OR random type X) and flag out how it's interpreted... Is maybe not a great thing to perpetuate.

I'm definitely not a fan of accepting any kind of input. Especially now that we have good type checking in Python, each method should take only one kind of input. I was mostly thinking that arrow and numpy arrays could be interchangeably accepted. But I think separating into from_arrow is fine

@kylebarron
Copy link
Owner Author

kylebarron commented Oct 2, 2024

In kylebarron/arro3#204 I figured out a way to have Python-backed Arrow Array objects. So we can interpret a numpy numeric array as a non-null Arrow array without any memory copies.

The core geo-index crate doesn't use Arrow, but it may be worth it to use pyo3-arrow in the python bindings, because then we can accept either numpy or arrow input for the tree

@kylebarron
Copy link
Owner Author

And in kylebarron/arro3#208 I automatically handle multidimensional numpy arrays as input, converting them (zero-copy) to nested fixed size lists

@kylebarron
Copy link
Owner Author

Latest update removes a lot of code by reusing the PyArrowBuffer from pyo3_arrow::buffer. See kylebarron/arro3#241 and kylebarron/arro3#242.

Work still to do:

python/src/rtree.rs Outdated Show resolved Hide resolved
python/src/rtree.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants