Support Arrow PyCapsule Interface #3568
Comments
Hi @kylebarron! I'm certainly interested in doing what I can to facilitate this.
Could you show how this would work please? There's only one teeny-tiny Polars-specific piece of code in Altair, and it's not clear to me how the Arrow C Interface would address it, but I might be missing something.
I'm referring to these lines: Lines 421 to 431 in 5207768
Those may not be solely for Polars, but a primary goal of the PyCapsule Interface is to standardize the method name by which one library exports data to others. So instead of checking for all these possible names, you can check for __arrow_c_stream__ alone. So essentially:
-for convert_method_name in ("arrow", "to_arrow", "to_arrow_table", "to_pyarrow"):
-    convert_method = getattr(dfi_df, convert_method_name, None)
-    if callable(convert_method):
-        result = convert_method()
-        if isinstance(result, pa.Table):
-            return result
+if hasattr(dfi_df, "__arrow_c_stream__"):
+    return pa.table(dfi_df)
Polars input wouldn't go down those lines anyway, as it would already have been handled in the Narwhals path 😉 (and it wouldn't involve any conversion to pyarrow). Isn't PyCapsule Interface adoption a bit too recent for it to be used here as the only way to convert to pyarrow? It would cut off support for oldish versions of Ibis / DuckDB, for which the current code works fine. But using it if it's available, in addition to the current code but before the interchange protocol, sounds like a good idea 👍
Ah, I hadn't noticed that.
Yes, I should've been clearer: sometime in the future, when you're OK with the pyarrow version constraint, you can remove those lines of code. For the time being, I'd suggest it as an addition, not a replacement, to those existing checks.
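For illustration, the "addition, not a replacement" approach could look roughly like the sketch below. It is not Altair's actual code: the function name to_arrow_table and the injectable pa parameter are hypothetical, and it assumes a pyarrow version recent enough that pa.table() accepts objects exposing __arrow_c_stream__. The PyCapsule check runs first, the existing method-name checks stay as a fallback for older producers, and None signals that the caller should fall through to the interchange protocol.

```python
from typing import Any, Optional

# Method names the existing code already probes, kept as a fallback.
LEGACY_METHOD_NAMES = ("arrow", "to_arrow", "to_arrow_table", "to_pyarrow")


def to_arrow_table(obj: Any, pa: Optional[Any] = None) -> Optional[Any]:
    """Try the PyCapsule Interface first, then the legacy method names.

    `pa` is a hypothetical hook for passing in the pyarrow module
    (or a stand-in); by default pyarrow is imported lazily because
    it is an optional dependency.
    """
    if pa is None:
        import pyarrow as pa

    # Preferred path: one standardized dunder method.
    if hasattr(obj, "__arrow_c_stream__"):
        return pa.table(obj)

    # Fallback path: older producers that predate the PyCapsule Interface.
    for name in LEGACY_METHOD_NAMES:
        method = getattr(obj, name, None)
        if callable(method):
            result = method()
            if isinstance(result, pa.Table):
                return result

    # Nothing matched; the caller can try the interchange protocol next.
    return None
```

Once the pyarrow version constraint is acceptable, the fallback loop could be deleted and the function would reduce to the two-line check from the diff above.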
@MarcoGorelli @jonmmease
😄 yup, was discussing some things related to this today: duckdb/duckdb#15536
Thanks @MarcoGorelli! Interesting timing 😉 I'm gonna have a read through the doc, as I'm also surprised by the behavior in
Update: Reading through these gives me the impression you'd need control over requesting a new stream (for this to be suitable in EDA).
I can't help but think that this is too low-level to find its way into
What is your suggestion?
👋 The Arrow project recently created the Arrow PyCapsule Interface, a new protocol for sharing Arrow data in Python. Among its goals is allowing Arrow data interchange without requiring the use of pyarrow, but I'm also excited about the prospect of an ecosystem that can share data only by the presence of dunder methods, where producer and consumer don't have to have prior knowledge of each other.
This would allow Altair to work out of the box with any Arrow-based object that supports this interface.
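As a rough illustration of sharing data "only by the presence of dunder methods", a consumer can detect support without importing pyarrow or knowing anything about the producer. The sketch below uses the three export method names from the Arrow PyCapsule Interface specification; the helper name arrow_pycapsule_exports and the FakeTable class are hypothetical and exist only for this example.

```python
from typing import Any, Tuple

# Export method names defined by the Arrow PyCapsule Interface.
ARROW_PYCAPSULE_METHODS = (
    "__arrow_c_schema__",
    "__arrow_c_array__",
    "__arrow_c_stream__",
)


def arrow_pycapsule_exports(obj: Any) -> Tuple[str, ...]:
    """Return the PyCapsule export methods that `obj` exposes."""
    return tuple(
        name
        for name in ARROW_PYCAPSULE_METHODS
        if callable(getattr(obj, name, None))
    )


class FakeTable:
    """Hypothetical producer; a real one would return a PyCapsule."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


print(arrow_pycapsule_exports(FakeTable()))  # ('__arrow_c_stream__',)
print(arrow_pycapsule_exports([1, 2, 3]))    # ()
```

The check involves no import of the producing library, which is what lets producer and consumer interoperate without prior knowledge of each other.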
I've been working to promote the PyCapsule Interface across the ecosystem, with many libraries having adopted support so far.
Given that altair already has an optional dependency on pyarrow, the easiest implementation would be a simple addition in here:
altair/altair/utils/data.py
Lines 417 to 434 in 5207768
to first call
This would also allow you to remove your Polars-specific code, because Polars implements the Arrow PyCapsule Interface (pola-rs/polars#17676) as of Polars 1.3.
Alternatively, this interface would enable you to accept Arrow input data without a pyarrow dependency, if that's attractive.
I figure @MarcoGorelli also has opinions about this given #3445. Narwhals also supports PyCapsule Interface export: narwhals-dev/narwhals#786.
Have you considered any alternative solutions?
Altair already supports the DataFrame Interchange Protocol, but that is not a direct replacement for the PyCapsule Interface. The PyCapsule Interface is much easier to implement for Arrow-based libraries and allows zero-copy data exchange with very little overhead. There are many libraries that would implement the PyCapsule Interface without wanting to go through the trouble of implementing the DataFrame Interchange Protocol.
Also relevant is that vegafusion is planning to adopt this, notwithstanding a Rust technical issue (vega/vegafusion#501).