-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: upgrade to datafusion 43 #2886
base: main
Are you sure you want to change the base?
Conversation
We need Datafusion to upgrade as well before we can do this. I have some patches to make the latest datafusion work but we're blocked on them to adopt the newest arrow and kernel |
9d686df
to
d4bfacd
Compare
Hi, datafusion has been released. Let's rock! |
Thank you @ion-elgreco -- let me know if there is anything I can to do help with this PR (e.g. I could work on updating the code to use non deprecated APIs in the core for parquet statistics...) |
8db59c2
to
b325e27
Compare
@alamb there seems to be a regression in DF, see https://github.com/delta-io/delta-rs/actions/runs/10959345721/job/30431504526?pr=2886#step:7:948. A literal with a list is creating a dtype list with inner type using the name "item", we however have our data internally read according to the parquet spec, so that when we write the parquet files are meeting the field naming of the spec. I had a quick glance at the Changelog but can't really pinpoint where this change may have occurred, do you have any ideas?
|
It sounds somewhat similar to a discussion happening here about nullability (but I think field name is similar in terms of how it is treated): apache/datafusion#11989 (comment) Update; filed apache/datafusion#12560 to track the regression more |
f824d7a
to
a1a33ce
Compare
@alamb thanks for following up! I feel a little sheepish 🐑 insofar that my internal CI showed these errors weeks ago in my integration testing between datafusion I've subscribed to the issue you created and can help testing if necessary! Depending on my availability this weekend I can try fixing it in datafusion too 🤔 |
Hi there - I believe a fix was implemented in datafusion and this PR is not blocked any longer. |
It's not yet released though |
I can confirm that the latest datafusion main passes tests in CI 🥳 |
4f30e16
to
bac3306
Compare
I filed apache/datafusion#12813 to consider releasing a patch version of DataFusion |
c9933f1
to
a82c295
Compare
@alamb is there any change still required? Because I still see failed tests |
I don't know to be honest. |
@ion-elgreco don't worry about this pull request, I will be taking care of it. The 43 release is imminent 🎊 |
Hi there - just doing a pulse check on this PR. It looks like its been updated to DataFusion v43 but I still see several failing CI checks. Are there new regressions in v43? |
@matthewmturner I just did some more fixes on our side, I noticed we still have what appears to be a regression https://github.com/delta-io/delta-rs/actions/runs/11860379826/job/33055504796?pr=2886#step:4:2436 could someone take a look at this? |
I think the problem is that the projected schema for array types is changing the field name for the elements of the array from "element" to "item". I have seen this before, but I'm not sure exactly where that is happening. I will try to keep digging in. Which is to say, I think the problem may be upstream in datafusion. |
This last test failure I believe is because of the use of the make_array UDF which expects items in the array to be I'm not convinced this is a Datafusion issue per se, I'm still digging through some code to see if they've standardized on using "element" vs "item". Either way I don't want to wait for datafusion 44 to merge this change, so I'll be figuring out a workaround shortly 🤗 |
Please let me know what you discover. I do expect we may release a DF 43.1.0 due to a number of changes recently to account for the new default StringView instead of String. If the root problem is on the DF side, we could potentially include it as well. |
@timsaucer My preference would certainly be for |
Is it perhaps possible to include UDFs that just let you specify an optional field name for the inner field like this? Because I know Map also gets the same treatment of "what are the inner fields called, well we call it this, they call it that." So it might just let more advanced users choose their own adventure instead of having to go back and forth. |
64704ad
to
dbd26eb
Compare
I have rebased on |
Signed-off-by: R. Tyler Croy <[email protected]>
The release of pyo3 0.22.3 compells this since we cannot otherwise compile. The choice is between pinning 0.22.2 and upgrading our ABI, and I think it's better to upgrade the ABI Signed-off-by: R. Tyler Croy <[email protected]>
see delta-io/delta-kernel-rs#301 Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
Today the make_array function from Datafusion uses "item" as the list element's field name. With recent changes in delta-kernel-rs we have switched to calling it "element" which is more conventional related to how Apache Parquet handles things Signed-off-by: R. Tyler Croy <[email protected]>
dbd26eb
to
e363ee6
Compare
Description
Bump kernel