-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patched DataFusion, Oct 30, 2024 #48
Conversation
03d928c
to
3afc128
Compare
89ae9bf
to
1e9c1f1
Compare
@alamb note that this is a new commit. .
@alamb note that this is a newly re-written commit, now that we advanced to the latest (on DF main) version of |
…3305) * test(12733): reproducers for schema bugs * fix(12733): properly extract field metadata from Cast expr * test(12733): update metadata preservation test, for new contract (a.k.a. cast preserves field metadata)
…pache#12562)" This reverts commit 577e4bb.
…on-view Utf8s, and test does pass (due to coercion?)
…ory for the aggregates
1e9c1f1
to
50e1209
Compare
…improvement for entire ClickBench suite) (apache#13101)" This reverts commit 2d7892b.
In order to get iox CI passing, I decided to back out the commit to turn on view types. See this PR description. In brief, existing parquet with string fields was failing in our iox tests. IMO, we should try making this change separately; not with 2 weeks of other DF changes. What do you think @alamb ? |
I recommend:
The rationale to do this is to keep our fork as close as possible to DataFusion main (and we can then work on using the default value of the (we can also do this in a follow on PR) |
@alamb -- Agreed that we should file this as a tech debt ticket in iox, to get us onto view types. However, I'm not sure whether our "tech debt state" should be the revert upstream commit -- or figuring out how to turn it off via the config. Yesterday I tried turning off via the config first, but was still getting CI errors. Here is an example WIP showing that we need more than just setting the config. (Or maybe I'm missing another point of where the config should be set? 🤔 ) Specifically, the CI errors were the bits mentioned in the PR description. It was still attempting to use utf8view types and failing the iox schema check. I could modify the check, but then it failed on the various iox-type to arrow-type mappings. Etc. It became clear that this was a larger thing, to be done in it's own (atomically revertable) pr IMO. |
I just realized that this^^ observation may mean that we have an upstream DF bug. Since when turning off view types, we were still getting view types. 🤦🏼♀️ (Or I'm missing a config setting somewhere in iox.) |
And another thing that bugs me in upstream. There are a whole bunch of duplicate places where the ParquetTableOptions are provided in the session. Which is why I'm not sure if I'm (a) still missing a place to set the config (even after changing all of these places), or (b) there is a bug in upstream where DF is still internally trying to use view types when we turned it off. 🤷🏼♀️ |
What I normally do in this case is go start making small PRs upstream to make the situation better
I likewise don't know -- can you give me a hint about how I can help you debug this (aka how do I reproduce the issue?) |
We have moved past this version. Closing. |
This takes us up to Oct-30 SHA 4975829
It advances DF by adding these oct-15 to oct-30 commits: 399e840...influxdata:arrow-datafusion:497582906e826d3045e2c7b9d4ddb44ce6d42adf
NOTE: this does not include:
Patches applied
Patches already fixed in future upstream commits:
Patches still not fixed:
Patches specific to our DF branch (no need for upstream fix):
bit_length(utf8view)
patch: 90423e2GroupedHashAggregateStream
: 50e1209tech debt to be fixed:
* I can guess this is due to our various iox codepoints where we map String fields to arrow's Utf8. I would rather not muck around with that, unless it's done separately.
* Additionally, setting
schema_force_view_types=false
feels fragile since it's nested in a various options. To be on the safe side, I would rather revert this commit for now.