Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: bump DataFusion to rev f4e519f #783

Merged
merged 5 commits into from
Aug 6, 2024
Merged

Conversation

huaxingao
Copy link
Contributor

@huaxingao huaxingao commented Aug 5, 2024

Which issue does this PR close?

N/A

Rationale for this change

Weekly DataFusion revision bump. Here is the original PR

What changes are included in this PR?

Upgrade to DataFusion revision f4e519f9d , which includes these changes:

f4e519f9d Move min and max to user defined aggregate function, remove `AggregateFunction` / `AggregateFunctionDefinition::BuiltIn` (#11013)
9e90e17a6 fix: Add additional required expression for natural join (#11713)
81668f3b2 Support planning `Map` literal (#11780)
6c4c24612 Doc: Add Sail to known users list (#11791)
de2da34cc Add docs and rename param (#11778)
b89037f0d Add references to github issue (#11784)
0332eb569 refactor: move ExecutionPlan and related structs into dedicated mod (#11759)
5ca4ec3b5 Extract CoalesceBatchesStream to a struct (#11610)
80848f2a0 Fix #11692: Improve doc comments within macros (#11694)
df4e6cc4e [Minor] Short circuit `ApplyFunctionRewrites` if there are no function rewrites (#11765)
d010ce90f refactor(11523): update OOM message provided for a single failed reservation (#11771)
70aba2bd6 minor: always time batch_filter even when the result is an empty batch (#11775)
a0ad37684 [Minor] Refactor approx_percentile (#11769)
f044bc837 Fix bug that `COUNT(DISTINCT)` on StringView panics  (#11768)
0d98b9974 Minor: Add comment explaining rationale for hash check (#11750)
45b40c711 Minor: add "clickbench extended" queries to unit tests (#11763)
6e2ff2955 Derive Debug for logical plan nodes (#11757)
921c3b6b1 Add `TrackedMemoryPool` with better error messages on exhaustion (#11665)
3fe18604e Add null test (#11760)
a4ac0829e Fix documentation warnings, make CsvExecBuilder and Unparsed pub (#11729)
cf98d94c9 Use upstream `DataType::from_str` in arrow-cast (#11254)
1ce546168 Fix `plan_to_sql`: Add wildcard projection to SELECT statement if no projection was set (#11744)
0b8da6d6e Rename RepartitionExec metric `repart_time` to `repartition_time` (#11703)
4884c08ce Use upstream StatisticsConveter (#11479)
9dd2cfc80 Do not push down Sorts if it violates the sort requirements (#11678)
0f554fa12 Minor: Improve documentation for AggregateUDFImpl::state_fields (#11740)
ae2ca6a0e Support  cross-timezone `timestamp` comparison via coercsion (#11711)
fa50636c6 Implement physical plan serialization for parquet Copy plans (#11735)
2887491fc fix: set `null_equals_null` to false when `convert_cross_join_to_inner_join` (#11738)
89677ae66 Check hashes first during probing the aggr hash table (#11718)
abeb8b4f8 Make DefaultSchemaAdapterFactory public (#11709)
6508fa2dc expose some fields on session state (#11716)
929568d60 Use `cargo release` in benchmarks (#11722)
cc6416e74 Minor: Add example for `ScalarUDF::call` (#11727)
8ac50e2db Reduce repetition in try_process_group_by_unnest and try_process_unnest (#11714)
7ca7456d3 Handle nulls in approx_percentile_cont (#11721)
cd786e275 fix: regr_count now returns Uint64 (#11731)
66a85706f Rename `input_type` --> `input_types` on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs (#11666)

How are these changes tested?

@huaxingao
Copy link
Contributor Author

@andygrove
Somehow TPC-DS Correctness / Run TPCDSQuerySuite (broadcast) failed.

- q97 *** FAILED *** (1 second, 268 milliseconds)
  java.lang.Exception: Expected "537833	285408	200", but got "538012	285422	200" Result did not match

@andygrove
Copy link
Member

@andygrove Somehow TPC-DS Correctness / Run TPCDSQuerySuite (broadcast) failed.

- q97 *** FAILED *** (1 second, 268 milliseconds)
  java.lang.Exception: Expected "537833	285408	200", but got "538012	285422	200" Result did not match

q97 (partial)

select  sum(case when ssci.customer_sk is not null and csci.customer_sk is null then 1 else 0 end) store_only
      ,sum(case when ssci.customer_sk is null and csci.customer_sk is not null then 1 else 0 end) catalog_only
      ,sum(case when ssci.customer_sk is not null and csci.customer_sk is not null then 1 else 0 end) store_and_catalog
from ssci full outer join csci on (ssci.customer_sk=csci.customer_sk
                               and ssci.item_sk = csci.item_sk)

Seems like this may be a regression in DataFusion. I did make a change to IsNotNull. I wonder if that is an issue.

@andygrove
Copy link
Member

The test passes for me locally, but consistently fails in CI

@andygrove
Copy link
Member

I can reproduce locally now

native/Cargo.toml Outdated Show resolved Hide resolved
@andygrove andygrove changed the title chore: bump DataFusion to rev c6f0d3c chore: bump DataFusion to rev f4e519f Aug 6, 2024
@andygrove
Copy link
Member

The regression is caused by apache/datafusion#11627

@andygrove andygrove requested a review from viirya August 6, 2024 19:48
@viirya
Copy link
Member

viirya commented Aug 6, 2024

Do we just upgrade to the commit before one that causes CI failure?

@andygrove
Copy link
Member

Do we just upgrade to the commit before one that causes CI failure?

Yes, I pushed a commit to just upgrade to the point where min/max API changed

@andygrove andygrove merged commit 6b64dcd into apache:main Aug 6, 2024
75 checks passed
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* chore: bump DataFusion to rev c6f0d3c

* fix style

* Trigger Build

* Update native/Cargo.toml

---------

Co-authored-by: Andy Grove <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants