-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Aggregation fuzzer framework #12667
Conversation
😍 😍 😍 😍 |
d1f5aea
to
6ad27f5
Compare
Rest works:
|
b059373
to
cf82eb7
Compare
@alamb I found the new fuzz cases(based on aggregation fuzzer) failed after I rebase main. The failed new case: I am more sure now, it may be due to the process of null, when I switch the Updated |
I found the reason and have submitted a pr to fix it #12758 |
Testing for the win! |
The tests' run time after modifying
|
@alamb I have added tests to cover more aggregations:
I make it through the simple implementation about randomly select sql memtioned in #12667 (comment)
Now the testcase may be like: // Build fuzzer
let fuzzer = builder
.data_gen_config(data_gen_config)
.data_gen_rounds(16)
.add_sql("SELECT sum(a) FROM fuzz_table")
.add_sql("SELECT sum(distinct a) FROM fuzz_table")
.add_sql("SELECT max(a) FROM fuzz_table")
.add_sql("SELECT min(a) FROM fuzz_table")
.add_sql("SELECT count(a) FROM fuzz_table")
.add_sql("SELECT count(distinct a) FROM fuzz_table")
.add_sql("SELECT avg(a) FROM fuzz_table")
.table_name("fuzz_table")
.build(); And unforunately, seems a bug about |
Yes, let's handle that as a separate PR / issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again @Rachelint -- I think this is great and I'll plan to merge it tomorrow unless anyone else would like time to review
Then we can work on follow on items in other PRs
} | ||
|
||
/// Fuzz test for group by `sting + int64` | ||
#[tokio::test(flavor = "multi_thread")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant like
#[tokio::test]
async fn test_basic_string_aggr_group_by_mixed_string_int64_1() {
...
}
#[tokio::test]
async fn test_basic_string_aggr_group_by_mixed_string_int64_2() {
...
}
Rather than a single test that was multi-threaded
Thanks again @Rachelint -- this is so great |
I am working on filing the error that this found for min/max |
I also have some ideas on how to improve the error reporting here |
Nice, the error reporting is indeed not so readable currently |
Maybe #5131 will help? The error seems similar. |
#12832 has a proposal for improved error reporting |
let fuzzer = builder | ||
.data_gen_config(data_gen_config) | ||
.data_gen_rounds(8) | ||
// FIXME: Encounter error in min/max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed fix in #12834
BTW this found another bug in #12792 See https://github.com/apache/datafusion/actions/runs/11272685143/job/31348663515?pr=12792 ❤️ |
Which issue does this PR close?
Part of #12114
Rationale for this change
The aggregation fuzz tests maybe not enough, becasue there are so many optimizations in aggregation in datafusion.
For improving the testing, the first thing to do may be building a framework to make creating testcase easier.
This pr aimed to introduce such a framework.
What changes are included in this PR?
Design an
AggregationFuzzer
to make creating testcases for aggregation easier.Two key components in
AggregationFuzzer
areDatasetGenerator
andSessionContextGenerator
.In
AggregationFuzzer
(with a deinfedquery
represented by sql currently):Multiple
dataset
s will be generated byDatasetGenerator
firstly.And then for a specific
dataset
, multipleSessionContext
with the random params about aggreagtion, and a baselineSessionContext
will be generated.Still for the specific
dataset
, each randomly generatedSessionContext
+basline result
(generated fromdataset
+ baslineSessionContext
) will be used to buildAggregationFuzzTestTask
.In
AggregationFuzzTestTask
, we will run thequery
withdataset
+ relatedSessionContext
, then we compare the result withbasline result
. If inconsistency found, report it and panic.Finally, all
AggregationFuzzTestTask
s will be run in parallel inAggregationFuzzer
Are these changes tested?
Test by uts.
Are there any user-facing changes?
No,