Description
Is your feature request related to a problem or challenge?
The ClickBench benchmark measures the performance of filtering and aggregation.
Being on top of ClickBench is somewhat of a vanity metric: in my opinion, all the engines within a factor of 2 of each other likely offer similar user experiences (and the exact speed will depend on real user queries, etc).
That being said, being the engine at the top of the benchmark is certainly good for publicity, and DataFusion has used it that way (see our blog post, "Apache DataFusion is now the fastest single node engine for querying Apache Parquet files").
So this ticket tracks improving the ClickBench performance even more.
As @Dandandan pointed out on #14246 (comment), DuckDB slipped past us in the most recent results.

Describe the solution you'd like
Get DataFusion back on top
Describe alternatives you've considered
While we could clearly implement ClickBench-specific optimizations, I don't think that is really a valuable exercise for users. I would very much like to focus our efforts on optimizations that are actually useful.
Some ideas of real improvements:
- Enable parquet filter pushdown by default #3463
- Make ClickBench Q23 Go Faster #15177
- Improve parquet ListingTable speed with parquet metadata (short clickbench queries) #11719
- POC: Eliminate unnecessary group by keys (q35 in clickbench 1.35x faster) #13617
Potentially Benchmaxxing improvements
What I would like is for people to profile queries and try to find ways to improve them.
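As a starting point for picking which queries to profile, here is a minimal sketch of a timing harness. It is not part of DataFusion or ClickBench: `time_query` and `slowest_first` are hypothetical helpers, and `run_query` stands for whatever callable you use to execute one query (for example a wrapper around your client of choice). It only measures wall-clock time, so it tells you where to point a real profiler, not why a query is slow.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Run `run_query` `repeats` times and summarize wall-clock seconds.

    `run_query` is a hypothetical zero-argument callable that executes
    one query end to end and discards or returns its result.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "first": timings[0],        # first run, often includes cold-cache effects
        "best": min(timings),       # fastest of all runs
        "median": median(timings),  # typical run
    }


def slowest_first(results: Dict[str, Dict[str, float]]) -> List[str]:
    """Order query names by their best time, slowest first."""
    return sorted(results, key=lambda name: results[name]["best"], reverse=True)
```

Collecting `time_query` results into a dict keyed by query name and passing it to `slowest_first` gives a rough priority list of queries worth profiling in detail.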
Additional context
See related discussions on