
[EPIC] ClickBench Improvements (Vanity Benchmark) #14586

Open
@alamb

Is your feature request related to a problem or challenge?

The ClickBench benchmark measures the performance of filtering and aggregation.

ClickBench is somewhat of a vanity benchmark: in my opinion, all the engines within a factor of 2 of each other likely offer similar user experiences (and the exact speed will depend on real user queries, etc.).

That being said, being at the top of the benchmark is certainly good for publicity, and DataFusion has used it as such (see our blog post, "Apache DataFusion is now the fastest single node engine for querying Apache Parquet files").

So this ticket tracks improving ClickBench performance even more.

Recently, as @Dandandan pointed out on #14246 (comment), DuckDB slipped past us in the most recent results.

Describe the solution you'd like

Get DataFusion back on top

Describe alternatives you've considered

While we could clearly implement ClickBench-specific optimizations, I don't think that is a valuable exercise for users. I would very much like to focus our efforts on optimizations that are actually useful.

Some ideas of real improvements:

Potentially Benchmaxxing improvements

What I would like is for people to profile the queries and try to find ways to improve them.

Additional context

See related discussions on
