
Benchmark / program to test Spilling Sorts #15664

Open
Tracked by #15271
alamb opened this issue Apr 9, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

alamb (Contributor) commented Apr 9, 2025

Is your feature request related to a problem or challenge?

There are many interesting ideas on how to improve DataFusion's behavior when spilling, for example #15271 from @2010YOUY01 and others.

What I think we really need next to make progress in this area is a benchmark, i.e. an agreed-upon way of measuring our progress, so that we can tell whether proposed changes actually improve things.

Describe the solution you'd like

I would like a documented command (or set of commands) that is:

  1. Easy to run (and thus fast to test / iterate on)
  2. Exercises the spilling feature at different levels of memory pressure
  3. Spends most of its time sorting, spilling, and merging (rather than, for example, generating output)

Describe alternatives you've considered

Idea 1: Use existing datafusion-cli features/flags and document them.

Idea 2: Add a new suite to bench.sh / dfbench: https://github.com/apache/datafusion/tree/main/benchmarks

As for the workload, I suggest something relatively simple, for example sorting the TPC-H lineitem table with memory limits of 200MB, 500MB, 1GB, 5GB, and 10GB.
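To illustrate what "different levels of memory pressure" means for a sort that spills, here is a toy, std-only Rust sketch. It is not DataFusion's implementation and does not use any DataFusion API. The function `external_sort`, the row-count budget, and the scaled-down byte limits (64 KiB to 4 MiB instead of 200MB to 10GB) are all hypothetical stand-ins. Each budget forces a different number of sorted runs to be spilled to temp files and then k-way merged, which is the sort/spill/merge work a real benchmark would want to dominate its runtime.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::{self, File};
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;

// Hypothetical counter so concurrent calls never reuse a spill file name.
static NEXT_RUN_ID: AtomicUsize = AtomicUsize::new(0);

/// Toy external sort: at most `budget_rows` input values are sorted in memory
/// at a time (the merged output is not counted against the budget).
/// Returns the sorted values and the number of runs spilled to disk.
fn external_sort(input: &[u64], budget_rows: usize) -> io::Result<(Vec<u64>, usize)> {
    assert!(budget_rows > 0, "budget must allow at least one row");
    // Phase 1: sort budget-sized chunks in memory and spill each as a run.
    let mut run_paths: Vec<PathBuf> = Vec::new();
    for chunk in input.chunks(budget_rows) {
        let mut buf = chunk.to_vec();
        buf.sort_unstable();
        let id = NEXT_RUN_ID.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir()
            .join(format!("spill_sketch_{}_{}.bin", std::process::id(), id));
        let mut w = BufWriter::new(File::create(&path)?);
        for v in &buf {
            w.write_all(&v.to_le_bytes())?;
        }
        w.flush()?;
        run_paths.push(path);
    }
    // Phase 2: k-way merge of the spilled runs using a min-heap.
    let mut readers = Vec::with_capacity(run_paths.len());
    for p in &run_paths {
        readers.push(BufReader::new(File::open(p)?));
    }
    let mut heap = BinaryHeap::new();
    for (idx, r) in readers.iter_mut().enumerate() {
        if let Some(v) = read_u64(r)? {
            heap.push(Reverse((v, idx)));
        }
    }
    let mut out = Vec::with_capacity(input.len());
    while let Some(Reverse((v, idx))) = heap.pop() {
        out.push(v);
        if let Some(next) = read_u64(&mut readers[idx])? {
            heap.push(Reverse((next, idx)));
        }
    }
    drop(readers); // close files before deleting them
    for p in &run_paths {
        fs::remove_file(p)?;
    }
    Ok((out, run_paths.len()))
}

/// Read one little-endian u64, or None at end of file.
fn read_u64<R: Read>(r: &mut R) -> io::Result<Option<u64>> {
    let mut bytes = [0u8; 8];
    match r.read_exact(&mut bytes) {
        Ok(()) => Ok(Some(u64::from_le_bytes(bytes))),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Deterministic pseudo-random input (xorshift64) so runs are repeatable.
    let mut x: u64 = 0x9E37_79B9_7F4A_7C15;
    let input: Vec<u64> = (0..200_000)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            x
        })
        .collect();
    let mut expected = input.clone();
    expected.sort_unstable();

    // Scaled-down memory budgets in bytes; smaller budget => more spilled runs.
    for budget_bytes in [64 * 1024, 256 * 1024, 1024 * 1024, 4 * 1024 * 1024] {
        let budget_rows = budget_bytes / std::mem::size_of::<u64>();
        let start = Instant::now();
        let (sorted, runs) = external_sort(&input, budget_rows)?;
        let elapsed = start.elapsed();
        assert_eq!(sorted, expected);
        println!("budget={:>8} B  runs={:>3}  time={:?}", budget_bytes, runs, elapsed);
    }
    Ok(())
}
```

A real suite would replace the toy sort with a DataFusion query over lineitem and the byte budgets with the engine's own memory limit, but the shape of the measurement (same input, a sweep of memory limits, time per limit) would be the same.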

Additional context

No response

@alamb alamb added the enhancement New feature or request label Apr 9, 2025
@alamb alamb changed the title Benchmark / program to test Spilling Joins Benchmark / program to test Spilling Sorts Apr 9, 2025