Even more improvements in memory utilization HashAggregationExec when spilling #8428


Open
milenkovicm opened this issue Dec 5, 2023 · 0 comments
Labels: enhancement (New feature or request)


@milenkovicm

Is your feature request related to a problem or challenge?

This task is related to #7858 and follows up on some issues we found with HashAggregationExec.

The specific issue is that, when spilling, HashAggregationExec sorts the aggregation buffer before writing it out, and the sort allocates a new buffer as big as the current aggregation state:

https://github.com/apache/arrow-datafusion/blob/7acd8833cc5d03ba7643d4ae424553c7681ccce8/datafusion/physical-plan/src/aggregates/row_hash.rs#L672

As a result, the operator can use up to twice as much memory as the memory manager has allocated to it.
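To illustrate the accounting problem, here is a minimal sketch under simplifying assumptions. `ToyPool` and `sort_by_copy` are hypothetical names for this example only; they are not DataFusion's `MemoryPool` / `MemoryReservation` API, and the sketch assumes (as described above) that the sort materializes a full sorted copy of the state. Once the state already fills the reservation, the copy cannot fit, and a sort that succeeds peaks at twice the state size.

```rust
/// Hypothetical toy memory pool used only for illustration;
/// this is NOT DataFusion's `MemoryPool` / `MemoryReservation` API.
struct ToyPool {
    limit: usize, // hard memory limit in bytes
    used: usize,  // bytes currently reserved
    peak: usize,  // highest value `used` has reached
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        ToyPool { limit, used: 0, peak: 0 }
    }

    /// Reserve `bytes`, returning an error instead of exceeding the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already in use",
                bytes, self.used, self.limit
            ));
        }
        self.used += bytes;
        self.peak = self.peak.max(self.used);
        Ok(())
    }
}

/// Hypothetical stand-in for "sort the aggregation state before spilling":
/// it builds a sorted copy, which is as large as the state itself, and
/// reserves that copy against the pool before allocating it.
fn sort_by_copy(pool: &mut ToyPool, state: &[u64]) -> Result<Vec<u64>, String> {
    let copy_bytes = std::mem::size_of_val(state);
    pool.try_grow(copy_bytes)?;
    let mut sorted = state.to_vec();
    sorted.sort_unstable();
    Ok(sorted)
}

fn main() {
    // 1000 u64 values stand in for the accumulated aggregation state.
    let state: Vec<u64> = (0..1000u64).rev().collect();
    let state_bytes = std::mem::size_of_val(state.as_slice());

    // Limit equal to the state size: the state fits, the sorted copy does not.
    let mut tight = ToyPool::new(state_bytes);
    tight.try_grow(state_bytes).unwrap();
    assert!(sort_by_copy(&mut tight, &state).is_err());

    // Doubling the limit lets the sort succeed; peak usage is 2x the state.
    let mut roomy = ToyPool::new(2 * state_bytes);
    roomy.try_grow(state_bytes).unwrap();
    let sorted = sort_by_copy(&mut roomy, &state).unwrap();
    assert_eq!(sorted.first(), Some(&0));
    // prints "state: 8000 bytes, peak during sort: 16000 bytes"
    println!("state: {} bytes, peak during sort: {} bytes", state_bytes, roomy.peak);
}
```

The toy pool reserves the copy before allocating it, so exceeding the limit shows up as an error rather than as silent overuse beyond what was reserved.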

We need to find a solution that respects the allocated memory limit.

Some ideas can be found in #7858, more specifically in its comments, but we are open to other ideas as well.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

milenkovicm added the enhancement (New feature or request) label Dec 5, 2023
alamb changed the title from "Improvements in memory utilization HashAggregationExec when spilling" to "Even more improvements in memory utilization HashAggregationExec when spilling" Dec 5, 2023
milenkovicm self-assigned this Mar 19, 2025
milenkovicm removed their assignment Mar 30, 2025