-
Notifications
You must be signed in to change notification settings - Fork 1.5k
A complete solution for stable and safe sort with spill #14692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thank you for filing this @zhuqi-lucas . I have also added it to our list here |
This is a reproducer for an external sort query failure under a very low memory limit #15028 |
I started collecting sort larger than RAM related things here: |
Can some provide a instruction for me to reproduce it and let me know which part of the code has the external sort algorithm? I can take a look at it. I probably can help if implementing a multi-pass external sort is the need, I've implemented multi-pass external sort before in python (just for fun). I hope it wouldn't be too hard to extend from current one-pass algorithm. |
If there is any trouble reproducing, you can ask in the bug-tracking issue.
Thank you. Just a heads-up, I think this issue is a bit tricky, we have to also look into the performance after implementing the algorithm. |
This issue looks interesting to me. I'll try to work on it! |
take |
Thanks @qstommyshu ! I think @2010YOUY01 has been working on this one most recently, so I would recommend checking in with them before coding too mich |
Sounds good, no problem. I didn't see another assignee for this issue, so I decided to take it . I won't start coding immediately, I'll check with @2010YOUY01 and think about the problem first, then start working on the code once I can reproduce the error and have a general idea of what the solution is. |
There is a bunch of work ongoing in this area, most of which is linked from |
This issue is receiving a lot of attention (tbh, users complaining the feature is not working), so I did a POC. Feel free to chime in @qstommyshu For major features, it’s always great to see alternative approaches to refine the work. |
Got it @2010YOUY01 , Since you are already working on this issue, I won't take it from you then. I'll try my best to provides some feedbacks to your PR (if there is any). |
Is your feature request related to a problem or challenge?
This is a follow-up for #14644:
See details in #14644 (comment) cc @alamb @Kontinuation
We need a complete solution about stable sort with spill, several issues until now:
The problem 1 could be solved by potentially spilling unsorted batches (and then sorting them separately). This would be less efficient (read/write some tuples twice but would work.
Problem 2 could use the multi-pass merge:
We can implement a more complicated multi-stage merging phase which merges a small portion of streams each time, and perform multiple rounds of merges to produce the final merged stream.
Describe the solution you'd like
The problem 1 could be solved by potentially spilling unsorted batches (and then sorting them separately). This would be less efficient (read/write some tuples twice but would work.
Problem 2 could use the multi-pass merge:
We can implement a more complicated multi-stage merging phase which merges a small portion of streams each time, and perform multiple rounds of merges to produce the final merged stream.
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: