[fix](planner) query should be cancelled if limit reached (#44338) #45223
Merged
Problem Summary:

When a SQL statement has a `limit` clause and FE has already received at least `limit` rows, FE should send a cancel command to BE so that BE stops reading more data. In the current code this mechanism is broken and never takes effect. The cost is highest for external table queries, where the extra reads can cause a lot of unnecessary network I/O.

1. `isBlockQuery`: In the old optimizer, `isBlockQuery` is set to true if the query contains a `sort` or `agg` node, and to false otherwise. In the new optimizer it is always true. The logic is wrong in both optimizers, yet the reach-limit logic is triggered only when `isBlockQuery = false`.
2. When the reach-limit logic is called: The reach-limit check runs only when the row batch returned by BE has `eos = true`. This is wrong. For a `limit N` query, each BE applies its own limit of N, but FE can trigger the reach-limit logic as soon as the total number of rows returned by all BEs exceeds N. The check therefore should not be restricted to `eos = true`.

The PR mainly changes:

1. Remove `isBlockQuery`. It is used only in the reach-limit logic and is not needed there, so it is removed completely.
2. Move the reach-limit check. Whenever the number of rows FE has received exceeds the limit, the reach-limit logic is checked.
3. Fix the wrong `limitRows` in `QueryProcessor`. `limitRows` should be taken from the first fragment, not the last.
4. In the BE scanner scheduler, if a scanner has a limit, ignore the per-round scan-bytes threshold.
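The FE-side behaviour described in changes 2 and 3 can be sketched as follows. This is a minimal, hypothetical sketch, not the actual Doris FE code: the class `LimitTracker`, its nested `Fragment` type, and the methods `fromFragments`, `onBatch` and `rowsReceived` are illustrative names, and a non-positive limit is assumed to mean "no limit". It counts rows from every batch returned by any BE regardless of `eos`, and signals a single cancel once the running total reaches the limit taken from the first fragment.

```java
import java.util.List;

/**
 * Hypothetical sketch of FE-side "reach limit" handling.
 * Names are illustrative only; this is not the Doris FE API.
 */
public class LimitTracker {
    /** Hypothetical fragment descriptor; a non-positive limit means "no limit". */
    public static final class Fragment {
        final long limit;

        public Fragment(long limit) {
            this.limit = limit;
        }
    }

    private final long limit;        // query limit; <= 0 means no limit (assumption)
    private long rowsReceived = 0;   // total rows received from all BEs so far
    private boolean cancelSent = false;

    public LimitTracker(long limit) {
        this.limit = limit;
    }

    /**
     * Takes the limit from the FIRST fragment (assumed here to be the one that
     * returns results to FE), not the last. Assumes the list is non-empty.
     */
    public static LimitTracker fromFragments(List<Fragment> fragments) {
        return new LimitTracker(fragments.get(0).limit);
    }

    /**
     * Called for every row batch from any BE, whether or not it carries eos.
     * Returns true exactly once: the first time the running total reaches the
     * limit, which is when FE should cancel the remaining BE work.
     */
    public boolean onBatch(long numRows) {
        rowsReceived += numRows;
        if (limit > 0 && !cancelSent && rowsReceived >= limit) {
            cancelSent = true;
            return true;
        }
        return false;
    }

    public long rowsReceived() {
        return rowsReceived;
    }
}
```

For example, with `limit 10` and several BEs that each apply their own limit of 10, FE may receive batches of 4, 5 and 3 rows before any BE reports `eos`. In this sketch the third call to `onBatch` returns `true`, so FE can cancel immediately instead of waiting for an `eos` batch.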
run buildall
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 40521 ms
run p1
bp #44338