
[fix](planner) query should be cancelled if limit reached #44338

Merged
merged 3 commits into apache:master on Dec 10, 2024

Conversation


@morningman (Contributor) commented Nov 20, 2024

What problem does this PR solve?

Problem Summary:
When a SQL statement contains a limit clause and FE has already obtained more rows than the limit,
it should send a cancel command to BE to stop the query and prevent BE from reading more data.
However, this mechanism is broken in the current code and does not work.
For external table queries in particular, this can cause a large amount of unnecessary network I/O.

  1. isBlockQuery

    In the old optimizer, isBlockQuery is marked true if the query statement
    contains a sort or agg node, and false otherwise.
    In the new optimizer, this value is always true.

    This logic is wrong in both the old and new optimizers.
    And the reach-limit logic is only triggered when isBlockQuery = false.

  2. Calling problem of reach limit logic

    The reach-limit check is performed only when eos = true in the rowBatch returned by BE.
    This is wrong.
    For a limit N query, each BE applies its own limit of N. But from FE's point of view,
    the query can be cancelled as soon as the total number of rows returned by all BEs exceeds N.
    So the check should not run only when eos = true.
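The corrected behavior can be sketched as follows. This is a minimal, hypothetical C++ sketch (the actual FE code is Java, and all names here are illustrative): the row count is accumulated over every rowBatch, rather than checked only when eos = true.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker for the FE-side reach-limit check.
// limit_rows is the query's LIMIT N (-1 means no limit);
// received_rows accumulates over every rowBatch from every BE.
struct ReachLimitTracker {
    int64_t limit_rows;
    int64_t received_rows;

    // Called for every rowBatch, whether or not its eos flag is set.
    // Returns true once the query should be cancelled.
    bool on_batch(int64_t batch_rows) {
        received_rows += batch_rows;
        return limit_rows >= 0 && received_rows >= limit_rows;
    }
};
```

For example, with limit_rows = 10, batches of 4, 5, and 3 rows trigger cancellation on the third batch (total 12), even if none of those batches carried eos = true.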

The PR mainly changes:

  1. Remove isBlockQuery

    isBlockQuery is used only in the reach-limit logic, and it is not needed. Remove it completely.

  2. Modify the judgment position of reach limit.

    When the number of rows obtained by FE is greater than the limit,
    the reach-limit logic is checked.

  3. Fix wrong limitRows in QueryProcessor

    limitRows should be obtained from the first fragment, not the last.

  4. In the scanner scheduler on the BE side, if the scanner has a limit, ignore the per-round scan-bytes threshold.
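Change 4 can be illustrated with a hedged sketch of the per-round scheduling decision (hypothetical names; the real BE scheduler logic is more involved):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the per-round scan decision in change 4.
// Normally a scanner ends its round once read_bytes reaches the
// per-round byte threshold; for a scanner with a limit, that throttle
// only delays the few rows FE is waiting for, so it is skipped.
bool should_continue_round(int64_t limit, int64_t read_bytes,
                           int64_t bytes_threshold, bool eof) {
    if (eof) {
        return false;  // nothing left to read
    }
    if (limit > 0) {
        return true;   // limit query: ignore the byte threshold this round
    }
    return read_bytes < bytes_threshold;  // non-limit query: respect budget
}
```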

Release note

fix query should be cancelled if limit reached

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test
      test single and multi-be nodes with limit sql.
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman marked this pull request as draft November 20, 2024 08:41
@morningman morningman marked this pull request as ready for review November 20, 2024 10:52
@morningman (Contributor Author)

run buildall

924060929
924060929 previously approved these changes Nov 20, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 20, 2024

PR approved by anyone and no changes requested.

@morningman (Contributor Author)

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 20, 2024
@morningman (Contributor Author)

run buildall

@morningman (Contributor Author)

run buildall

924060929
924060929 previously approved these changes Nov 22, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 22, 2024
kaka11chen
kaka11chen previously approved these changes Nov 29, 2024

@kaka11chen kaka11chen left a comment


LGTM

@morningman morningman dismissed stale reviews from kaka11chen and 924060929 via d12d3f9 December 4, 2024 06:58
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 4, 2024
@morningman
(Contributor Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.46% (10006/26014)
Line Coverage: 29.52% (83909/284254)
Region Coverage: 28.62% (43113/150662)
Branch Coverage: 25.21% (21914/86918)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d12d3f9c1de7196760196e04aa2ec57f9bc16cfd_d12d3f9c1de7196760196e04aa2ec57f9bc16cfd/report/index.html

@morningman (Contributor Author)

run buildall

924060929
924060929 previously approved these changes Dec 5, 2024
@@ -316,6 +318,17 @@ void ScannerScheduler::_scanner_scan(std::shared_ptr<ScannerContext> ctx,
ctx->inc_block_usage(free_block->allocated_bytes());
scan_task->cached_blocks.emplace_back(std::move(free_block), free_block_bytes);
}
if (limit < ctx->batch_size()) {

@HappenLee HappenLee Dec 5, 2024


What if limit == -1? Then this condition is also true for non-limit queries. Would it be better to put the check in the while loop?
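The reviewer's concern is that with -1 meaning "no limit", a bare `limit < batch_size` comparison also matches non-limit queries. A guarded form of the check (a hypothetical sketch, not the code as merged) would be:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the guarded check: with limit == -1 meaning "no limit",
// `limit < batch_size` alone would also match non-limit queries,
// so a positivity guard is needed.
bool is_small_limit(int64_t limit, int64_t batch_size) {
    return limit > 0 && limit < batch_size;
}
```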

@morningman morningman dismissed stale reviews from 924060929 and morrySnow via 670d90e December 5, 2024 17:40
@morningman (Contributor Author)

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 5, 2024
@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.48% (10007/26004)
Line Coverage: 29.52% (83926/284308)
Region Coverage: 28.62% (43134/150700)
Branch Coverage: 25.21% (21926/86958)
Coverage Report: http://coverage.selectdb-in.cc/coverage/670d90eb0b356074f8f23293671f7bee4a30547b_670d90eb0b356074f8f23293671f7bee4a30547b/report/index.html


@HappenLee HappenLee left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 6, 2024

github-actions bot commented Dec 6, 2024

PR approved by at least one committer and no changes requested.


@kaka11chen kaka11chen left a comment


LGTM

@morningman morningman merged commit 5aee0cc into apache:master Dec 10, 2024
25 of 28 checks passed
morningman added a commit to morningman/doris that referenced this pull request Dec 10, 2024
morningman added a commit to morningman/doris that referenced this pull request Dec 10, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
6 participants