Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](planner) query should be cancelled if limit reached (#44338) #45223

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

morningman
Copy link
Contributor

bp #44338

Problem Summary:
When there is a `limit` cluse in SQL, if FE has obtained data with more
than the `limit` number of rows,
it should send a cancel command to BE to cancel the query to prevent BE
from reading more data.
However, this function has problems in the current code and does not
work.
Especially in external table query, this may result in lots of
unnecessary network io read.

1. `isBlockQuery`

In the old optimizer, if a query statement contains a `sort` or `agg`
node,
    `isBlockQuery` will be marked as true, otherwise it will be false.
    In the new optimizer, this value is always true.

    Regardless of the old or new optimizer, this logic is wrong.
But only when `isBlockQuery = false` will the reach limit logic be
triggered.

2. Calling problem of reach limit logic

The reach limit logic judgment will only be performed when `eos = true`
in the rowBatch returned by BE.
    This is wrong.
Because for `limit N` queries, each BE's own `limit` is N. But for FE,
as long as the total number of rows
returned by all BEs exceeds N, the reach limit logic can be triggered.
    So it should not be processed only when `eos = true`.

The PR mainly changes:

1. Remove `isBlockQuery`

`isBlockQuery` is only used in the reach limit logic. And it is not
needed. Remove it completely.

2. Modify the judgment position of reach limit.

    When the number of rows obtained by FE is greater than the limit,
    it will check the reach limit logic.

3. fix wrong `limitRows` in `QueryProcessor`

    the limitRows should be got from the first fragment, not last.

4. In scanner scheduler on BE side, if scanner has limit, ignore the
scan bytes threshold per round.

[fix](planner) query should be cancelled if limit reached
@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 40521 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 488e7d7e05d3c113c67a032b8f0ec2fcab481b54, data reload: false

------ Round 1 ----------------------------------
q1	17563	7488	7334	7334
q2	2055	174	188	174
q3	11024	1512	1159	1159
q4	10568	769	708	708
q5	7790	2777	2787	2777
q6	242	150	149	149
q7	983	615	611	611
q8	9589	1968	1979	1968
q9	8129	6357	6426	6357
q10	7013	2263	2337	2263
q11	465	279	251	251
q12	416	217	208	208
q13	18018	2947	2965	2947
q14	248	204	204	204
q15	557	520	522	520
q16	696	615	603	603
q17	964	553	616	553
q18	7279	6518	6571	6518
q19	2036	1061	952	952
q20	488	214	194	194
q21	4000	3106	3153	3106
q22	1075	965	973	965
Total cold run time: 111198 ms
Total hot run time: 40521 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7314	7212	7205	7205
q2	320	232	222	222
q3	3050	2848	2834	2834
q4	1999	1756	1754	1754
q5	5643	5692	5716	5692
q6	225	147	151	147
q7	2213	1808	1748	1748
q8	3333	3541	3381	3381
q9	8851	8833	8807	8807
q10	3538	3507	3516	3507
q11	594	503	494	494
q12	831	644	593	593
q13	16394	3173	3140	3140
q14	299	265	276	265
q15	571	529	526	526
q16	706	653	675	653
q17	1848	1628	1564	1564
q18	8172	7860	7313	7313
q19	7592	1624	1446	1446
q20	2099	1851	1812	1812
q21	5414	5246	5283	5246
q22	1103	994	1028	994
Total cold run time: 82109 ms
Total hot run time: 59343 ms

@morningman morningman merged commit 49d1671 into apache:branch-3.0 Dec 10, 2024
21 of 24 checks passed
@qidaye
Copy link
Contributor

qidaye commented Dec 24, 2024

run p1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants