[BugFix] fix broker load job hang when meet resource group pending timeout (backport #51072) #51125
This is an automatic backport of pull request #51072 done by [Mergify](https://mergify.com).

## Why I'm doing:
Since the previous PR #38183, we skip cancelling a loading job when a timeout happens. But a broker load can hit a resource group pending timeout like this:
```
com.starrocks.common.UserException: Failed to allocate resource to query: pending timeout [300], you could modify the session variable [query_queue_pending_timeout_second] to pending more time
    at com.starrocks.qe.QueryQueueManager.maybeWait(QueryQueueManager.java:81) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.DefaultCoordinator.startScheduling(DefaultCoordinator.java:488) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.scheduler.Coordinator.startScheduling(Coordinator.java:102) ~[starrocks-fe.jar:?]
    at com.starrocks.qe.scheduler.Coordinator.exec(Coordinator.java:85) ~[starrocks-fe.jar:?]
```
This is not a loading job timeout. However, because the error message contains `timeout`, we treat it as a normal timeout and do not cancel the job immediately, so the load job hangs until its own timeout happens.

## What I'm doing:
When we hit a `resource group pending timeout`, we mark it as `USER_CANCEL`, so the job can be cancelled immediately.

## What type of PR is this:

## Does this PR entail a change in behavior?
If yes, please specify the type of change:

## Checklist:

## Bugfix cherry-pick branch check:
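To make the change described under "What I'm doing" concrete, here is a minimal sketch of the classification idea. It is not the code in this PR: the class `LoadFailureClassifier`, the enum `FailType`, and both methods are hypothetical names chosen for illustration, and the actual fix may detect the pending timeout differently. The only details taken from the description are the `pending timeout` text in the error message, the generic `timeout` substring check, and the `USER_CANCEL` outcome.

```java
import java.util.Locale;

// Hypothetical sketch; these names are illustrative and not taken from the StarRocks code base.
final class LoadFailureClassifier {

    // Assumed failure categories; only USER_CANCEL and the notion of a timeout come from the PR text.
    enum FailType { TIMEOUT, USER_CANCEL, LOAD_RUN_FAIL }

    // Substring copied from the error message quoted in the PR description.
    static final String PENDING_TIMEOUT_MARKER = "pending timeout";

    static FailType classify(String errorMessage) {
        String msg = errorMessage == null ? "" : errorMessage.toLowerCase(Locale.ROOT);
        // Check the specific case first: a resource group pending timeout also
        // contains the word "timeout", so the generic check below would swallow it.
        if (msg.contains(PENDING_TIMEOUT_MARKER)) {
            return FailType.USER_CANCEL;
        }
        // Generic timeout: the job is left alone until its own timeout fires.
        if (msg.contains("timeout")) {
            return FailType.TIMEOUT;
        }
        return FailType.LOAD_RUN_FAIL;
    }

    // Everything except a plain timeout is cancelled right away in this sketch.
    static boolean shouldCancelImmediately(FailType type) {
        return type != FailType.TIMEOUT;
    }

    public static void main(String[] args) {
        String pending = "Failed to allocate resource to query: pending timeout [300]";
        FailType type = classify(pending);
        System.out.println(type + " cancelImmediately=" + shouldCancelImmediately(type));
    }
}
```

The ordering of the two checks is the point of the sketch: testing for the narrower `pending timeout` message before the generic `timeout` substring keeps the queueing failure from being mistaken for a load job timeout.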