-
Notifications
You must be signed in to change notification settings - Fork 728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler: Introduce BalaceKeyrange scheduler #8497
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Calvin Neo <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8497 +/- ##
==========================================
- Coverage 76.25% 76.21% -0.04%
==========================================
Files 461 463 +2
Lines 70407 70905 +498
==========================================
+ Hits 53686 54039 +353
- Misses 13377 13487 +110
- Partials 3344 3379 +35
Flags with carried forward coverage won't be shown. Click here to find out more. |
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
/retest |
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
/retest |
@@ -284,6 +284,8 @@ func createRouter(prefix string, svr *server.Server) *mux.Router { | |||
registerFunc(clusterRouter, "/regions/split", regionsHandler.SplitRegions, setMethods(http.MethodPost), setAuditBackend(localLog, prometheus)) | |||
registerFunc(clusterRouter, "/regions/range-holes", regionsHandler.GetRangeHoles, setMethods(http.MethodGet), setAuditBackend(prometheus)) | |||
registerFunc(clusterRouter, "/regions/replicated", regionsHandler.CheckRegionsReplicated, setMethods(http.MethodGet), setQueries("startKey", "{startKey}", "endKey", "{endKey}"), setAuditBackend(prometheus)) | |||
registerFunc(clusterRouter, "/regions/balance-keyrange", regionsHandler.BalanceKeyrange, setMethods(http.MethodPost), setAuditBackend(localLog, prometheus)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we config it through the scheduler API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of configuration could be done to this scheduler? I think its behavior is already determined when creating, so we either wait it finished or timeout, or we can just delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to router/implement the new shceduler type in CreateScheduler with a uniform way.
StartTime time.Time | ||
} | ||
|
||
func computeCandidateStores(requiredLabels []*metapb.StoreLabel, stores []*metapb.Store, regions []*core.RegionInfo) []*storeRegionSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to decide who can be a candidate for each region according to the constraint. Not all regions share the same stores.
return &o | ||
} | ||
|
||
func buildMigrationPlan(stores []*storeRegionSet) ([]int, []int, []*migrationOp, int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to make store balance under constraint. expectedCount
is not accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I will adopt the region-wise strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it means we may generate operators that move some Region out of the given key range to keep the "store balance"?
expectedCount = append(expectedCount, avr) | ||
} | ||
|
||
senders := []int{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does sender and receiver mean and why do we need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some regions are scheduled out from sender to receiver. We can remove it from return value afterwards.
Peer *metapb.Peer | ||
} | ||
type migrationPlan struct { | ||
ErrorCode uint64 `json:"error_code"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is json used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not implemented yet, but I would like if to be marshaled as a string and printed into log, so we can see the detailed migration plan.
@@ -146,3 +146,10 @@ func (s *BaseScheduler) IsDefault() bool { | |||
} | |||
return false | |||
} | |||
|
|||
// IsFinished implements the Scheduler interface. | |||
func (*BaseScheduler) IsFinished() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's only used in test and log, please remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's first define the API interface of adding scheduler and checking scheduler status
@CalvinNeo @rleungx @joey-yez
// @Param end_key body string true "Regions end key, hex encoded" | ||
// @Param batch_size body string true "Maximum operators scheduled in one" | ||
// @Param timeout body string true "Timeout time in milliseconds" | ||
// @Param required_labels body array true "Only the stores with these labels would be scheduled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// @Param required_labels body array true "Only the stores with these labels would be scheduled" |
User only specify the key-range, this API would follow the <region,zone,host>
constraint for rebalancing the peer among stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe we can follow the pattern of scatter region
// @Param start_key body string true "Regions start key, hex encoded" | ||
// @Param end_key body string true "Regions end key, hex encoded" | ||
// @Param batch_size body string true "Maximum operators scheduled in one" | ||
// @Param timeout body string true "Timeout time in milliseconds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using "seconds" instead of "milliseconds" is more reasonable for a timeout of scheduler.
What's more, there is a "timeout" param in the "/admin/unsafe/remove-failed-stores", which unit is "seconds".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// Include keyrange balancer | ||
OpKeyrange |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse the "OpRange" instead of adding "OpKeyrange"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for scatter range scheduler. And I have seen some predicates are based on opkind. In this view, I think different schedulers do not interfere with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operator kind's priority needs to be higher, it is currently positioned behind the OpRange.
Co-authored-by: JaySon <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
cb := func(s string) error { | ||
return controller.RemoveScheduler(s) | ||
} | ||
s, err := schedulers.CreateScheduler(types.BalanceKeyrangeScheduler, oc, storage.NewStorageWithMemoryBackend(), schedulers.ConfigSliceDecoder(types.BalanceKeyrangeScheduler, []string{data}), cb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PD may lose the config if the pd server restarts or transfers leader?
} | ||
|
||
// CheckBalanceKeyrangeStatus returns the status of the balance-keyrange scheduler. | ||
func (h *Handler) CheckBalanceKeyrangeStatus() (any, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about defining a response struct and not using any? and I think the caller may need the key range. .
// Include keyrange balancer | ||
OpKeyrange |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operator kind's priority needs to be higher, it is currently positioned behind the OpRange.
@@ -418,6 +418,16 @@ func (oc *Controller) PromoteWaitingOperator() { | |||
} | |||
} | |||
|
|||
// GetWopCount returns the count in the waiting list of a kind of scheduler. | |||
func (oc *Controller) GetWopCount(kind string) uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using this function to check the concurrent operators ? s.OpController.OperatorCount
} | ||
|
||
// GetSchedulerMaxWaitingOperator returns the waiting queue side. | ||
func (oc *Controller) GetSchedulerMaxWaitingOperator() uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (oc *Controller) GetSchedulerMaxWaitingOperator() uint64 { | |
func (oc *Controller) GetSchedulerMaxWaitingOperatorLimit() uint64 { |
@@ -342,4 +346,6 @@ var ( | |||
transferWitnessLeaderCounter = transferWitnessLeaderCounterWithEvent("schedule") | |||
transferWitnessLeaderNewOperatorCounter = transferWitnessLeaderCounterWithEvent("new-operator") | |||
transferWitnessLeaderNoTargetStoreCounter = transferWitnessLeaderCounterWithEvent("no-target-store") | |||
|
|||
balanceKeyrangeScheduleCounter = balanceKeyrangeScheduleCounterWithEvent("schedule") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about adding new metrics for new-operator
?
type storeRegionSet struct { | ||
ID uint64 | ||
Info *metapb.Store | ||
// If the region still exists durgin migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what‘s the meaning about durgin
?
} | ||
|
||
oc := s.BaseScheduler.OpController | ||
rangeChanged := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it reasonable to update the key range?
What problem does this PR solve?
Issue Number: Close #8881
What is changed and how does it work?
In this PR, we introduced a new scheduler balance-keyrange scheduler.
This scheduler takes several parameters:
It will then balance all regions within range [start_key, end_key) so that they can spread as evenly as possible within the range. Users could balance all regions within a table by choosing a proper start_key and end_key.
Currently only one start_key and one end_key are provided, so the scheduler can handle only one task at a time.
By specifying a certain JSON array called
required_labels
, the balance will take into account the stores whose store label is a superset ofrequired_labels
. For example, the followingrequired_labels
will select all TiFlash nodes.And if the
required_labels
is set to empty, then all stores will be included when balancing.A
batch_size
parameter is provided to control how many operators to be created during a schedule iteration at most. It is not encouraged(and also useless) to set it to a very big value, because there are some other rate limiters to prevent we from scheduling lots of operators at a time.Currently, this scheduler will not always run background. It will end and destroy itself under the following cases:
/scheduler/{name}
.We also expose an API to check the status of this scheduler.
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs
/pingcap/docs-cn
:pingcap/tiup
:Release note