Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: Introduce BalaceKeyrange scheduler #8497

Open
wants to merge 44 commits into
base: master
Choose a base branch
from

Conversation

CalvinNeo
Copy link
Member

@CalvinNeo CalvinNeo commented Aug 6, 2024

What problem does this PR solve?

Issue Number: Close #8881

What is changed and how does it work?

In this PR, we introduced a new scheduler balance-keyrange scheduler.

This scheduler takes several parameters:

  • start_key (hex format string)
  • end_key (hex format string)
  • batch_size
  • required_labels
  • timeout

It will then balance all regions within range [start_key, end_key) so that they can spread as evenly as possible within the range. Users could balance all regions within a table by choosing a proper start_key and end_key.

Currently only one start_key and one end_key are provided, so the scheduler can handle only one task at a time.

By specifying a certain JSON array called required_labels, the balance will take into account the stores whose store label is a superset of required_labels. For example, the following required_labels will select all TiFlash nodes.

[{"key":"engine","value":"tiflash"}]

And if the required_labels is set to empty, then all stores will be included when balancing.

A batch_size parameter is provided to control how many operators to be created during a schedule iteration at most. It is not encouraged(and also useless) to set it to a very big value, because there are some other rate limiters to prevent we from scheduling lots of operators at a time.

Currently, this scheduler will not always run background. It will end and destroy itself under the following cases:

  • Its work is finished, which means all operators are scheduled at least once. The scheduler will retry when a operator is not scheduled or cancelled due to rate limit reason. However, it will not retry on other occasions.
  • A new schedule command comes.
  • The scheduler timeout.
  • After pd restarts, it will no longer continue scheduling, we have to re-trigger the schedule.
  • The scheduler is deleted by DELETE /scheduler/{name}.

We also expose an API to check the status of this scheduler.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

a
Signed-off-by: Calvin Neo <[email protected]>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Aug 6, 2024
Copy link
Contributor

ti-chi-bot bot commented Aug 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rleungx for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 6, 2024
Signed-off-by: Calvin Neo <[email protected]>
a
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 21, 2024
Signed-off-by: Calvin Neo <[email protected]>
Copy link

codecov bot commented Aug 21, 2024

Codecov Report

Attention: Patch coverage is 71.36659% with 132 lines in your changes missing coverage. Please review.

Project coverage is 76.21%. Comparing base (2d970a6) to head (f38d356).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8497      +/-   ##
==========================================
- Coverage   76.25%   76.21%   -0.04%     
==========================================
  Files         461      463       +2     
  Lines       70407    70905     +498     
==========================================
+ Hits        53686    54039     +353     
- Misses      13377    13487     +110     
- Partials     3344     3379      +35     
Flag Coverage Δ
unittests 76.21% <71.36%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 13, 2024
Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo CalvinNeo changed the title DNM Support BalaceRegion DNM Support BalaceKeyrange scheduler Nov 20, 2024
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo CalvinNeo changed the title DNM Support BalaceKeyrange scheduler DNM Introduce BalaceKeyrange scheduler Dec 6, 2024
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo
Copy link
Member Author

/retest

Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo CalvinNeo changed the title DNM Introduce BalaceKeyrange scheduler Introduce BalaceKeyrange scheduler Dec 20, 2024
@CalvinNeo CalvinNeo changed the title Introduce BalaceKeyrange scheduler scheduler: Introduce BalaceKeyrange scheduler Dec 20, 2024
@CalvinNeo
Copy link
Member Author

/retest

@@ -284,6 +284,8 @@ func createRouter(prefix string, svr *server.Server) *mux.Router {
registerFunc(clusterRouter, "/regions/split", regionsHandler.SplitRegions, setMethods(http.MethodPost), setAuditBackend(localLog, prometheus))
registerFunc(clusterRouter, "/regions/range-holes", regionsHandler.GetRangeHoles, setMethods(http.MethodGet), setAuditBackend(prometheus))
registerFunc(clusterRouter, "/regions/replicated", regionsHandler.CheckRegionsReplicated, setMethods(http.MethodGet), setQueries("startKey", "{startKey}", "endKey", "{endKey}"), setAuditBackend(prometheus))
registerFunc(clusterRouter, "/regions/balance-keyrange", regionsHandler.BalanceKeyrange, setMethods(http.MethodPost), setAuditBackend(localLog, prometheus))
Copy link
Member

@rleungx rleungx Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we config it through the scheduler API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of configuration could be done to this scheduler? I think its behavior is already determined when creating, so we either wait it finished or timeout, or we can just delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to router/implement the new shceduler type in CreateScheduler with a uniform way.

@CalvinNeo CalvinNeo requested a review from rleungx December 23, 2024 08:55
StartTime time.Time
}

func computeCandidateStores(requiredLabels []*metapb.StoreLabel, stores []*metapb.Store, regions []*core.RegionInfo) []*storeRegionSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to decide who can be a candidate for each region according to the constraint. Not all regions share the same stores.

return &o
}

func buildMigrationPlan(stores []*storeRegionSet) ([]int, []int, []*migrationOp, int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to make store balance under constraint. expectedCount is not accurate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I will adopt the region-wise strategy.

Copy link
Contributor

@JaySon-Huang JaySon-Huang Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it means we may generate operators that move some Region out of the given key range to keep the "store balance"?

expectedCount = append(expectedCount, avr)
}

senders := []int{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does sender and receiver mean and why do we need them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some regions are scheduled out from sender to receiver. We can remove it from return value afterwards.

Peer *metapb.Peer
}
type migrationPlan struct {
ErrorCode uint64 `json:"error_code"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is json used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not implemented yet, but I would like if to be marshaled as a string and printed into log, so we can see the detailed migration plan.

@@ -146,3 +146,10 @@ func (s *BaseScheduler) IsDefault() bool {
}
return false
}

// IsFinished implements the Scheduler interface.
func (*BaseScheduler) IsFinished() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's only used in test and log, please remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's first define the API interface of adding scheduler and checking scheduler status
@CalvinNeo @rleungx @joey-yez

server/api/region.go Outdated Show resolved Hide resolved
// @Param end_key body string true "Regions end key, hex encoded"
// @Param batch_size body string true "Maximum operators scheduled in one"
// @Param timeout body string true "Timeout time in milliseconds"
// @Param required_labels body array true "Only the stores with these labels would be scheduled"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// @Param required_labels body array true "Only the stores with these labels would be scheduled"

User only specify the key-range, this API would follow the <region,zone,host> constraint for rebalancing the peer among stores

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe we can follow the pattern of scatter region

// @Param start_key body string true "Regions start key, hex encoded"
// @Param end_key body string true "Regions end key, hex encoded"
// @Param batch_size body string true "Maximum operators scheduled in one"
// @Param timeout body string true "Timeout time in milliseconds"
Copy link
Contributor

@JaySon-Huang JaySon-Huang Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using "seconds" instead of "milliseconds" is more reasonable for a timeout of scheduler.
What's more, there is a "timeout" param in the "/admin/unsafe/remove-failed-stores", which unit is "seconds".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +52 to +53
// Include keyrange balancer
OpKeyrange
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the "OpRange" instead of adding "OpKeyrange"?

Copy link
Member Author

@CalvinNeo CalvinNeo Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for scatter range scheduler. And I have seen some predicates are based on opkind. In this view, I think different schedulers do not interfere with each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operator kind's priority needs to be higher, it is currently positioned behind the OpRange.

cb := func(s string) error {
return controller.RemoveScheduler(s)
}
s, err := schedulers.CreateScheduler(types.BalanceKeyrangeScheduler, oc, storage.NewStorageWithMemoryBackend(), schedulers.ConfigSliceDecoder(types.BalanceKeyrangeScheduler, []string{data}), cb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PD may lose the config if the pd server restarts or transfers leader?

}

// CheckBalanceKeyrangeStatus returns the status of the balance-keyrange scheduler.
func (h *Handler) CheckBalanceKeyrangeStatus() (any, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about defining a response struct and not using any? and I think the caller may need the key range. .

Comment on lines +52 to +53
// Include keyrange balancer
OpKeyrange
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operator kind's priority needs to be higher, it is currently positioned behind the OpRange.

@@ -418,6 +418,16 @@ func (oc *Controller) PromoteWaitingOperator() {
}
}

// GetWopCount returns the count in the waiting list of a kind of scheduler.
func (oc *Controller) GetWopCount(kind string) uint64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using this function to check the concurrent operators ? s.OpController.OperatorCount

}

// GetSchedulerMaxWaitingOperator returns the waiting queue side.
func (oc *Controller) GetSchedulerMaxWaitingOperator() uint64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (oc *Controller) GetSchedulerMaxWaitingOperator() uint64 {
func (oc *Controller) GetSchedulerMaxWaitingOperatorLimit() uint64 {

@@ -342,4 +346,6 @@ var (
transferWitnessLeaderCounter = transferWitnessLeaderCounterWithEvent("schedule")
transferWitnessLeaderNewOperatorCounter = transferWitnessLeaderCounterWithEvent("new-operator")
transferWitnessLeaderNoTargetStoreCounter = transferWitnessLeaderCounterWithEvent("no-target-store")

balanceKeyrangeScheduleCounter = balanceKeyrangeScheduleCounterWithEvent("schedule")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding new metrics for new-operator?

type storeRegionSet struct {
ID uint64
Info *metapb.Store
// If the region still exists durgin migration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what‘s the meaning about durgin?

}

oc := s.BaseScheduler.OpController
rangeChanged := true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to update the key range?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Balance all regions within a certain key range
5 participants