-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flow based tablet load balancer #16351
Conversation
Signed-off-by: Michael Demmer <[email protected]> Co-authored-by: Venkatraju V <[email protected]>
Signed-off-by: Michael Demmer <[email protected]> Co-authored-by: Venkatraju V <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Michael Demmer <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16351 +/- ##
==========================================
- Coverage 68.98% 68.93% -0.05%
==========================================
Files 1562 1565 +3
Lines 200697 201473 +776
==========================================
+ Hits 138446 138886 +440
- Misses 62251 62587 +336 ☔ View full report in Codecov by Sentry. |
This looks great @demmer Just need to use |
Signed-off-by: Venkatraju V <[email protected]>
Signed-off-by: Michael Demmer <[email protected]>
@systay based on the test coverage reports, @venkatraju and I added some more tests for the balancer, and I beefed up some of the existing tests for the retry logic, including running both with and without the balancer. might be overkill but this is kinda a critical part of the code :) |
go/vt/vtgate/tabletgateway.go
Outdated
@@ -62,6 +70,9 @@ func init() { | |||
fs.StringVar(&CellsToWatch, "cells_to_watch", "", "comma-separated list of cells for watching tablets") | |||
fs.DurationVar(&initialTabletTimeout, "gateway_initial_tablet_timeout", 30*time.Second, "At startup, the tabletGateway will wait up to this duration to get at least one tablet per keyspace/shard/tablet type") | |||
fs.IntVar(&retryCount, "retry-count", 2, "retry count") | |||
fs.BoolVar(&balancerEnabled, "balancer_enabled", false, "Whether to enable the tablet balancer to evenly spread query load") | |||
fs.StringSliceVar(&balancerVtgateCells, "balancer_vtgate_cells", []string{}, "When in balanced mode, a comma-separated list of cells that contain vtgates (required)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs.StringSliceVar(&balancerVtgateCells, "balancer_vtgate_cells", []string{}, "When in balanced mode, a comma-separated list of cells that contain vtgates (required)") | |
fs.StringSliceVar(&balancerVtgateCells, "balancer-vtgate-cells", []string{}, "When in balanced mode, a comma-separated list of cells that contain vtgates (required)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything else is cosmetic, but this seems important.
What happens if you add vtgates to a cell that didn't previously have any? You would have to restart every vtgate with a new value for this flag?
It may not be very common in practice, but it should be documented somewhere.
Of course, the real answer is that vtgates should be given a list of all cells and discover which ones contain vtgates, but we've so far avoided adding vtgates to the topo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you add vtgates to a cell that didn't previously have any? You would have to restart every vtgate with a new value for this flag?
Yep. For the current algorithm to work as designed, it's required that all the vtgates know the fully set of cells that are running vtgates and that the load among these cells is roughly even. This allows each individual vtgate to make the assumption about what the others will do without needing signaling.
If you want to change this assumption therefore, you do have to restart the whole fleet to take it into account, just like any other vtgate flag change.
It may not be very common in practice, but it should be documented somewhere.
That makes sense. I'll include something to this effect when we write the website docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, the real answer is that vtgates should be given a list of all cells and discover which ones contain vtgates, but we've so far avoided adding vtgates to the topo.
Yeah - that's obviously beyond the scope of this PR, and it would in theory reduce the dependency that query inflows per cell need to be balanced (i.e. you could use the number of gates as a proxy for how much load you expect).
go/vt/vtgate/tabletgateway.go
Outdated
@@ -62,6 +70,9 @@ func init() { | |||
fs.StringVar(&CellsToWatch, "cells_to_watch", "", "comma-separated list of cells for watching tablets") | |||
fs.DurationVar(&initialTabletTimeout, "gateway_initial_tablet_timeout", 30*time.Second, "At startup, the tabletGateway will wait up to this duration to get at least one tablet per keyspace/shard/tablet type") | |||
fs.IntVar(&retryCount, "retry-count", 2, "retry count") | |||
fs.BoolVar(&balancerEnabled, "balancer_enabled", false, "Whether to enable the tablet balancer to evenly spread query load") | |||
fs.StringSliceVar(&balancerVtgateCells, "balancer_vtgate_cells", []string{}, "When in balanced mode, a comma-separated list of cells that contain vtgates (required)") | |||
fs.StringSliceVar(&balancerKeyspaces, "balancer_keyspaces", []string{}, "When in balanced mode, a comma-separated list of keyspaces for which to use the balancer (optional)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs.StringSliceVar(&balancerKeyspaces, "balancer_keyspaces", []string{}, "When in balanced mode, a comma-separated list of keyspaces for which to use the balancer (optional)") | |
fs.StringSliceVar(&balancerKeyspaces, "balancer-keyspaces", []string{}, "When in balanced mode, a comma-separated list of keyspaces for which to use the balancer (optional)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, why was this added? So that you can opt in and out per keyspace as a phased rollout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep exactly. That's how we did the rollout testing inside Slack -- started with a loadtest keyspace only and then moved to the one production keyspace where we saw the most imbalance, so we could focus on that without worrying about potential negative impact on the rest of the (thousands of) shards.
Signed-off-by: Michael Demmer <[email protected]>
Signed-off-by: Michael Demmer <[email protected]>
Signed-off-by: Michael Demmer <[email protected]>
Signed-off-by: Michael Demmer <[email protected]>
@venkatraju looks like this may need either a merge from main, or a simpler code change to pass linter checks. |
Signed-off-by: Venkatraju V <[email protected]>
ac6b050
to
b3fbaaa
Compare
@deepthi Fixed the log line that was triggering the linter error. |
I've added the label to block merging. This will be a nice thing to add to the release notes. |
b2f8253
to
3b889d6
Compare
Signed-off-by: Venkatraju V <[email protected]>
3b889d6
to
9aafb2a
Compare
Signed-off-by: Venkatraju V <[email protected]>
9aafb2a
to
5ae7720
Compare
Signed-off-by: Venkatraju <[email protected]>
@deepthi added release notes. Could you review the text, and also comment on whether you want the new flags called out there? (sorry about the messed up git history from a rebase that went wrong) |
This looks fine. Details on flags will be in the vtgate command reference (auto generated). |
Signed-off-by: Michael Demmer <[email protected]> Signed-off-by: Venkatraju V <[email protected]> Signed-off-by: Venkatraju <[email protected]> Co-authored-by: Venkatraju V <[email protected]> Co-authored-by: Venkatraju <[email protected]>
* flow based tablet load balancer (vitessio#16351) Signed-off-by: Michael Demmer <[email protected]> Signed-off-by: Venkatraju V <[email protected]>
Description
Implement an optional flow-based tablet balancer.
The motivation and algorithm is described more in #12241 and in a long comment in the code.
In brief, the goal of this implementation is to more evenly balance out the queries to a the replicas in a shard when the tablets are not evenly distributed across the various cells where vtgates are receiving application traffic, because the default routing algorithm will always prefer an available tablet in the same cell as the vtgate, even if there are more idle tablets in other cells. The balancer addresses this by attempting to route an equal share to each tablet, preferring the same-cell where possible, but crossing cells in order to achieve better global balance.
Slack production results
Note that the original algorithm was developed many months ago, but only recently have we made the time to deploy this within Slack and spent some time to verify it works as expected.
The results from our initial testing are very encouraging and we have been running with this feature enabled for some time now on one of our main production keyspaces. This is a 16 shard keyspace with between 9-11 tablets per shard spread among our 4 active cells. Each cell receives the same amount of inbound traffic.
When we first enabled the feature shows that the wide disparity in QPS per replica goes away and instead the QPS is only based on the number of replicas per shard, as expected:
Since then we have actually normalized our deployment so each shard has 9 replicas and the QPS is quite even across all hosts, even though again they do not map evenly to each cell:
Related Issue(s)
#12241
Checklist
Deployment Notes