-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tablet throttler: read and use MySQL host metrics #16904
base: main
Are you sure you want to change the base?
Tablet throttler: read and use MySQL host metrics #16904
Conversation
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…MacOS, and MacOS is beign used in local dev env unit and endtoend testing Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16904 +/- ##
==========================================
- Coverage 67.43% 67.38% -0.06%
==========================================
Files 1571 1573 +2
Lines 252249 252980 +731
==========================================
+ Hits 170098 170464 +366
- Misses 82151 82516 +365 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Shlomi Noach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some nits and questions. Can we also please add an e2e test (sorry if I'm missing it). The testing feels a bit light (and is reflected in the codecov results) and this seems like something we'd want to test through the RPC chain (to a fake and/or real mysqlctld).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do these changes relate to the PR? Is it that we're getting rid of the old loadavg
OS metric now that we have the new mysql-loadavg
metric and so we wanted to test another MySQL process metric here?
Is there any reason that we're not using any of the new MySQL host metrics in the e2e tests? Or am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do these changes relate to the PR?
See #16904 (comment): it's about tests failing on MacOS machines.
Is there any reason that we're not using any of the new MySQL host metrics in the e2e tests? Or am I missing something here?
You're not missing anything here. Let me see what we can do for e2e tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added endtoend
tests that verify the full MysqlHostMetrics
path don to mysqld, and checking response values. We check the existence of both new mysqld-*
values, and we specifically check that mysqld-datadir-used-ratio
is non-zero (as can be expected in local hosts and in CI). We do not check the value of mysqld-loadavg
as that could be arbitrarily low and even appear to be zero, as well as because we do not implement it in MacOS.
go/vt/vttablet/tabletserver/throttle/base/self_metric_loadavg.go
Outdated
Show resolved
Hide resolved
} | ||
|
||
func (m *MysqldLoadAvgSelfMetric) DefaultScope() Scope { | ||
return SelfScope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the type and variable exported makes this interface feel off to me. Since anyone can access the metric and this internal (?) variable as things are now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet no one holds an instance of MysqldLoadAvgSelfMetric
. The only instance in existence is:
var _ SelfMetric = registerSelfMetric(&MysqldLoadAvgSelfMetric{})
So yes, anyone can create their own new instance, but that cannot affect the existing instances.
go/vt/vttablet/tabletserver/throttle/base/self_metric_mysqld_test.go
Outdated
Show resolved
Hide resolved
} | ||
assert.EqualValues(t, 10, val) | ||
cancel() | ||
// There can be a race condition where the rate limiter still emits one final tick after the context is cancelled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, we should be able to drain the ticker's channel on cancel if it's not empty. That would be ideal, unless I'm missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The race condition is a bit more nuanced:
go func() {
defer mysqlHostMetricsRateLimiter.Store(nil)
defer rateLimiter.Stop()
<-ctx.Done()
}()
so when you hit cancel()
, the rate limiter gets unblocked from <-ctx.Done()
, but then it's still possible to get a new ticker event before defer rateLimiter.Stop()
gets issued.
go/vt/vttablet/tabletserver/throttle/base/self_metric_mysqld_test.go
Outdated
Show resolved
Hide resolved
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…nce of mysqld-* throttler values, and for nonzero value of mysqld-datadir-used-ratio Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…rAggregatedShardMysqldDatadirUsedRatio Signed-off-by: Shlomi Noach <[email protected]>
Docs: vitessio/website#1878 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! ❤️
go/vt/vttablet/tabletserver/throttle/base/self_metric_loadavg.go
Outdated
Show resolved
Hide resolved
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Now supporting This is used in both tablet's |
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Description
In #16850 we added the necessary proto/gRPC calls to get hsot metrics from the MySQL daemon. #16850 is part of the v21 release.
This PR complements #16850 by having the tablet throttler read and use said MySQL host metrics. These will henceforth be available to the user by the names:
mysqld-loadavg
(default threshold:1.0
)mysqld-datadir-used-ratio
(range0.0
== empty, to1.0
== full ; default threshold:0.98
)The MySQL host metrics stand out in that the throttler only reads them sporadically. It would be too expensive to call a sequence of two gRPC (one to tablet manager, then to mysql daemon) multiple times a second. Certainly for a metric such as
mysqld-datadir-used-ratio
, but also formysqld-loadavg
. The throttler only actually reads those metrics once per10sec
, and caches the results for subsequent checks.Doc PR: vitessio/website#1878
Related Issue(s)
Checklist
Deployment Notes