-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlstats: add new MaybeFlushWithDrainer method to persistedSqlStats #140673
base: master
Are you sure you want to change the base?
Conversation
753f0cb
to
ac31349
Compare
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
ce294dd
to
9217e5c
Compare
9217e5c
to
9c50c9b
Compare
fa1c2cf
to
5264c73
Compare
e884d43
to
eabd876
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 41 of 41 files at r1, 3 of 33 files at r2, 10 of 36 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @angles-n-daemons, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 114 at r3 (raw file):
} bucket := s.getBucket()
It's a bit odd that s
provides the bucket and ssDrainer
provides the data and we assume that they match. Anything we can do about that? Or maybe there's a connection I'm missing here.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 149 at r3 (raw file):
log.Warningf(ctx, "failed to dump stats to log, %s", err.Error()) } containerStmtStats, containerTxnStats := container.DrainStats(ctx)
Is there a way we could say that DrainStats
gives the caller ownership of these slices? would be nice to to have to copy them over. I assume if we can do that, they're no longer being modified anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @angles-n-daemons, @dhartunian, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 114 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
It's a bit odd that
s
provides the bucket andssDrainer
provides the data and we assume that they match. Anything we can do about that? Or maybe there's a connection I'm missing here.
This is how it was previously, I just created a struct to encapsulate this data. That said, the appstatspb.CollectedStatementStatistics
and appstatspb.CollectedTransactionStatistics
do have aggregated_ts
and aggregation_interval
fields on them already, so theoretically we could populate them when we call ssmemstorage.Container.DrainStats()
.
One thing I'm unclear about is the disconnect between collected statistics in the in memory containers and aggregation / bucketing . While the CollectedStats
related types have these fields, they are only ever set when flushing via persistedstatsSQL, when being queried from crdb_internal.cluster_statement_statistics
, or requested from one of the gRPC statement APIs. I'm not sure if I'm missing something somewhere, but it seems to me like the determination of which bucket a stat belongs to is tightly coupled to sql stats flush. If flushing is disabled, it seems like queried statistics will always be considered part of the most recent bucket (I don't know if this is correct or not). To me, it would make more sense to push the aggregation data all the way down to the write path of the ssmemstorage.Container, but I haven't spent time evaluation how big of an effort this would be. I don't think it is trivial.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 149 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Is there a way we could say that
DrainStats
gives the caller ownership of these slices? would be nice to to have to copy them over. I assume if we can do that, they're no longer being modified anyway.
As discussed in person:
- container.DrainStats uses the sqlstats iterators to create new copies of the data and return before deleting the stats from its container. This would be a larger endeavor to re-implement, so I'm gonna punt on that for now.
- I'll look into the the performance of the operation below where we are flattening the stats using append and spread to see if there are any extra allocations that we may want to avoid.
eabd876
to
b2ed62a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @angles-n-daemons, @dhartunian, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 149 at r3 (raw file):
Previously, kyle-a-wong (Kyle Wong) wrote…
As discussed in person:
- container.DrainStats uses the sqlstats iterators to create new copies of the data and return before deleting the stats from its container. This would be a larger endeavor to re-implement, so I'm gonna punt on that for now.
- I'll look into the the performance of the operation below where we are flattening the stats using append and spread to see if there are any extra allocations that we may want to avoid.
I added a new benchmark that benchmarks DrainStats. Locally, I added back in the ConsumeStats
function updated the benchmark to call ConsumeStats
,with noop callbacks, and got the following results:
name old time/op new time/op delta
SqlStatsDrain/drainsql-10-12 469ns ±11% 456ns ±12% ~ (p=0.382 n=8+9)
SqlStatsDrain/drainsql-50-12 2.12µs ±20% 2.24µs ±13% ~ (p=0.280 n=10+10)
SqlStatsDrain/drainsql-100-12 4.21µs ±65% 3.44µs ±44% ~ (p=0.079 n=10+9)
SqlStatsDrain/drainsql-10000-12 356µs ±19% 506µs ±65% ~ (p=0.113 n=9+10)
SqlStatsDrain/drainsql-1000-12 23.9µs ±11% 28.3µs ±42% +18.32% (p=0.011 n=8+9)
name old alloc/op new alloc/op delta
SqlStatsDrain/drainsql-100-12 17.6kB ± 0% 17.4kB ± 0% -1.02% (p=0.000 n=10+10)
SqlStatsDrain/drainsql-10000-12 1.37MB ± 0% 1.36MB ± 0% -0.96% (p=0.000 n=10+8)
SqlStatsDrain/drainsql-10-12 1.68kB ± 0% 1.67kB ± 0% -0.95% (p=0.000 n=10+10)
SqlStatsDrain/drainsql-50-12 8.74kB ± 0% 8.65kB ± 0% -0.95% (p=0.000 n=10+10)
SqlStatsDrain/drainsql-1000-12 180kB ± 0% 178kB ± 0% -0.91% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SqlStatsDrain/drainsql-10-12 3.00 ± 0% 3.00 ± 0% ~ (all equal)
SqlStatsDrain/drainsql-50-12 12.0 ± 0% 12.0 ± 0% ~ (all equal)
SqlStatsDrain/drainsql-100-12 22.0 ± 0% 22.0 ± 0% ~ (all equal)
SqlStatsDrain/drainsql-1000-12 203 ± 0% 203 ± 0% ~ (all equal)
SqlStatsDrain/drainsql-10000-12 1.50k ± 0% 1.50k ± 0% ~ (all equal)
The results show that the new implementation of the DrainStats, that returns the list, results in ~1% more memory consumption than the ConsumeStats. My assumption that this increase in alloc/op is a result of a new slice being created on each append. Do you think this is worth going back to the ConsumeStats
callback style of processing drained stats?
25c5fbe
to
eb5dff8
Compare
@@ -1046,3 +1044,8 @@ func getStatusJSONProto( | |||
url := fmt.Sprintf("/_status/%s?start=%d&end=%d", path, startTime.Unix(), endTime.Unix()) | |||
return serverutils.GetJSONProto(ts, url, response) | |||
} | |||
|
|||
func SqlStatsFlushTestServerLocal(ts serverutils.TestServerInterface) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The function name makes the code a little harder to understand, can we consider renaming or just removing this wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this function is even necessary. What's the value of hiding what the flush is doing? I'd rather know that MaybeFlush
is being called in my test without having to hop around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. See other comment for an explanation of why I had originally made these helpers.
@@ -114,3 +114,19 @@ type RecordedTxnStats struct { | |||
SessionData *sessiondata.SessionData | |||
TxnErr error | |||
} | |||
|
|||
// SSDrainer is the interface for draining or resetting sql stats. | |||
type SSDrainer interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I just double check my understaning for the motivation of these changes (coming from the 1 pager):
What is going to be implementing SSDrainer aside from SQLStats? From my understanding, as part of the new job, we would do the grpc fanout (in some yet to be determined batched method) that will be calling DrainStats on each sslocal.SQLStats container. Where else do we need a drainer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a new ssRemote.SQLStats
that will also implement this and will be responsible for the fanout. This is what the job will provide to the MaybeFlushWithDrainer
method to do the cluster wide drain / persist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 36 files at r3, 2 of 3 files at r4, 4 of 6 files at r5, 2 of 4 files at r6, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @angles-n-daemons, @kyle-a-wong, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 382 at r6 (raw file):
} func generateStatementStatsQuery(statementCount int) string {
what do you think of just inlining these? unless you're going to write extensive unit tests for them, I'd just inline since args and query need to match up and will be easier to see.
pkg/sql/sqlstats/test_utils.go
line 53 at r5 (raw file):
SkipZoneConfigBootstrap bool // FlushInterceptor intercepts persistsqlstats flush operation.
nit: persistedsqlstats
pkg/sql/sqlstats/test_utils.go
line 20 at r6 (raw file):
aggregatedTs time.Time, stmtStats []*appstatspb.CollectedStatementStatistics, txnStats []*appstatspb.CollectedTransactionStatistics)
nit: comma + line break before )
. our formatter doesn't do that?
pkg/sql/sqlstats/persistedsqlstats/provider.go
line 34 at r6 (raw file):
) const statementStatisticUpsertQuery = `
why do these live so far away from where they're used? can we move next to usage?
pkg/sql/sqlstats/persistedsqlstats/sqlstatstestutil/testutils.go
line 298 at r5 (raw file):
} func FlushTestServerLocal(ts serverutils.TestServerInterface) {
@kyle-a-wong what does this helper provide? I think for the long line that it saves it also obscures something quite simple. I'd just leave the original call here unless there's a compelling reason I'm missing.
@@ -1046,3 +1044,8 @@ func getStatusJSONProto( | |||
url := fmt.Sprintf("/_status/%s?start=%d&end=%d", path, startTime.Unix(), endTime.Unix()) | |||
return serverutils.GetJSONProto(ts, url, response) | |||
} | |||
|
|||
func SqlStatsFlushTestServerLocal(ts serverutils.TestServerInterface) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this function is even necessary. What's the value of hiding what the flush is doing? I'd rather know that MaybeFlush
is being called in my test without having to hop around.
Adds a new SSDrainer interface that PersistedSQLStats uses to flush sql stats. Instead of calling ConsumeStats on sslocal.SQLStats, persistedsqlstats will use a provided SSDrainer to either DrainStats or Reset. The ConsumeStats method on sslocal.SQLStats has been removed and its logic has been redistributed to persistedsqlstats, as that was the only consumer of the method. Since all the logic now lives in persistedsqlstats, the flush related functions can be called directly instead of relying on nested callbacks. In addition to the new interface, this commit updates the flush logic to upsert multiple rows in a single upsert statement, determined by `sql.stats.flush.batch_size`. Epic: CRDB-45771 Release note: None
eb5dff8
to
afee562
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @angles-n-daemons, @dhartunian, and @xinhaoz)
pkg/sql/sqlstats/test_utils.go
line 53 at r5 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit:
persistedsqlstats
done
pkg/sql/sqlstats/test_utils.go
line 20 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: comma + line break before
)
. our formatter doesn't do that?
I guess not. Done
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 382 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
what do you think of just inlining these? unless you're going to write extensive unit tests for them, I'd just inline since args and query need to match up and will be easier to see.
Done
pkg/sql/sqlstats/persistedsqlstats/provider.go
line 34 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
why do these live so far away from where they're used? can we move next to usage?
Done
pkg/sql/sqlstats/persistedsqlstats/sqlstatstestutil/testutils.go
line 298 at r5 (raw file):
Previously, dhartunian (David Hartunian) wrote…
@kyle-a-wong what does this helper provide? I think for the long line that it saves it also obscures something quite simple. I'd just leave the original call here unless there's a compelling reason I'm missing.
Originally, this PR contained a larger refactor that included getting rid of the embded sslocal.SQLStats
from persistedSQLStats
and changed the MaybeFlush
signature to require the SSDrainer to be provided. Doing that required all updates to all tests t hat called MaybeFlush
to get this localSqlStats
object from the server and it felt cleaner to me to only have to do that in one place instead of every place we call MaybeFlush
.
I've since went back on that idea for this PR and opted for the new MaybeFlushWithDrainer
method instead, but i never removed these helps / the refactor. If necessary I can add this back in later, but for now i've removed this helper
@@ -114,3 +114,19 @@ type RecordedTxnStats struct { | |||
SessionData *sessiondata.SessionData | |||
TxnErr error | |||
} | |||
|
|||
// SSDrainer is the interface for draining or resetting sql stats. | |||
type SSDrainer interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a new ssRemote.SQLStats
that will also implement this and will be responsible for the fanout. This is what the job will provide to the MaybeFlushWithDrainer
method to do the cluster wide drain / persist
@@ -1046,3 +1044,8 @@ func getStatusJSONProto( | |||
url := fmt.Sprintf("/_status/%s?start=%d&end=%d", path, startTime.Unix(), endTime.Unix()) | |||
return serverutils.GetJSONProto(ts, url, response) | |||
} | |||
|
|||
func SqlStatsFlushTestServerLocal(ts serverutils.TestServerInterface) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. See other comment for an explanation of why I had originally made these helpers.
afee562
to
303b813
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Can you link to #139143 which I assigned to you prior to merging?
Reviewed 6 of 13 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @alyshanjahani-crl, @angles-n-daemons, and @xinhaoz)
Adds a new SSDrainer interface that PersistedSQLStats uses to flush sql stats.
Instead of calling ConsumeStats on sslocal.SQLStats, persistedsqlstats will use a provided SSDrainer to either DrainStats or Reset.
The ConsumeStats method on sslocal.SQLStats has been removed and its logic has been redistributed to persistedsqlstats, as that was the only consumer of the method. Since all the logic now lives in persistedsqlstats, the flush related functions can be called directly instead of relying on nested callbacks.
Epic: CRDB-45771
Release note: None
Note: This is a stack committed, only the last commit in the chain needs to be reviewed