-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add more metrics #2454
base: main
Are you sure you want to change the base?
feat: add more metrics #2454
Conversation
@@ -767,6 +768,19 @@ func (a *FlowableActivity) RecordSlotSizes(ctx context.Context) error { | |||
return | |||
} | |||
slotMetricGauges.IntervalSinceLastNormalizeGauge = intervalSinceLastNormalizeGauge | |||
|
|||
syncedTablesGauge, err := a.OtelManager.GetOrInitInt64Gauge( | |||
otel_metrics.BuildMetricName(otel_metrics.SyncedTablesGaugeName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either now or after we should make RecordGauge
utility method on otelmanager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I feel that all gauges/counters can probably be initialized in otel manager. There is no flow specific information when we initialize them
var tags []string | ||
if errors.Is(err, context.Canceled) { | ||
tags = append(tags, string(shared.ErrTypeCanceled)) | ||
// TODO this is only set for context.Canceled, other types need to be added too | ||
errorClassString = "context.Canceled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't this pass tag through for all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking whether our current classification is enough. Currently it is based on technical error types, we mostly need better differentiators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we mostly need better differentiators
To determine whether they are customer facing alerts or not, the errorClass is just an initial draft for now
28cc355
to
4f3bfea
Compare
Adds a few new metrics:
error_emitted
- instantaneous gauge of whether an error is beingemitted
errors_emitted
- counter of errors emitted over timerecords_synced
- number of records synced for every sync batch.synced_tables
- number of tables configured in each mirrorinstance_status
- a constant1
metric every 5 minutes (same frequency as slots) with attributes differentiating the current statusTODO: maybe refactor code so that maybe all gauges/counters are initialized inside the otelManager?
ref #2341