Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for tracking total disconnected time and reconnection attempts #3220

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

ggivo
Copy link
Contributor

@ggivo ggivo commented Mar 18, 2025

Description:
Introduces two new metrics to track the total time a connection remains disconnected until it is successfully reconnected and the number of reconnection attempts. The changes include:

  • New Metrics:
    • lettuce.reconnection.inactive.duration
      • Description: Measures the time taken for a successful reconnection after a disconnection.
      • Type: Timer
    • lettuce.reconnection.attempts
      • Description: Tracks the number of reconnection attempts made during a disconnection.
      • Implementation: Counter

Impact:

  • Provides better insights into connection stability and reconnection performance.

@ggivo ggivo force-pushed the lettuce-observability branch from d41a0f7 to e3fc47d Compare March 18, 2025 09:00
@ggivo ggivo marked this pull request as draft March 20, 2025 05:22
@ggivo
Copy link
Contributor Author

ggivo commented Mar 20, 2025

Considering adding a few more metrics
e.g

  • endpoint.command.queue - Gauge - tracks size of inflight commands (commands written to netty queue but not yet completed)
  • endpoint.command.buffer. - Gauge - tracks size of buffered commands (when autoflush=false)
  • endpoint.disconnected.buffer - Gauge - tracks size of buffered commands during disconnect

Those are implementation-specific and relevant to DefaultEndpoint.

This raises some open questions:
As of now, we have CommandLatencyRecorder responsible for gathering command latency metrics,

Do we continue with separate Metrics recorders, for example, one for ConnectionMonitoring (used by ConnectionWatchdog to track inactive connection time) and another for DefaultEndpoint (tracking the size of internal queues)... or have a single MetricsRecorder for both (ConnectionMonitoring, DefaultEndpoint)?

Do we want to enable/disable only connection-related, and endpoint-related metrics separately?

@tishun any opinion

@ggivo ggivo force-pushed the lettuce-observability branch from 7c0a5e0 to 3705008 Compare March 20, 2025 14:15
@ggivo ggivo force-pushed the lettuce-observability branch from bfbfd27 to d9d70ec Compare March 21, 2025 06:29
ggivo and others added 2 commits March 25, 2025 09:02
@ggivo ggivo force-pushed the lettuce-observability branch from 8914047 to 37492c4 Compare March 25, 2025 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant