Improve monitoring for secondary priority metrics #65

Open
ryanjjung opened this issue Dec 9, 2024 · 0 comments

Add monitors for metrics that issue #57 does not cover but that might occasionally reveal infrastructure problems.

  • NLB unhealthy host count (related to RDS module)
  • ECS cluster/service alarms pertaining to extended metrics provided by Container Insights:
    • Used vs. reserved CPU/RAM
    • Scale metrics
  • SNS topic failed messages (SNS currently sends the email notifications for alarms, which makes this a little redundant: if SNS can't notify, we won't be notified that it can't notify. It will have other uses down the line, though.)
  • RDS instance metrics (not used by Send, but I believe used by Appointment), such as:
    • Replication checkpoint lag
    • CPU credit balance
    • CPU/RAM utilization, freeable memory
    • Disk queue depth (to detect disk I/O problems), I/O latency
    • Various network metrics
    • Swap usage
  • EC2 instance metrics:
    • CPU/RAM utilization
    • EBS volume I/O
    • Network I/O
    • Status check failures
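
As a rough illustration of what a few of these monitors could look like, here is a minimal sketch that builds keyword arguments for CloudWatch's `put_metric_alarm` call via boto3. This is not how the project currently defines alarms; the `AlarmSpec` class, the `secondary_alarms` and `apply_alarms` helpers, the alarm names, and every threshold are hypothetical placeholders that would need tuning. The namespaces, metric names, and dimension names are the standard ones CloudWatch publishes for NLB, SNS, RDS, and EC2. The ECS used vs. reserved comparison is left out because it would likely need metric math across two Container Insights metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmSpec:
    """One alarm definition (hypothetical helper, not part of any library)."""
    name: str
    namespace: str
    metric: str
    dimensions: dict
    statistic: str
    comparison: str
    threshold: float
    period: int = 300            # seconds per datapoint (assumed default)
    evaluation_periods: int = 3  # consecutive breaching datapoints (assumed)

    def to_kwargs(self, sns_topic_arn: str) -> dict:
        """Translate the spec into arguments for put_metric_alarm."""
        return {
            "AlarmName": self.name,
            "Namespace": self.namespace,
            "MetricName": self.metric,
            "Dimensions": [
                {"Name": k, "Value": v} for k, v in self.dimensions.items()
            ],
            "Statistic": self.statistic,
            "ComparisonOperator": self.comparison,
            "Threshold": self.threshold,
            "Period": self.period,
            "EvaluationPeriods": self.evaluation_periods,
            "AlarmActions": [sns_topic_arn],
        }


def secondary_alarms(nlb, target_group, topic_name, db_instance, instance_id):
    """Return example specs for some metrics in the list above.

    All thresholds are placeholder guesses, not recommendations.
    """
    return [
        AlarmSpec("nlb-unhealthy-hosts", "AWS/NetworkELB", "UnHealthyHostCount",
                  {"LoadBalancer": nlb, "TargetGroup": target_group},
                  "Maximum", "GreaterThanThreshold", 0),
        AlarmSpec("sns-failed-notifications", "AWS/SNS",
                  "NumberOfNotificationsFailed", {"TopicName": topic_name},
                  "Sum", "GreaterThanThreshold", 0),
        AlarmSpec("rds-cpu-credit-balance", "AWS/RDS", "CPUCreditBalance",
                  {"DBInstanceIdentifier": db_instance},
                  "Minimum", "LessThanThreshold", 20),
        AlarmSpec("rds-disk-queue-depth", "AWS/RDS", "DiskQueueDepth",
                  {"DBInstanceIdentifier": db_instance},
                  "Average", "GreaterThanThreshold", 10),
        AlarmSpec("ec2-status-check-failed", "AWS/EC2", "StatusCheckFailed",
                  {"InstanceId": instance_id},
                  "Maximum", "GreaterThanThreshold", 0),
    ]


def apply_alarms(specs, sns_topic_arn):
    """Create or update the alarms. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the spec helpers work without it

    cloudwatch = boto3.client("cloudwatch")
    for spec in specs:
        cloudwatch.put_metric_alarm(**spec.to_kwargs(sns_topic_arn))
```

Keeping the definitions as plain data makes it possible to review thresholds in one place, and the same specs could be translated into whatever infrastructure tooling the project settles on rather than direct API calls.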