Improve monitoring for secondary priority metrics #65

ryanjjung · 2024-12-09T20:47:40Z

Add monitors for metrics that are not covered by issue #57 but that might occasionally reveal infrastructure problems.

- NLB unhealthy host count (related to RDS module)
- ECS cluster/service alarms for the extended metrics provided by Container Insights:
  - Used vs. reserved CPU/RAM
  - Scale metrics
- SNS topic failed messages. SNS is currently used for email notification on alarms, which makes this a little redundant (if SNS can't notify, we won't be notified that it can't notify), but this will have other uses down the line.
- RDS instance metrics (RDS is not used by Send, but I believe it is used by Appointment), such as:
  - Replication checkpoint lag
  - CPU credit balance
  - CPU/RAM utilization, freeable memory
  - Disk queue depth (to detect disk I/O problems), I/O latency
  - Various network metrics
  - Swap usage
- EC2 instance metrics:
  - CPU/RAM utilization
  - EBS volume I/O
  - Network I/O
  - Status check failures
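The issue doesn't say how these monitors will be built. As a rough illustration only, here is a minimal sketch that assumes CloudWatch metric alarms and models each one as a plain Python object whose fields mirror the CloudWatch `PutMetricAlarm` parameters (roughly the same fields a Pulumi `aws.cloudwatch.MetricAlarm` takes), so it runs without AWS credentials. The namespaces and metric names are the ones AWS publishes; `AlarmSpec`, `secondary_alarms`, the alarm names, the resource identifiers, and every threshold are hypothetical placeholders, not tuned values. Replication checkpoint lag, scale metrics, and network metrics are left out.

```python
from dataclasses import dataclass

MIB = 1024 * 1024  # bytes per MiB; RDS memory metrics are reported in bytes


@dataclass
class AlarmSpec:
    """Hypothetical container; field names mirror CloudWatch PutMetricAlarm parameters."""
    alarm_name: str
    namespace: str
    metric_name: str
    dimensions: dict
    comparison_operator: str
    threshold: float
    statistic: str = "Average"
    period: int = 300  # seconds
    evaluation_periods: int = 2

    def to_put_metric_alarm_args(self) -> dict:
        # Convert to the keyword shape PutMetricAlarm expects (Dimensions as Name/Value pairs).
        return {
            "AlarmName": self.alarm_name,
            "Namespace": self.namespace,
            "MetricName": self.metric_name,
            "Dimensions": [{"Name": k, "Value": v} for k, v in self.dimensions.items()],
            "Statistic": self.statistic,
            "Period": self.period,
            "EvaluationPeriods": self.evaluation_periods,
            "Threshold": self.threshold,
            "ComparisonOperator": self.comparison_operator,
        }


def secondary_alarms(nlb: str, target_group: str, topic: str, db_instance: str, instance_id: str) -> list:
    """Hypothetical helper: build alarm specs for some of the metrics listed above.

    All thresholds are placeholders, not recommendations.
    """
    gt, lt = "GreaterThanThreshold", "LessThanThreshold"
    rds = {"DBInstanceIdentifier": db_instance}
    ec2 = {"InstanceId": instance_id}
    return [
        # NLB: fire if any target behind the load balancer is unhealthy
        AlarmSpec("nlb-unhealthy-hosts", "AWS/NetworkELB", "UnHealthyHostCount",
                  {"LoadBalancer": nlb, "TargetGroup": target_group}, gt, 0, statistic="Maximum"),
        # SNS: fire on any failed delivery from the topic
        AlarmSpec("sns-failed-notifications", "AWS/SNS", "NumberOfNotificationsFailed",
                  {"TopicName": topic}, gt, 0, statistic="Sum"),
        # RDS instance metrics
        AlarmSpec("rds-cpu-credit-balance", "AWS/RDS", "CPUCreditBalance", rds, lt, 20),
        AlarmSpec("rds-cpu-utilization", "AWS/RDS", "CPUUtilization", rds, gt, 80),
        AlarmSpec("rds-freeable-memory", "AWS/RDS", "FreeableMemory", rds, lt, 256 * MIB),
        AlarmSpec("rds-disk-queue-depth", "AWS/RDS", "DiskQueueDepth", rds, gt, 10),
        AlarmSpec("rds-read-latency", "AWS/RDS", "ReadLatency", rds, gt, 0.02),  # seconds
        AlarmSpec("rds-swap-usage", "AWS/RDS", "SwapUsage", rds, gt, 256 * MIB),
        # EC2: status checks and CPU; memory is not published by default
        # (it needs the CloudWatch agent), so it is not included here
        AlarmSpec("ec2-status-check-failed", "AWS/EC2", "StatusCheckFailed",
                  ec2, gt, 0, statistic="Maximum"),
        AlarmSpec("ec2-cpu-utilization", "AWS/EC2", "CPUUtilization", ec2, gt, 80),
    ]


if __name__ == "__main__":
    # Example identifiers below are made up for illustration.
    for spec in secondary_alarms("net/example-nlb/0123456789abcdef",
                                 "targetgroup/example-tg/0123456789abcdef",
                                 "alarm-notifications", "example-db", "i-0123456789abcdef0"):
        print(spec.to_put_metric_alarm_args()["AlarmName"])
```

Used vs. reserved CPU/RAM for ECS doesn't fit this single-metric shape; it would more naturally be a metric-math ratio of `CpuUtilized` to `CpuReserved` (and `MemoryUtilized` to `MemoryReserved`) in the `ECS/ContainerInsights` namespace.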

ryanjjung added the enhancement and M: Pulumi Monitoring labels Dec 9, 2024

ryanjjung added this to the v0.0.10 milestone Dec 9, 2024

ryanjjung mentioned this issue Jan 10, 2025

Support monitoring for network load balancers #76 (closed)

ryanjjung modified the milestones: v0.0.10, v0.0.11 Jan 13, 2025
