feat: Enable Customizing Treatment of Missing Data #26

Draft: wants to merge 1 commit into base: main
81 changes: 40 additions & 41 deletions README.md
@@ -101,51 +101,50 @@ module "aws-rds-alarms" {
}
```



## Variables

| Name | Description | Type | Default | Required |
|-------------------------------------------------|----------------------------------------------------------------------------------------------------------|---------------|-----------------|:--------:|
| actions\_alarm | A list of actions to take when alarms are triggered. Will likely be an SNS topic for event distribution. | `list` | `[]` | no |
| actions\_ok | A list of actions to take when alarms are cleared. Will likely be an SNS topic for event distribution. | `list` | `[]` | no |
| anomaly\_period | The number of seconds in each evaluation period for anomaly detection. | `string` | `"600"` | no |
| anomaly_band_width | The width of the anomaly detection band. Higher numbers mean less sensitivity. | `string` | `"2"` | no |
| db\_instance\_id | RDS Instance ID | `string` | n/a | yes |
| engine | The RDS engine being used. Used for database engine-specific alarms. | `string` | `""` | no |
| evaluation\_period | The number of statistic periods over which alarms are evaluated. | `string` | `"5"` | no |
| prefix | Alarm Name Prefix | `string` | `""` | no |
| statistic\_period | The number of seconds in each statistic period. | `string` | `"60"` | no |
| tags | Tags to attach to each alarm | `map(string)` | `{}` | no |
| db_instance_class | The RDS instance class, e.g. `db.t3.medium` | `string` | n/a | yes |
| cpu_utilization_too_high_threshold | Alarm threshold for the 'highCPUUtilization' alarm | `string` | `"90"` | no |
| cpu_credit_balance_too_low_threshold | Alarm threshold for the 'lowCPUCreditBalance' alarm | `string` | `"100"` | no |
| disk_queue_depth_too_high_threshold | Alarm threshold for the 'highDiskQueueDepth' alarm | `string` | `"64"` | no |
| disk_free_storage_space_too_low_threshold | Alarm threshold for the 'lowFreeStorageSpace' alarm (in bytes) | `string` | `"10000000000"` | no |
| disk_burst_balance_too_low_threshold | Alarm threshold for the 'lowEBSBurstBalance' alarm | `string` | `"100"` | no |
| maximum_used_transaction_ids_too_high_threshold | Alarm threshold for the 'maximumUsedTransactionIDs' alarm | `string` | `"1000000000"` | no |
| memory_freeable_too_low_threshold | Alarm threshold for the 'lowFreeableMemory' alarm (in bytes) | `string` | `"256000000"` | no |
| memory_swap_usage_too_high_threshold | Alarm threshold for the 'highSwapUsage' alarm (in bytes) | `string` | `"256000000"` | no |
| create_high_cpu_alarm | Whether or not to create the high cpu alarm | `bool` | `true` | no |
| create_low_cpu_credit_alarm | Whether or not to create the low cpu credit alarm | `bool` | `true` | no |
| create_high_queue_depth_alarm | Whether or not to create the high queue depth alarm | `bool` | `true` | no |
| create_low_disk_space_alarm | Whether or not to create the low disk space alarm | `bool` | `true` | no |
| create_low_disk_burst_alarm | Whether or not to create the low disk burst alarm | `bool` | `true` | no |
| create_low_memory_alarm | Whether or not to create the low memory free alarm | `bool` | `true` | no |
| create_swap_alarm | Whether or not to create the high swap usage alarm | `bool` | `true` | no |
| create_anomaly_alarm | Whether or not to create the fairly noisy anomaly alarm | `bool` | `true` | no |
| treat_missing_data | How the alarms handle missing data points. One of `missing`, `ignore`, `breaching`, `notBreaching`. | `string` | `"missing"` | no |


## Outputs

| Name | Description |
|---------------------------------------------|------------------------------------------------------------------------------------|
| alarm\_connection\_count\_anomalous | The CloudWatch Metric Alarm resource block for anomalous Connection Count |
| alarm\_cpu\_credit\_balance\_too\_low | The CloudWatch Metric Alarm resource block for low CPU Credit Balance |
| alarm\_cpu\_utilization\_too\_high | The CloudWatch Metric Alarm resource block for high CPU Utilization |
| alarm\_disk\_burst\_balance\_too\_low | The CloudWatch Metric Alarm resource block for low Disk Burst Balance |
| alarm\_disk\_free\_storage\_space\_too\_low | The CloudWatch Metric Alarm resource block for low Free Storage Space |
| alarm\_disk\_queue\_depth\_too\_high | The CloudWatch Metric Alarm resource block for high Disk Queue Depth |
| alarm\_memory\_freeable\_too\_low | The CloudWatch Metric Alarm resource block for low Freeable Memory |
| alarm\_memory\_swap\_usage\_too\_high | The CloudWatch Metric Alarm resource block for high Memory Swap Usage |
| alarm_maximum_used_transaction_ids_too_high | The CloudWatch Metric Alarm resource block for PostgreSQL Transaction ID Wraparound |
11 changes: 11 additions & 0 deletions main.tf
@@ -20,6 +20,7 @@ resource "aws_cloudwatch_metric_alarm" "cpu_utilization_too_high" {
  alarm_description = "Average database CPU utilization is too high."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -40,6 +41,7 @@ resource "aws_cloudwatch_metric_alarm" "cpu_credit_balance_too_low" {
  alarm_description = "Average database CPU credit balance is too low, a negative performance impact is imminent."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -61,6 +63,7 @@ resource "aws_cloudwatch_metric_alarm" "disk_queue_depth_too_high" {
  alarm_description = "Average database disk queue depth is too high, performance may be negatively impacted."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -81,6 +84,7 @@ resource "aws_cloudwatch_metric_alarm" "disk_free_storage_space_too_low" {
  alarm_description = "Average database free storage space is too low and may fill up soon."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -101,6 +105,7 @@ resource "aws_cloudwatch_metric_alarm" "disk_burst_balance_too_low" {
  alarm_description = "Average database storage burst balance is too low, a negative performance impact is imminent."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -122,6 +127,7 @@ resource "aws_cloudwatch_metric_alarm" "memory_freeable_too_low" {
  alarm_description = "Average database freeable memory is too low, performance may be negatively impacted."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -142,6 +148,7 @@ resource "aws_cloudwatch_metric_alarm" "memory_swap_usage_too_high" {
  alarm_description = "Average database swap usage is too high, performance may be negatively impacted."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
@@ -159,6 +166,7 @@ resource "aws_cloudwatch_metric_alarm" "connection_count_anomalous" {
  alarm_description = "Anomalous database connection count detected. Something unusual is happening."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  metric_query {
    id = "e1"
@@ -200,6 +208,7 @@ resource "aws_cloudwatch_metric_alarm" "maximum_used_transaction_ids_too_high" {
  alarm_description = "Nearing a possible critical transaction ID wraparound."
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data
}

# SOC2 requirements
@@ -216,6 +225,7 @@ resource "aws_cloudwatch_metric_alarm" "read_iops_too_high" {
  alarm_description = "Average Read IO over last ${(var.evaluation_period * var.statistic_period / 60)} minutes too high, performance may suffer"
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = "${var.db_instance_id}-read-iops-too-high"
@@ -235,6 +245,7 @@ resource "aws_cloudwatch_metric_alarm" "write_iops_too_high" {
  alarm_description = "Average Write IO over last ${(var.evaluation_period * var.statistic_period / 60)} minutes too high, performance may suffer"
  alarm_actions = var.actions_alarm
  ok_actions = var.actions_ok
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = "${var.prefix}rds-${var.db_instance_id}-write-iops-too-high"
11 changes: 11 additions & 0 deletions variables.tf
@@ -181,3 +181,14 @@ variable "maximum_used_transaction_ids_too_high_threshold" {
  default = "1000000000" // 1 billion. Half of total.
  description = "Alarm threshold for the 'maximumUsedTransactionIDs' alarm"
}

variable "treat_missing_data" {
  description = "Determines how the alarm will handle missing data"
  type = string
  default = "missing"

  validation {
    condition = contains(["missing", "ignore", "breaching", "notBreaching"], var.treat_missing_data)
    error_message = "The treat_missing_data variable must be one of: 'missing', 'ignore', 'breaching', 'notBreaching'."
  }
}
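
For reference, a minimal sketch of how a caller might set the new variable. The `source` path and the instance values are placeholders assumed for illustration and are not taken from this PR; only `db_instance_id`, `db_instance_class`, and `treat_missing_data` are names from the module itself. In CloudWatch, `missing` (the default) moves an alarm to `INSUFFICIENT_DATA` when every data point in the evaluation range is missing, `ignore` keeps the current alarm state, `breaching` counts missing points as violating the threshold, and `notBreaching` counts them as within it.

```hcl
# Hypothetical caller configuration; adjust source and values to your setup.
module "aws-rds-alarms" {
  source = "../.." # assumed path to this module, not stated in the PR

  db_instance_id    = "example-db"   # placeholder instance ID
  db_instance_class = "db.t3.medium" # example class from the README

  # New in this PR. Must be one of:
  # "missing" (default), "ignore", "breaching", "notBreaching".
  # Any other value fails the validation block in variables.tf at plan time.
  treat_missing_data = "notBreaching"
}
```

Note that the value is applied to every alarm in the module, since each `aws_cloudwatch_metric_alarm` resource in `main.tf` reads the same `var.treat_missing_data`.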