…oring into OM-199
mphanias committed Jan 10, 2025
2 parents c8105ce + 701e086 commit 9155ac6
Showing 4 changed files with 715 additions and 27 deletions.
20 changes: 10 additions & 10 deletions config/datadog/README.md
@@ -27,10 +27,10 @@ cd examples/otel
### Step 2: Configure Docker Compose and OpenTelemetry Collector

#### datadog-docker-compose.yml
-The [datadog-docker-compose.yml](https://github.com/aerospike/aerospike-monitoring/blob/master/examples/otel/datadog-docker-compose.yml) file contains services such as `aerospike-prometheus-exporter` and `otel-collector` that require specific environment configurations and volume mounts.
+The [datadog-docker-compose.yml](../../examples/otel/datadog-docker-compose.yml) file contains services such as `aerospike-prometheus-exporter` and `otel-collector` that require specific environment configurations and volume mounts.

#### datadog-otel-collector-config.yml
-The [datadog-otel-collector-config.yml](https://github.com/aerospike/aerospike-monitoring/blob/master/examples/otel/datadog-otel-collector-config.yml) configures the OpenTelemetry Collector with receivers, processors, exporters, and service pipelines for handling traces.
+The [datadog-otel-collector-config.yml](../../examples/otel/datadog-otel-collector-config.yml) configures the OpenTelemetry Collector with receivers, processors, exporters, and service pipelines for handling traces.

**Important:** Update the `datadog-api-site` and `datadog-api-key` in this configuration file to match your Datadog account details.
![OpenTelemetry Collector API Config](assets/otel-collector-api-config.png)
@@ -97,32 +97,32 @@ In the **"New Dashboard"** screen, click on the **"Configure"** option on the to
![Datadog Dashboard Import](assets/datadog-dashbaord-import.png)
---
-## Creating Bulk Monitoring Alerts in Datadog
+## Creating Monitors/Alerts in Datadog
-To create bulk monitoring alerts in Datadog, you can use the provided **Python script**. This script reads multiple monitor configurations from a JSON file and creates monitors via the **Datadog API**.
+To create Monitors/Alerts in Datadog, you can use the provided **Python script**. This script reads multiple monitor configurations from a JSON file and creates monitors via the **Datadog API**.
### Prerequisites
- **Datadog API Key** and **Application Key** are required.
- Python must be installed on your machine.
-### Steps to Create Bulk Alerts
+### Steps to Create Monitors
-The `datadog_alerts_creation.py` script reads alert rules from a JSON file and creates corresponding monitors in Datadog using the Datadog API.
+The `datadog_monitors_creation.py` script reads alert rules from a JSON file and creates corresponding monitors in Datadog using the Datadog API.

#### Important

-- `api_key` and `app_key`: These should be updated with your actual **Datadog API** and **Application** keys.
-- `datadog_site`: Update this with the site where your **Datadog** account is hosted.
-- `aerospike_rules.json`: This JSON file contains multiple monitor configurations, such as monitor name, type, query, and message.
+- `api_key` and `app_key`: During runtime, you will be prompted to enter your actual Datadog API Key and Datadog Application Key.
+- `datadog_site`: During runtime, you will also be prompted to enter the Datadog site where your account is hosted (e.g., `datadoghq.com`, `us5.datadoghq.com`).
+- `aerospike_datadog_monitors.json`: This JSON file contains multiple monitor configurations, such as monitor name, type, query, and message.
- Make sure to adjust the **thresholds** according to your requirements.

### Run the Python Script

After updating the necessary configurations, run the script using the following command:

```bash
-python datadog_alerts_creation.py
+python3 datadog_monitors_creation.py
```
---
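
The README describes a script that prompts for the API key, application key, and site. It then reads monitor definitions from `aerospike_datadog_monitors.json` and creates each one through the Datadog API. The script's source is not shown on this page, so what follows is only a minimal Python sketch of that flow under stated assumptions. It is not the real `datadog_monitors_creation.py`. It assumes the JSON file is a top-level array of monitor objects. Each object is posted to Datadog's v1 create-monitor endpoint (`https://api.<site>/api/v1/monitor`) with `DD-API-KEY` and `DD-APPLICATION-KEY` headers. The helper names `parse_monitors`, `load_monitors`, `build_request`, and `main` are hypothetical.

```python
# Hypothetical sketch, NOT the actual datadog_monitors_creation.py.
# Assumes aerospike_datadog_monitors.json is a JSON array of monitor objects.
import getpass
import json
import urllib.request

# Fields the README says each monitor definition carries.
REQUIRED_FIELDS = ("name", "type", "query", "message")


def parse_monitors(text):
    """Parse a JSON array of monitor definitions and check required keys."""
    monitors = json.loads(text)
    if not isinstance(monitors, list):
        raise ValueError("expected a JSON array of monitor objects")
    for i, monitor in enumerate(monitors):
        missing = [f for f in REQUIRED_FIELDS if f not in monitor]
        if missing:
            raise ValueError(f"monitor #{i} is missing fields: {missing}")
    return monitors


def load_monitors(path):
    """Read and validate monitor definitions from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return parse_monitors(fh.read())


def build_request(monitor, site, api_key, app_key):
    """Build (but do not send) a POST to Datadog's v1 create-monitor endpoint."""
    url = f"https://api.{site}/api/v1/monitor"
    headers = {
        "Content-Type": "application/json",
        "DD-API-KEY": api_key,
        "DD-APPLICATION-KEY": app_key,
    }
    body = json.dumps(monitor).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def main(path="aerospike_datadog_monitors.json"):
    """Prompt for credentials and site, then create one monitor per definition."""
    # Prompt at runtime rather than hard-coding keys in the script.
    api_key = getpass.getpass("Datadog API key: ")
    app_key = getpass.getpass("Datadog application key: ")
    site = input("Datadog site (e.g. datadoghq.com, us5.datadoghq.com): ").strip()
    for monitor in load_monitors(path):
        request = build_request(monitor, site, api_key, app_key)
        with urllib.request.urlopen(request) as response:
            created = json.load(response)
        print(f"Created monitor {created.get('id')}: {monitor['name']}")
```

Calling `main()` prompts for the two keys and the site, then sends one request per monitor definition. Building each request separately from sending it keeps the payloads easy to inspect or dry-run before anything is created in a real Datadog account.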

30 changes: 13 additions & 17 deletions config/datadog/dashboards/usecases/rolling_restarts.json
@@ -139,7 +139,6 @@
"title": "Dead",
"title_size": "16",
"title_align": "left",
-"time": {},
"type": "query_value",
"requests": [
{
@@ -193,7 +192,6 @@
"title": "Unavailable",
"title_size": "16",
"title_align": "left",
-"time": {},
"type": "query_value",
"requests": [
{
@@ -567,7 +565,6 @@
"title": "Quiesces Pending",
"title_size": "16",
"title_align": "left",
-"time": {},
"type": "query_value",
"requests": [
{
@@ -679,7 +676,6 @@
"title": "HWM breached",
"title_size": "16",
"title_align": "left",
-"time": {},
"type": "query_value",
"requests": [
{
@@ -1158,7 +1154,7 @@
{
"id": 3296083092942762,
"definition": {
-"title": "XDR Lag (rate)",
+"title": "XDR Lag",
"title_size": "16",
"title_align": "left",
"show_legend": true,
@@ -1168,20 +1164,21 @@
"min",
"max"
],
-"time": {},
"type": "timeseries",
"requests": [
{
"formulas": [
{
"alias": "XDR Lag",
-"formula": "per_second(query1)"
+"formula": "query1"
}
],
"queries": [
{
"data_source": "metrics",
"name": "query1",
-"query": "sum:aerospike.aerospike_xdr_lag{$aerospike_cluster,$aerospike_service,$ns} by {aerospike_cluster,aerospike_service}.rollup(max, 30)"
+"query": "sum:aerospike.aerospike_xdr_lag{$aerospike_cluster,$aerospike_service,$ns} by {aerospike_cluster,aerospike_service}"
}
],
"response_format": "timeseries",
@@ -1228,7 +1225,6 @@
"min",
"max"
],
-"time": {},
"type": "timeseries",
"requests": [
{
@@ -1583,7 +1579,6 @@
"min",
"max"
],
-"time": {},
"type": "timeseries",
"requests": [
{
@@ -2346,7 +2341,7 @@
{
"id": 5261966586423734,
"definition": {
-"title": "HWM Breaches (rate) (total)",
+"title": "HWM Breaches (total)",
"title_size": "16",
"title_align": "left",
"show_legend": true,
@@ -2356,20 +2351,21 @@
"min",
"max"
],
-"time": {},
"type": "timeseries",
"requests": [
{
"formulas": [
{
"alias": "HWM_Breached",
-"formula": "per_second(query1)"
+"formula": "query1"
}
],
"queries": [
{
"data_source": "metrics",
"name": "query1",
-"query": "sum:aerospike.aerospike_namespace_hwm_breached{$aerospike_cluster,$aerospike_service,$ns} by {aerospike_cluster,aerospike_service,ns}.rollup(max, 30)"
+"query": "sum:aerospike.aerospike_namespace_hwm_breached{$aerospike_cluster,$aerospike_service,$ns} by {aerospike_cluster,aerospike_service,ns}"
}
],
"response_format": "timeseries",
@@ -2644,7 +2640,7 @@
"x": 0,
"y": 9,
"width": 12,
-"height": 1
+"height": 11
}
},
{
@@ -2820,7 +2816,7 @@
},
"layout": {
"x": 0,
-"y": 10,
+"y": 20,
"width": 12,
"height": 1,
"is_column_break": true
@@ -3075,7 +3071,7 @@
},
"layout": {
"x": 0,
-"y": 11,
+"y": 21,
"width": 12,
"height": 1
}
@@ -3416,7 +3412,7 @@
},
"layout": {
"x": 0,
-"y": 12,
+"y": 22,
"width": 12,
"height": 1
}
@@ -3451,4 +3447,4 @@
"layout_type": "ordered",
"notify_list": [],
"reflow_type": "fixed"
-}
+}
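
Two kinds of change in this dashboard file can be checked with a little arithmetic.

First, the "XDR Lag" and "HWM Breaches (total)" widgets switch from `per_second(query1)` to the raw `query1`. This assumes both `aerospike_xdr_lag` and `aerospike_namespace_hwm_breached` are gauges, meaning current values rather than ever-increasing counters. A per-second rate of a gauge shows how fast the value is changing, not the value itself, so a steady lag graphs as 0.

Second, the group at `y: 9` grows from height 1 to 11. That pushes the following full-width rows from `y` 10, 11, 12 to 20, 21, 22, because each row starts where the previous one ends.

The sketch below models both ideas in a simplified way. `per_second` is a plain difference quotient over consecutive points and `restack_rows` is simple vertical stacking. Both are hypothetical helpers, not Datadog's implementation.

```python
# Hypothetical helpers illustrating two changes in rolling_restarts.json.
# Simplified models only; not Datadog's actual rate or layout code.


def per_second(points):
    """Return (timestamp, rate) pairs from consecutive (timestamp, value) points.

    The rate is the change in value divided by the elapsed seconds, so a
    gauge that holds steady produces a rate of 0.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def restack_rows(heights, start_y=0):
    """Return the y coordinate of each full-width row stacked top to bottom."""
    ys = []
    y = start_y
    for height in heights:
        ys.append(y)
        y += height  # the next row starts where this one ends
    return ys
```

For example, `per_second([(0, 30.0), (30, 30.0), (60, 30.0)])` returns `[(30, 0.0), (60, 0.0)]`, even though the raw series stays at 30 seconds of lag. Likewise, `restack_rows([11, 1, 1, 1], start_y=9)` returns `[9, 20, 21, 22]`, which matches the new `y` values in the layout hunks above.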