This repository has been archived by the owner on Jul 2, 2024. It is now read-only.

msp-ops: add alert information + link to alert dashboard (#8911)
Adds all service alerts to the bottom of the ops pages.

Adds a link to the MSP dashboards to the per-environment info table.

generated from: https://github.com/sourcegraph/sourcegraph/pull/61939

---------

Co-authored-by: jac <[email protected]>
jac and jac authored Apr 18, 2024
1 parent d0b0639 commit 236f53c
Showing 14 changed files with 1,367 additions and 243 deletions.
103 changes: 90 additions & 13 deletions content/departments/engineering/managed-services/build-tracker.md
@@ -3,8 +3,8 @@
<!--
Generated documentation; DO NOT EDIT. Regenerate using this command: 'sg msp operations generate-handbook-pages'
Last updated: 2024-04-12 12:41:21.95814 +0000 UTC
Generated from: https://github.com/sourcegraph/managed-services/tree/cc51eaa4e11a3146ae0a173cc2b80076466df8f7
Last updated: 2024-04-18 18:06:57.908273 +0000 UTC
Generated from: https://github.com/sourcegraph/managed-services/tree/b48c02fa7c553af5b6888efff69b85b48717db54
-->

This document describes operational guidance for Build Tracker infrastructure.
@@ -39,17 +39,17 @@ Changes to Build Tracker are continuously delivered to the first stage ([prod](#

### prod

| PROPERTY | DETAILS |
| ------------------- | ------------------------------------------------------------------------------------------------------ |
| Project ID | [`build-tracker-prod-59bf`](https://console.cloud.google.com/run?project=build-tracker-prod-59bf) |
| Category | **test** |
| Deployment type | `rollout` |
| Resources | [prod Redis](#prod-redis) |
| Slack notifications | [#alerts-build-tracker-prod](https://sourcegraph.slack.com/archives/alerts-build-tracker-prod) |
| Alerts | [GCP monitoring](https://console.cloud.google.com/monitoring/alerting?project=build-tracker-prod-59bf) |
| Errors | [Sentry `build-tracker-prod`](https://sourcegraph.sentry.io/projects/build-tracker-prod/) |
| Domain | [build-tracker.sgdev.org](https://build-tracker.sgdev.org) |
| Cloudflare WAF ||
| PROPERTY | DETAILS |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Project ID | [`build-tracker-prod-59bf`](https://console.cloud.google.com/run?project=build-tracker-prod-59bf) |
| Category | **test** |
| Deployment type | `rollout` |
| Resources | [prod Redis](#prod-redis) |
| Slack notifications | [#alerts-build-tracker-prod](https://sourcegraph.slack.com/archives/alerts-build-tracker-prod) |
| Alert policies | [GCP Monitoring alert policies list](https://console.cloud.google.com/monitoring/alerting/policies?project=build-tracker-prod-59bf), [Dashboard](https://console.cloud.google.com/monitoring/dashboards?pageState=%28%22dashboards%22%3A%28%22t%22%3A%22All%22%29%2C%22dashboardList%22%3A%28%22f%22%3A%22%255B%257B_22k_22_3A_22Type_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22Custom_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22category_22%257D%255D%22%29%29&project=build-tracker-prod-59bf) |
| Errors | [Sentry `build-tracker-prod`](https://sourcegraph.sentry.io/projects/build-tracker-prod/) |
| Domain | [build-tracker.sgdev.org](https://build-tracker.sgdev.org) |
| Cloudflare WAF | |

MSP infrastructure access needs to be requested using Entitle for time-bound privileges. Test environments may have less stringent requirements.

@@ -107,3 +107,80 @@ The Terraform Cloud workspaces for this service environment are [grouped under t
```bash
sg msp tfc view build-tracker prod
```

### Alert Policies

The following alert policies are defined for each of this service's environments.

#### High Container CPU Utilization

```md
High CPU Usage - it may be necessary to reduce load or increase CPU allocation
```

Severity: WARNING

#### High Container Memory Utilization

```md
High Memory Usage - it may be necessary to reduce load or increase memory allocation
```

Severity: WARNING

#### Container Startup Latency

```md
Service containers are taking longer than configured timeouts to start up.
```

Severity: WARNING
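One way to reason about this alert is to compare a high percentile of observed container startup times against the configured timeout. The sketch below assumes a nearest-rank 99th percentile and a 10-second timeout; both values are hypothetical, since the statistic and threshold used by the actual policy are not stated here, and the helper names are illustrative only.

```python
import math
from collections.abc import Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile (0 < pct <= 100) of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError(f"pct must be in (0, 100], got {pct}")
    ordered = sorted(values)
    # Nearest-rank method: the smallest sample with at least pct% of
    # all samples at or below it.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def startup_too_slow(
    startup_seconds: Sequence[float],
    timeout_seconds: float = 10.0,  # hypothetical timeout, not the real policy value
    pct: float = 99.0,  # hypothetical percentile
) -> bool:
    """True when the chosen percentile of startup latencies exceeds the timeout."""
    return nearest_rank_percentile(startup_seconds, pct) > timeout_seconds
```

With only four samples, the 99th percentile is the slowest one, so `startup_too_slow([2.1, 3.4, 2.8, 14.0])` is `True` even though most starts are fast; a single slow cold start can be enough to trip a high-percentile condition.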

#### Cloud Redis - System CPU Utilization

```md
Redis engine CPU utilization is above the set threshold. The utilization is measured on a scale of 0 to 1.
```

Severity: WARNING

#### Cloud Redis - Standard Instance Failover

```md
Instance failover occurred for a standard tier Redis instance.
```

Severity: WARNING

#### Cloud Redis - System Memory Utilization

```md
Redis system memory utilization is above the set threshold. The utilization is measured on a scale of 0 to 1.
```

Severity: WARNING
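Because the two Cloud Redis utilization metrics are fractions rather than percentages, a reading of 0.92 corresponds to 92% utilization. The sketch below shows a threshold comparison on that 0-to-1 scale, plus a check that the breach holds over several consecutive samples, loosely modelled on the optional duration setting of Cloud Monitoring threshold conditions (whether these policies use one is not stated here). The 0.8 threshold, the three-sample window, and the function names are hypothetical.

```python
from collections.abc import Sequence


def exceeds_threshold(utilization: float, threshold: float) -> bool:
    """Return True when a utilization fraction is above the threshold.

    Both arguments use the 0-to-1 scale of the Cloud Redis alerts,
    so 0.8 means 80%.
    """
    for name, value in (("utilization", utilization), ("threshold", threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return utilization > threshold


def sustained_breach(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Return True if each of the last `window` samples exceeds `threshold`.

    The window size is a hypothetical stand-in for a condition duration.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False
    return all(exceeds_threshold(s, threshold) for s in samples[-window:])


def as_percent(utilization: float) -> str:
    """Format a 0-to-1 fraction as a whole-number percentage, e.g. '92%'."""
    return f"{utilization * 100:.0f}%"
```

Requiring a sustained breach avoids paging on a single spike: `sustained_breach([0.85, 0.5, 0.9], 0.8, 3)` is `False` because the middle sample dips below the threshold.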

#### Cloud Run Pending Requests

```md
There are requests pending - we may need to increase Cloud Run instance count, request concurrency, or investigate further.
```

Severity: WARNING
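The two levers named in this alert interact roughly multiplicatively: each Cloud Run instance serves up to its configured request concurrency at once, so the service can serve about `max_instances * concurrency` requests simultaneously, and requests beyond that wait. The sketch below is a back-of-the-envelope model under that assumption only (it ignores cold starts, CPU limits, and request duration); the function names and example figures are hypothetical, not Build Tracker's configuration.

```python
def serving_capacity(max_instances: int, concurrency: int) -> int:
    """Upper bound on requests served at once: instances times per-instance concurrency."""
    if max_instances < 1 or concurrency < 1:
        raise ValueError("max_instances and concurrency must be positive")
    return max_instances * concurrency


def estimated_pending(in_flight: int, max_instances: int, concurrency: int) -> int:
    """Requests left waiting once every instance is at its concurrency limit."""
    return max(0, in_flight - serving_capacity(max_instances, concurrency))


def instances_needed(in_flight: int, concurrency: int) -> int:
    """Smallest instance count that could serve `in_flight` requests at once."""
    if concurrency < 1:
        raise ValueError("concurrency must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-in_flight // concurrency)
```

For example, with 5 instances at concurrency 80 and 450 concurrent requests, `estimated_pending` returns 50 and `instances_needed` returns 6, so in this model either raising the instance cap to 6 or raising concurrency to 90 would absorb the load.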

#### Cloud Run Instance Precondition Failed

```md
Cloud Run instance failed to start due to a precondition failure.
This is unlikely to cause immediate downtime, and may auto-resolve if no new instances are created and/or we return to a healthy state, but you should follow up to ensure the latest Cloud Run revision is healthy.
```

Severity: WARNING

#### External Uptime Check

```md
Service is failing to respond on https://build-tracker.sgdev.org - this may be expected if the service was recently provisioned or if its external domain has changed.
```

Severity: CRITICAL
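When this alert fires, it can help to confirm from your own machine whether the domain responds at all. A minimal one-off probe using only the Python standard library is sketched below; it assumes any 2xx status counts as healthy, which may not match the path, accepted status codes, or check regions configured in the actual GCP uptime check. The `probe` and `is_healthy_status` names are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy_status(status: int) -> bool:
    """Treat any 2xx response as healthy (an assumption; the real check may differ)."""
    return 200 <= status < 300


def probe(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Send one GET to `url` and return (healthy, detail).

    urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
    URLError and socket timeouts both derive from OSError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # DNS, TLS, refused connections, and timeouts end up here.
        return False, f"unreachable: {err}"
    return is_healthy_status(status), f"HTTP {status}"
```

Calling `probe("https://build-tracker.sgdev.org")` returns a `(healthy, detail)` pair. An `unreachable: ...` detail often points at DNS, TLS, or domain configuration rather than the application itself, which fits the recently-provisioned and changed-domain cases noted above.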