diff --git a/content/departments/cloud/technical-docs/managed-smtp/index.md b/content/departments/cloud/technical-docs/managed-smtp/index.md
index 81f85048753f..1e64a16845f7 100644
--- a/content/departments/cloud/technical-docs/managed-smtp/index.md
+++ b/content/departments/cloud/technical-docs/managed-smtp/index.md
@@ -42,7 +42,6 @@ Sourcegraph engineers can disable SMTP by setting the `.spec.managedSMTP.disable

 - Alerting: [frontend: email_delivery_failures](https://docs.sourcegraph.com/admin/observability/alerts#frontend-email-delivery-failures)
 - Dashboards: [Frontend: Email delivery](https://docs.sourcegraph.com/admin/observability/dashboards#frontend-email-delivery)
-- [Multi-instance dashboard](../observability/index.md#multi-instance-dashboard): [Frontend: Total emails successfully delivered every 5 minutes](https://monitoring.sgdev.org/d/multi-instance-overviews/multi-instance-overviews?orgId=1)

 ### Vendor-side

diff --git a/content/departments/cloud/technical-docs/observability/index.md b/content/departments/cloud/technical-docs/observability/index.md
index 4ed7c4222b97..44c959f4d96d 100644
--- a/content/departments/cloud/technical-docs/observability/index.md
+++ b/content/departments/cloud/technical-docs/observability/index.md
@@ -1,31 +1,18 @@
 # Cloud Observability

-Epic link: https://github.com/sourcegraph/customer/issues/1151
-
 ## Metrics

-Metrics are gathered from all resources using the included Prometheus instance. This instance scrapes and stores the metrics locally as well as forwards them to the Managed Prometheus service provided by GCP.
+Metrics are gathered from all resources using the included Prometheus instance. This instance scrapes and stores the metrics locally.

 **Only metrics queried in our [monitoring generator](https://docs.sourcegraph.com/dev/background-information/observability/monitoring-generator) are forwarded - this allowlist is automatically generated.** If you'd like a new metric to be queryable in a centralized manner, you _must_ [create a dashboard panel](https://docs.sourcegraph.com/dev/how-to/add_monitoring#alerts-dashboards-and-documentation) for it.

-These metrics are viewable through our centralised Grafana instance hosted at: https://monitoring.sgdev.org.
-
-> [!NOTE] access to these resources must be granted. To request access, follow the [Requesting access to Grafana](./operations.md#requesting-access-to-grafana).
-
 ### Multi-instance dashboard

-We generate a dashboard that renders panels that opt-in to a [multi-instance overviews dashboard](https://monitoring.sgdev.org/d/multi-instance-overviews/multi-instance-overviews).
-
-Panels in this dashboard show the panel's query grouped by `project_id`, each of which represents a Cloud instance. The template variable dropdown at the top allow you to select instances to compare, which is persisted to the URL.
-
-To opt-in a panel to this multi-instance dashboard, see [how to add monitoring](https://docs.sourcegraph.com/dev/how-to/add_monitoring#centralized-observability).
+We no longer support multi-instance dashboards but the cloud-team is working on a replacement.

 ### Common operations

-- Request access to the Grafana dashboard, follow [Requesting access to Grafana](./operations.md#requesting-access-to-grafana).
 - To add a new dashboard to all managed instances, follow the [Creating a new individual dashboard](./operations.md#creating-a-new-individual-dashboard) procedure.
-- To create a new aggregated dashboard that queries multiple cloud instances, follow the [Creating a new multi-instance dashboard](./operations.md#creating-a-new-multi-instance-dashboard) procedure.
-- To manually refresh the dashboards on Grafana, follow the [Manually regenerate Grafana dashboards](./operations.md#manually-regenerate-grafana-dashboards) playbook.

 ## Tracing

diff --git a/content/departments/cloud/technical-docs/observability/operations.md b/content/departments/cloud/technical-docs/observability/operations.md
index f94ffdfaf564..89f3aef0ff0c 100644
--- a/content/departments/cloud/technical-docs/observability/operations.md
+++ b/content/departments/cloud/technical-docs/observability/operations.md
@@ -2,49 +2,19 @@

 ## Requesting access to Grafana

-Users who do not automatically have access to the Grafana instance can request access through [Entitle](https://entitle.io/). On Slack, type `/access_request` and hit enter. Fill out the form wil the following values:
-![Entitle Request Form](https://storage.googleapis.com/sourcegraph-assets/handbook/engineering/cloud/entitle-iap-request.png)
+

-A Cloud team or Security team member will then need to approve the request. If you require permanent access to Grafana, please post a message in the [#cloud channel](https://sourcegraph.slack.com/archives/C03JR7S7KRP) on Slack and request a Cloud team member provision you access.
+To access the Grafana dashboard for a single Cloud customer:

-## Granting a user permanent access to Grafana
+1. Find the customer on https://cloud-ops.sgdev.org/ and go to the specific customer page.
+1. Go to "View monitoring dashboards" for the specific instance.
+1. When you attempt to access the dashboard with the given command, you may receive an error about access.
+1. If that happens, use Entitle to request access to the specific instance using this [form](https://app.entitle.io/request?data=eyJkdXJhdGlvbiI6IjM2MDAiLCJqdXN0aWZpY2F0aW9uIjoiQWNjZXNzIHRvIGNsb3VkIGluc3RhbmNlICQkSU5TRVJUIENMT1VEIElOU1RBTkNFIEhFUkUkJCQgZm9yIEdyYWZhbmEgZGFzaGJvYXJkIiwiYnVuZGxlSWRzIjpbImNlNTZlMGU2LTE1ZDYtNGYzYS05M2RmLWRkMjQxOGQzNzhlYyJdfQ%3D%3D).

-User management is provisioned within GCP. To grant a new user permanent access to Grafana they will need to either be added to an approved group or have their identity specifically added to the IAP proxy.
+## Multi-instance dashboard

-To add a user, navigate to the [GCP Console IAP management page](https://console.cloud.google.com/security/iap?project=control-plane-5e9ee072) for Grafana. Click the check box and the provisioning page should appear on the right. From there, click "Add Principal" and add the user.
-
-## Manually regenerate Grafana dashboards
-
-Grafama dashboards are [generated when the `centralized-o11y` invariant](https://sourcegraph.sourcegraph.com/github.com/sourcegraph/controller/-/blob/internal/invariants/centralized_o11y.go) is run against an instance:
-
-1. Cloud team members can run `mi2 instance check -e $ENVIRONMENT -s $SLUG -enforce centralized-o11y` locally. This will automatically generate an ID token, generate, and upload the dashboards to Grafana.
+We no longer support multi-instance dashboards but the cloud-team is working on a replacement.

 ## Creating a new individual dashboard

 The dashboards for Cloud customers are generated from the same [dashboard definitions](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/monitoring/definitions) that are create the bundled dashboards included with all Sourcegraph distributions. To create a new dashboard that will be rolled out to all managed instances, follow the [Developing Observability](https://docs.sourcegraph.com/dev/background-information/observability) guidelines.
-
-## Creating a new multi-instance dashboard
-
-We are now able to see the value of a query applied to multiple instances at once. To create a dashboard that queries multiple customers at once, log into Grafana and use the native creation tools. It's recommended to start with an existing dashboard panel, click the title, and selecct "Explore". This will allow you to modify the prewritten query. All Cloud instances support the same set of metrics and are tagged with additional metadata to denote the customer.
-
-To view the results for a specific subset of customers, duplicate the query and filter each result for a given customer by changing the `project_id=` label selector.
-
-If the `project_id` is unknown for a given customer, follow the [FAQ: How do I figure out the GCP Project ID for a customer?](../../index.md#faq-how-do-i-figure-out-the-gcp-project-id-for-a-customer) instructions.
-
-> **NOTE**: Custom created dashboards _should_ persist through restarts however the Cloud team guarantees no SLAs. If a dashboard is mission-critical, please communicate with the Cloud team on getting it added as a permanent fixture. It's preferred that all dashboards are created in code and distributed as part of Sourcegraph itself.
-
-Metrics that use Prometheus aggregation functions (like `sum by`) will need to be updated to include the `project_id` as a a grouping field, e.g.:
-
-```
-sum by (job) (pg_stat_activity{project_id="sourcegraph-managed-sg"})
-```
-
-would become
-
-```
-sum by (job, project_id) (pg_stag_activity)
-```
-
-to show the metric for all instances, labeled by their `project_id`.
-
-These dashboards will be pregenerated [in the future](https://github.com/sourcegraph/customer/issues/1610).