From d6ac92461d9a66aa25c38fa18cf0208560e9a370 Mon Sep 17 00:00:00 2001 From: Robert Lin Date: Wed, 27 Mar 2024 16:08:45 +0900 Subject: [PATCH] dotcom: clean up operational docs --- .../dev/process/deployments/index.md | 29 ++- .../dev/process/deployments/instances.md | 12 +- .../dev/process/deployments/kubernetes.md | 2 +- .../dev/process/deployments/playbooks.md | 207 +++++++++++++----- .../dev/process/deployments/postgresql.md | 115 +--------- .../dev/process/incidents/playbooks/ci.md | 1 - .../dev/tools/observability/dotcom.md | 67 +++--- .../dev/tools/observability/index.md | 4 +- .../dev/tools/observability/monitoring.md | 2 +- 9 files changed, 212 insertions(+), 227 deletions(-) diff --git a/content/departments/engineering/dev/process/deployments/index.md b/content/departments/engineering/dev/process/deployments/index.md index 8ea2aee0fb0c..57822325cf82 100644 --- a/content/departments/engineering/dev/process/deployments/index.md +++ b/content/departments/engineering/dev/process/deployments/index.md @@ -2,20 +2,19 @@ For a complete list of Sourcegraph instances we manage, see our [instances documentation](instances.md). 
-- [Deployments](#deployments) - - [Deployment basics](#deployment-basics) - - [Images](#images) - - [Renovate](#renovate) - - [ArgoCD](#argocd) - - [Infrastructure](#infrastructure) - - [deploy-sourcegraph](#deploy-sourcegraph) - - [Merging changes from deploy-sourcegraph](#merging-changes-from-deploy-sourcegraph) - - [Relationship between deploy-sourcegraph repositories](#relationship-between-deploy-sourcegraph-repositories) - - [Merging upstream `deploy-sourcegraph` into `deploy-sourcegraph` forks](#merging-upstream-deploy-sourcegraph-into-deploy-sourcegraph-forks) - - [Sourcegraph Cloud](#sourcegraph-cloud) - - [Continuous Deployment Process](#continuous-deployment-process) - - [Deployment observability](#deployment-observability) - - [Deployment traces](#deployment-traces) +- [Deployment basics](#deployment-basics) + - [Images](#images) + - [Renovate](#renovate) + - [ArgoCD](#argocd) + - [Infrastructure](#infrastructure) +- [deploy-sourcegraph](#deploy-sourcegraph) + - [Merging changes from deploy-sourcegraph](#merging-changes-from-deploy-sourcegraph) +- [Relationship between deploy-sourcegraph repositories](#relationship-between-deploy-sourcegraph-repositories) + - [Merging upstream `deploy-sourcegraph` into `deploy-sourcegraph` forks](#merging-upstream-deploy-sourcegraph-into-deploy-sourcegraph-forks) +- [DotCom](#dotcom) + - [Continuous Deployment Process](#continuous-deployment-process) +- [Deployment observability](#deployment-observability) + - [Deployment traces](#deployment-traces) Additional resources: @@ -37,7 +36,7 @@ Each Sourcegraph service is provided as a Docker image. Every commit to `main` i When [a new semver release](../releases/index.md) is cut the pipelines, will build a release image with the same tag as the latest [release version](https://github.com/sourcegraph/sourcegraph/tags) as well. These are used by customer deployments. 
-For pushing custom images, refer to [building Docker images for specific branches](#building-docker-images-for-a-specific-branch). +For pushing custom images, see `sg ci docs`. ### Renovate diff --git a/content/departments/engineering/dev/process/deployments/instances.md b/content/departments/engineering/dev/process/deployments/instances.md index 799f65f2f357..fafb0160c2e8 100644 --- a/content/departments/engineering/dev/process/deployments/instances.md +++ b/content/departments/engineering/dev/process/deployments/instances.md @@ -18,13 +18,15 @@ Also see [playbooks](./playbooks.md) for common actions related to operating our [![Build status](https://badge.buildkite.com/ef1289610fdd05b606bf1e57a034af2365c7b09c95ac6121f9.svg)](https://buildkite.com/sourcegraph/deploy-sourcegraph-cloud) -This deployment is also colloquially referred to as 'DotCom' and 'sourcegraph.com'. It is the public deployment available to the public at [sourcegraph.com/search](https://sourcegraph.com/search). +This deployment is also colloquially referred to as 'DotCom' and 'sourcegraph.com'. +It is the public deployment available to the public at [sourcegraph.com/search](https://sourcegraph.com/search), and is currently operated by the [Core Services team](../../../teams/core-services/index.md). `sourcegraph.com` deploys the latest changes from [`sourcegraph/sourcegraph`](https://github.com/sourcegraph/sourcegraph) on a [daily basis](index.md#continuous-deployment-process). -This deployment also includes our [documentation](https://docs.sourcegraph.com/) and [about](https://about.sourcegraph.com/) sites. +This deployment **does not** include the [about](https://about.sourcegraph.com/) site and the [new documentation site at sourcegraph.com/docs](https://sourcegraph.com/docs). +It currently still includes the legacy [docs.sourcegraph.com](https://docs.sourcegraph.com/) site, however. -> 🐶 For dogfooding changes, use [k8s.sgdev.org](#k8ssgdevorg) instead, which generally receives updates faster. 
+> [!NOTE] 🐶 For dogfooding changes, use [sourcegraph.sourcegraph.com](#sourcegraphsourcegraphcom-s2) instead, which generally receives updates faster. - [DotCom cluster on GCP](https://console.cloud.google.com/kubernetes/clusters/details/us-central1-f/cloud?project=sourcegraph-dev) ``` @@ -34,12 +36,14 @@ This deployment also includes our [documentation](https://docs.sourcegraph.com/) - [Infrastructure configuration](https://github.com/sourcegraph/infrastructure/tree/main/cloud) - Alerts: #alerts-cloud and [OpsGenie](../incidents/on_call.md) - [Playbooks](./playbooks.md#sourcegraphcom) +- [Observability](../../tools/observability/dotcom.md) +- [Domain routing rules](https://sourcegraph.sourcegraph.com/github.com/sourcegraph/infrastructure/-/blob/gfe/envs/prod/project/routes.tf) ## k8s.sgdev.org [![Build status](https://badge.buildkite.com/65c9b6f836db6d041ea29b05e7310ebb81fa36741c78f207ce.svg?branch=release)](https://buildkite.com/sourcegraph/deploy-sourcegraph-dogfood-k8s-2) -**NO LONGER PRIMARY DOGFOODING INSTANCE, SEE [S2](#sourcegraphsourcegraphcom-s2) BELOW** +> [!WARNING] **THIS IS NO LONGER PRIMARY DOGFOODING INSTANCE, SEE [S2](#sourcegraphsourcegraphcom-s2) BELOW** This deployment is also colloquially referred to as "dogfood", "dogfood-k8s", or just "k8s". This is the Sourcegraph instance to use for dogfooding changes to Sourcegraph. diff --git a/content/departments/engineering/dev/process/deployments/kubernetes.md b/content/departments/engineering/dev/process/deployments/kubernetes.md index 80de3145005c..035bf03c343f 100644 --- a/content/departments/engineering/dev/process/deployments/kubernetes.md +++ b/content/departments/engineering/dev/process/deployments/kubernetes.md @@ -1,4 +1,4 @@ -# Kubernetes +# Working with Kubernetes deployments This section contains tips and advice for interacting with our Kubernetes deployments (most notably [sourcegraph.com](#sourcegraph-cloud) and [k8s.sgdev.org](#k8s-sgdev-org)). 
diff --git a/content/departments/engineering/dev/process/deployments/playbooks.md b/content/departments/engineering/dev/process/deployments/playbooks.md index cb6e81cb68d0..7c019bb43754 100644 --- a/content/departments/engineering/dev/process/deployments/playbooks.md +++ b/content/departments/engineering/dev/process/deployments/playbooks.md @@ -1,27 +1,35 @@ # Playbooks for deployments -- [Playbooks for deployments](#playbooks-for-deployments) - - [General](#general) - - [Debugging](#debugging) - - [Check what version of Sourcegraph is deployed](#check-what-version-of-sourcegraph-is-deployed) - - [Sourcegraph.com](#sourcegraphcom) - - [Deploying to sourcegraph.com](#deploying-to-sourcegraphcom) - - [Deploying to sourcegraph.com during code freeze](#deploying-to-sourcegraphcom-during-code-freeze) - - [Manually deploying a service to sourcegraph.com](#manually-deploying-a-service-to-sourcegraphcom) - - [Rolling back sourcegraph.com](#rolling-back-sourcegraphcom) - - [Disable Renovate](#disable-renovate) - - [Backing up & restoring a Cloud SQL instance (production databases)](#backing-up--restoring-a-cloud-sql-instance-production-databases) - - [Invalidating all user sessions](#invalidating-all-user-sessions) - - [Accessing sourcegraph.com database](#accessing-sourcegraphcom-database) - - [Via the CLI](#via-the-cli) - - [Via BigQuery (for read-only operations)](#via-bigquery-for-read-only-operations) - - [Restarting docs.sourcegraph.com](#restarting-docssourcegraphcom) - - [Creating banners for maintenance tasks](#creating-banners-for-maintenance-tasks) - - [Gitserver disk space related maintenance](#gitserver-disk-space-related-maintenance) - - [k8s.sgdev.org](#k8ssgdevorg) - - [Manage users in k8s.sgdev.org](#manage-users-in-k8ssgdevorg) - - [PostgreSQL](#postgresql) - - [Cloudflare Configuration](#cloudflare-configuration) +This page collects playbooks for Sourcegraph deployments managed and operated by the company. 
+Refer to [the instances page](./instances.md) for a complete listing. + +- [General](#general) +- [Debugging](#debugging) + - [Check what version of Sourcegraph is deployed](#check-what-version-of-sourcegraph-is-deployed) +- [Sourcegraph.com](#sourcegraphcom) + - [Observability](#observability) + - [Deploying to sourcegraph.com](#deploying-to-sourcegraphcom) + - [Deploying to sourcegraph.com during code freeze](#deploying-to-sourcegraphcom-during-code-freeze) + - [Manually deploying a service to sourcegraph.com](#manually-deploying-a-service-to-sourcegraphcom) + - [Rolling back sourcegraph.com](#rolling-back-sourcegraphcom) + - [Accessing sourcegraph.com database](#accessing-sourcegraphcom-database) + - [Connect to dotcom database via command line](#connect-to-dotcom-database-via-command-line) + - [Using Cloud SQL Proxy](#using-cloud-sql-proxy) + - [Example database queries](#example-database-queries) + - [Connect to dotcom database via BigQuery](#connect-to-dotcom-database-via-bigquery) + - [Backing up \& restoring a Cloud SQL instance (production databases)](#backing-up--restoring-a-cloud-sql-instance-production-databases) + - [Database performance monitoring](#database-performance-monitoring) + - [Invalidating all user sessions](#invalidating-all-user-sessions) + - [Restarting docs.sourcegraph.com](#restarting-docssourcegraphcom) + - [Creating banners for maintenance tasks](#creating-banners-for-maintenance-tasks) + - [Gitserver disk space related maintenance](#gitserver-disk-space-related-maintenance) + - [Blocked repos](#blocked-repos) + - [Outlandishly sized repos](#outlandishly-sized-repos) + - [Blocking a repo](#blocking-a-repo) +- [k8s.sgdev.org](#k8ssgdevorg) + - [Manage users in k8s.sgdev.org](#manage-users-in-k8ssgdevorg) + - [Accessing k8s.sgdev.org database](#accessing-k8ssgdevorg-database) +- [Cloudflare Configuration](#cloudflare-configuration) ## General @@ -29,6 +37,10 @@ See [debugging](./debugging/index.md). 
+### Working with Kubernetes deployments + +See [Working with Kubernetes deployments](./kubernetes.md) + ### Check what version of Sourcegraph is deployed [Install `sg`, the Sourcegraph developer tool](https://github.com/sourcegraph/sourcegraph/blob/main/dev/sg/README.md), and using the [`sg live` command](https://github.com/sourcegraph/sourcegraph/blob/main/dev/sg/README.md#sg-live---see-currently-deployed-version) you can see the version currently deployed for a specific environment: @@ -39,13 +51,17 @@ sg live ## Sourcegraph.com -To learn more about this deployment, see [instances](./instances.md#sourcegraph-cloud). +To learn more about this deployment, see [instances](./instances.md#dotcom). + +### Observability + +See [Sourcegraph.com observability](../../tools/observability/dotcom.md) for general observability guidance for the instance. ### Deploying to sourcegraph.com Every commit to the `release` branch (the default branch) on [deploy-sourcegraph-cloud](https://github.com/sourcegraph/deploy-sourcegraph-cloud) deploys the Kubernetes YAML in this repository to our dot-com cluster [in CI](https://buildkite.com/sourcegraph/deploy-sourcegraph-cloud/builds?branch=release) (i.e. if CI is green then the latest config in the `release` branch is deployed). -Deploys on sourcegraph.com are currently [handled by Renovate](#renovate). The [Renovate dashboard](https://app.renovatebot.com/dashboard#github/sourcegraph/deploy-sourcegraph-cloud) shows logs for previous runs and allows you to predict when the next run will happen. +Deploys on sourcegraph.com are currently [handled by GitHub Actions](index.md#continuous-deployment-process). If you want to expedite a deploy, you can manually create and merge a PR that updates the Docker image tags in [deploy-sourcegraph-cloud](https://github.com/sourcegraph/deploy-sourcegraph-cloud). 
You can find the desired Docker image tags by looking at the output of the Docker build step in [CI on sourcegraph/sourcegraph `main` branch](https://buildkite.com/sourcegraph/sourcegraph/builds?branch=main) or by looking at [Docker Hub](https://hub.docker.com/u/sourcegraph/). @@ -113,61 +129,142 @@ git push origin release 🚨 You also need to disable auto-deploys to prevent Renovate from automatically merging in image digest updates so that the site doesn't roll-forward. -### Disable Renovate +### Accessing sourcegraph.com database -1. Go to [renovate.json](https://github.com/sourcegraph/deploy-sourcegraph-cloud/blob/release/renovate.json5) and comment out the file. -1. Ensure that no Renovate PRs are currently pending to update the images [here](https://github.com/sourcegraph/sourcegraph/pulls/app%2Frenovate) -1. After the incident, revert your commit and uncomment the file. +Sourcegraph.com utilizes an external HA database in Google Cloud. +We currently run two separate databases. +The `sg-cloud` database is the primary database, and the code-intel team uses the `sg-cloud-code-intel` database. -### Backing up & restoring a Cloud SQL instance (production databases) +You can directly view the database in [GCP](https://console.cloud.google.com/sql/instances?project=sourcegraph-dev). -Before any potentially risky operation you should ensure the databases have recent ( < 1 hour) backups. We currently have daily backups enabled. +To connect to the database, there are two options: -You can create a backup of a Cloud SQL instance via `gcloud sql backups create --instance=${instance_name} --project=sourcegraph-dev` +1. [Connect to dotcom database via command line](#connect-to-dotcom-database-via-command-line) +2. 
[Connect to dotcom database via BigQuery](#connect-to-dotcom-database-via-bigquery) (read-only access) -To restore a Cloud SQL instance to a previous revision you can use `gcloud sql backups restore $BACKUP_ID --restore-instance=${instance_name}` +#### Connect to dotcom database via command line -You can also perform these commands from the [Google Cloud SQL UI](https://console.cloud.google.com/sql/instances?project=sourcegraph-dev) +> [!WARNING] Before trying to connect to the dotcom database, you need to: +> +> - make an [Entitle request](https://app.entitle.io/) for either the `Sourcegraph Read only access` permission set to get read-only access or `Sourcegraph Dot Com projects` permission set for write access +> - ensure you have [installed the Google Cloud SDK](https://cloud.google.com/sdk/docs/install) - `sg setup` also handles this for you. -🚨 You should notify the #dev-ops channel if an situation arises when a restore my be required. It should also be filed in our ops-incident log. +We utilize the [Google Cloud SDK](https://cloud.google.com/sdk) utility [Cloud SQL Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy) to connect to our production databases. By default, our Cloud SQL databases are not accessible. -### Invalidating all user sessions +There are two ways of connecting: either using the `gcloud sql connect` command, which will use the `psql` client, or running the `cloud_sql_proxy` on a port locally to utilize your preferred tools. -If all user sessions need to be invalidated, you can run this on the `frontend` database to force all users to log in again. 
+You may use these `gcloud` commands to connect directly to the databases: -``` -UPDATE users SET invalidated_sessions_at=now(), updated_at=now(); -``` +- Default database (`sg-cloud`) [user passwords](https://start.1password.com/open/i?a=HEDEDSLHPBFGRBTKAKJWE23XX4&v=dnrhbauihkhjs5ag6vszsme45a&i=pjxf64qxwsin4d56xij6vm3gva&h=my.1password.com) -### Accessing sourcegraph.com database + ```sh + gcloud beta sql connect --project sourcegraph-dev sg-cloud-732a936743 --user=dev-readonly -d=sg + ``` -#### Via the CLI +- `sg-cloud-code-intel` database [user passwords](https://start.1password.com/open/i?a=HEDEDSLHPBFGRBTKAKJWE23XX4&v=dnrhbauihkhjs5ag6vszsme45a&i=hbgj2dfajwj7cdiifk3zb2h2b4&h=my.1password.com) -Sourcegraph.com utilizes an external HA database. You will need to connect to it directly. The easiest way to do this is through the `gcloud` cli. + ```sh + gcloud beta sql connect --project sourcegraph-dev sg-cloud-code-intel-9fc67e507c --user=dev-readonly -d=sg + ``` -To connect to the production database: +If you receive an error while connecting, ensure you have the required permissions through Entitle and re-request them if they have expired. -``` - gcloud beta sql connect sg-cloud-732a936743 --user=sg -d sg --project sourcegraph-dev +Go to [Example Queries](#example-database-queries) to continue + +##### Using Cloud SQL Proxy + +Using `cloud_sql_proxy` allows you to connect to the database with any client of your choice. +Install the Cloud SQL proxy by running this command with `gcloud`: + +```sh +gcloud components install cloud_sql_proxy ``` -However, if you want to use any other SQL client, you'll have to run the [`cloud_sql_proxy`](https://cloud.google.com/sql/docs/postgres/connect-admin-proxy#install) utility, which authenticates with you local `gcloud` credentials automatically. 
+To get started, run the `cloud_sql_proxy` against our production instance: +```sh +cloud_sql_proxy -instances=sourcegraph-dev:us-central1:sg-cloud-732a936743=tcp:5555 ``` - cloud_sql_proxy -instances=sourcegraph-dev:us-central1:sg-cloud-732a936743=tcp:5555 + +Now, in a new terminal, run the command below. The database will be available on `localhost:5555` + +```sh +export PGPASSWORD='<$PASSWORD>' +psql -h localhost -p 5555 -d sg -U 'dev-readonly' ``` -Once the proxy connects successfully, you can use any client to connect to the local `5555` port (you can choose any other port you want). +Note that to connect to `localhost:5555` you still need to supply the Postgres password stored in 1Password (mentioned above). + +##### Example database queries + +> [!WARNING] 🔥 **You are directly interfacing with the production database.** +> If you are unsure of any commands, please reach out in #discuss-dev-ops or #chat-dev. +> Please prefer using a readonly user. + +- See all fields on a table (i.e. the `repo` table) -The password of the sg user is in our shared 1Password under [Google Cloud SQL](https://team-sourcegraph.1password.com/vaults/dnrhbauihkhjs5ag6vszsme45a/allitems/svfiw4vcbxhhbobpl442olyebu) + ```psql + \d+ repo + ``` -#### Via BigQuery (for read-only operations) +- See the total number of rows in the `repo` table + + ```psql + SELECT COUNT(*) FROM repo; + ``` + +#### Connect to dotcom database via BigQuery You can also query the production database via BigQuery as an external data source. 
+Using BigQuery, if you want to run the query: + +```psql +SELECT name::text,created_at::text FROM repo LIMIT 5; +``` + +against the Prod CloudSQL database, you need to run the following in [BigQuery console](https://console.cloud.google.com/bigquery?sq=527047051561:67f2616f4acb4b7cb3639e4a97e2f4aa): + +```psql +SELECT * FROM EXTERNAL_QUERY("sourcegraph-dev.us.sg-cloud", "SELECT name::text,created_at::text FROM repo LIMIT 5;"); +``` + +Note that here, we are passing the PostgreSQL query in the second parameter to `EXTERNAL_QUERY`. See an [example query](https://console.cloud.google.com/bigquery?sq=527047051561:bfa7c7e57f884d209f261d15e4610229) to get started. -**Note**: This method only permits read-only access +> [!NOTE] This method only permits read-only access. For write access, try [connecting to the dotcom database via command line](#connect-to-dotcom-database-via-command-line). + +### Backing up & restoring a Cloud SQL instance (production databases) + +Before any potentially risky operation you should ensure the databases have recent ( < 1 hour) backups. We currently have daily backups enabled. + +You can create a backup of a Cloud SQL instance via `gcloud sql backups create --instance=${instance_name} --project=sourcegraph-dev` + +To restore a Cloud SQL instance to a previous revision you can use `gcloud sql backups restore $BACKUP_ID --restore-instance=${instance_name}` + +You can also perform these commands from the [Google Cloud SQL UI](https://console.cloud.google.com/sql/instances?project=sourcegraph-dev) + +> [!WARNING] 🚨 You should notify the #dev-ops channel if a situation arises where a restore may be required. It should also be filed in our ops-incident log. + +### Database performance monitoring + +We also run a PgHero deployment that you can use to analyze slow queries and overall database performance. 
+ +```sh +kubectl port-forward -n monitoring deploy/pghero 8080:8080 +``` + +And then navigate to http://localhost:8080 to view the dashboard + +See additional Postgres tips in our [incident docs](../incidents/playbooks/index.md#postgreSQL-database-problems) + +### Invalidating all user sessions + +If all user sessions need to be invalidated, you can run this on the `frontend` database to force all users to log in again. + +```psql +UPDATE users SET invalidated_sessions_at=now(), updated_at=now(); +``` ### Restarting docs.sourcegraph.com @@ -317,9 +414,13 @@ To create an account on [k8s.sgdev.org](https://k8s.sgdev.org), log in with your To promote a user to site admin (required to make configuration changes), use the admin user credentials available in 1password (titled `k8s.sgdev.org admin user`) to log in to [k8s.sgdev.org](https://k8s.sgdev.org), and go to the [users page](https://k8s.sgdev.org/site-admin/users) to promote the desired user. -## PostgreSQL +### Accessing k8s.sgdev.org database + +This instance is run completely on Kubernetes, including its Postgres databases. -See [PostgreSQL](./postgresql.md) +1. First, [connect to the cluster](./instances.md#k8ssgdevorg). +2. Then you can port-forward the pgsql deployment: `kubectl port-forward -n dogfood-k8s pgsql-0 8080:5432` +3. Then access it locally: `pgcli -h localhost -p 8080 -d sg -U 'sg'` ## Cloudflare Configuration diff --git a/content/departments/engineering/dev/process/deployments/postgresql.md b/content/departments/engineering/dev/process/deployments/postgresql.md index 688aa074643a..428d55bfe5f7 100644 --- a/content/departments/engineering/dev/process/deployments/postgresql.md +++ b/content/departments/engineering/dev/process/deployments/postgresql.md @@ -2,119 +2,12 @@ For deployments other than Cloud and Sourcegraph.com please use the information [here](https://docs.sourcegraph.com/admin/faq#how-do-i-access-the-sourcegraph-database) to access the database. 
-## Sourcegraph.com specific - -We currently run two separate databases. The `sg-cloud` database is the primary database, and the code-intel team uses the `sg-cloud-code-intel`. - -You can also directly view the database in [GCP](https://console.cloud.google.com/sql/instances?project=sourcegraph-dev). - -We utilize the [Google Cloud SDK](https://cloud.google.com/sdk) utility [Cloud SQL Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy) to connect to our production databases. By default, our Cloud SQL databases are not accessible. - -There are two ways of connecting: either using the `gcloud sql connect` command, which will use the `pgsql` client, or running the `cloud_sql_proxy` on a port locally to utilize your preferred tools. - -**NOTE:** before trying to connect to the database you need to make an [Entitle request](https://app.entitle.io/) for either the `Sourcegraph Read only access` permission set to get read-only access or `Sourcegraph Dot Com projects` permission set for write access. - -For read-only access, there is also an option of using [BigQuery](https://console.cloud.google.com/bigquery?sq=527047051561:67f2616f4acb4b7cb3639e4a97e2f4aa) and their `EXTERNAL_QUERY` syntax. - -Using BigQuery, if you want to run a query - -``` -SELECT name::text,created_at::text FROM repo LIMIT 5; -``` - -against the Prod CloudSQL database, you need to run - -``` -SELECT * FROM EXTERNAL_QUERY("sourcegraph-dev.us.sg-cloud", "SELECT name::text,created_at::text FROM repo LIMIT 5;"); -``` - -in the BigQuery editor (passing the PostgreSQL query in the second parameter to EXTERNAL_QUERY). - -### Connecting to Postgres - -#### Install the command line tools - -If you didn't yet, [install Google Cloud SDK](https://cloud.google.com/sdk/docs/install). Ensure, that `gcloud` command is reachable on your path. 
- -Install the Cloud SQL proxy by running this command with `gcloud`: - -``` - gcloud components install cloud_sql_proxy -``` - -#### Request permission using Entitle - -Request the "Sourcegraph Dot Com projects" bundle using Entitle to ensure you have the correct GCP permissions to access the databases. - -#### Command line only use (pgsql) - -> [!IMPORTANT] Make sure you have requested permission via Entitle before executing any of these commands - see [here](#request-permission-using-entitle) +> [!WARNING] **This page is deprecated** - please refer to and contribute to the [deployments playbooks](../deployments/playbooks.md) instead. -You may use these gcloud commands to connect directly to the databases: - -- Default db {[Password](https://start.1password.com/open/i?a=HEDEDSLHPBFGRBTKAKJWE23XX4&v=dnrhbauihkhjs5ag6vszsme45a&i=pjxf64qxwsin4d56xij6vm3gva&h=my.1password.com)} - ``` - gcloud beta sql connect --project sourcegraph-dev sg-cloud-732a936743 --user=dev-readonly -d=sg - ``` -- Code intel db {[Password](https://start.1password.com/open/i?a=HEDEDSLHPBFGRBTKAKJWE23XX4&v=dnrhbauihkhjs5ag6vszsme45a&i=hbgj2dfajwj7cdiifk3zb2h2b4&h=my.1password.com)} - - ``` - gcloud beta sql connect --project sourcegraph-dev sg-cloud-code-intel-9fc67e507c --user=dev-readonly -d=sg - ``` - -If you receive an error while connecting, ensure you have the required permissions through Entitle and re-request them if they have expired. - -Go to [Example Queries](#example-queries) to continue - -#### Proxy for advanced use - -> [!IMPORTANT] Make sure you have requested permission via Entitle before executing any of these commands - see [here](#request-permission-using-entitle) - -Run the `cloud_sql_proxy` against our production instance - -``` - cloud_sql_proxy -instances=sourcegraph-dev:us-central1:sg-cloud-732a936743=tcp:5555 -``` - -Now, in a new terminal, run the command below. 
The database will be running on `localhost:5555` - -``` - export PGPASSWORD='<$PASSWORD>' - psql -h localhost -p 5555 -d sg -U 'dev-readonly' -``` - -Note, that to connect to `localhost:5555` you still need to supply the postgres password stored in 1Password (mentioned above). - -### Example queries - -> 🔥 You are directly interfacing with the production database. If you are unsure of any commands, please reach out in #dev-chat or #dev-ops. -> Please prefer using the readonly user `frontend-dev` - -- See all fields on a table (ie the `repo` table) - ``` - \d+ repo - ``` -- See the total number of rows in the `repo` table - ``` - SELECT COUNT(*) FROM repo; - ``` - -### Performance monitoring - -We run a PgHero deployment as well you can use to analyze slow queries and overall database performance. - -``` - kubectl port-forward -n monitoring deploy/pghero 8080:8080 -``` - -And then navigate to http://localhost:8080 to view the dashboard +## Sourcegraph.com specific -See additional Postgres tips in our [incident docs](../incidents/playbooks/index.md#postgreSQL-database-problems) +Refer to [deployments playbooks: Accessing sourcegraph.com database](./playbooks.md#accessing-sourcegraphcom-database) ## Dogfood specific -[Dogfood](https://k8s.sgdev.org) runs Sourcegraph completely on Kubernetes. - -1. First, [connect to the cluster](./instances.md#k8ssgdevorg). -2. Then you can port-forward the pgsql deployment: `kubectl port-forward -n dogfood-k8s pgsql-0 8080:5432` -3. 
Then access it locally: `pgcli -h localhost -p 8080 -d sg -U 'sg'` +Refer to [deployments playbooks: Accessing k8s.sgdev.org database](./playbooks.md#accessing-k8ssgdevorg-database) diff --git a/content/departments/engineering/dev/process/incidents/playbooks/ci.md b/content/departments/engineering/dev/process/incidents/playbooks/ci.md index 17c43f9a9198..90db78c521a3 100644 --- a/content/departments/engineering/dev/process/incidents/playbooks/ci.md +++ b/content/departments/engineering/dev/process/incidents/playbooks/ci.md @@ -106,7 +106,6 @@ In order to handle problems with the CI, the following elements are necessary: #### Actions 1. Identify the error in common with the recent builds on [Buildkite](https://buildkite.com/sourcegraph/sourcegraph/builds?branch=main). - - 💡 See [How to use loki here](#actions-4) 1. Find the build where the problem appeared for the first time. - 💡 Often it's the first build that became red, but check that the error is the same to be sure. 1. Is this an external failure or an internal one? diff --git a/content/departments/engineering/dev/tools/observability/dotcom.md b/content/departments/engineering/dev/tools/observability/dotcom.md index fdfe2d7d9e0d..8100e807d39e 100644 --- a/content/departments/engineering/dev/tools/observability/dotcom.md +++ b/content/departments/engineering/dev/tools/observability/dotcom.md @@ -1,58 +1,42 @@ # Sourcegraph.com observability -We provide some tooling to make [Sourcegraph.com](../../process/deployments/instances.md#sourcegraph-cloud) easier to monitor and observe. This includes observability for relevant critical infrastructure such as our [CI/CD pipelines](#ci-logs). +We provide some tooling to make [Sourcegraph.com instance](../../process/deployments/instances.md#dotcom) easier to monitor and observe. 
For general observability development, please refer to the [observability development documentation](https://docs.sourcegraph.com/dev/background-information/observability) instead, which includes links to useful how-to guides. > [!NOTE] Looking for _how to monitor Sourcegraph?_ See the [observability documentation](https://docs.sourcegraph.com/admin/observability). -## Monitoring +## Metrics and alerting For metrics and alerting, see the [Sourcegraph monitoring guide](./monitoring.md). -## Grafana Cloud +## Logging -We have a Grafana Cloud instance at [sourcegraph.grafana.net](https://sourcegraph.grafana.net/). Accounts are automatically provisioned by logging in with GSuite oAuth. Quick links: +Service logs are available in GCP logging in the `sourcegraph-dev` project. +The quick-and-easy way is to go to the [GCP console workloads page](https://console.cloud.google.com/kubernetes/workload/overview?project=sourcegraph-dev), select the workload of interest, and head over to the "Logs" tab. -- [Explore logs](https://sourcegraph.grafana.net/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22grafanacloud-sourcegraph-logs%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bdeploy%3D%5C%22sourcegraph%5C%22%7D%22%7D%5D) -- [Explore traces](https://sourcegraph.grafana.net/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22grafanacloud-sourcegraph-traces%22,%7B%22refId%22:%22A%22%7D%5D) -- [CI dashboard](https://sourcegraph.grafana.net/d/iBBWbxFnk/ci?orgId=1) +Sourcegraph service logs [follow a standardized JSON format](https://sourcegraph.com/docs/admin/observability/logs#logs) - you can use [this Logs Explorer view](https://cloudlogging.app.goo.gl/WXpyV1uSzDWnLMg7A) which is preconfigured with important attributes extracted to the log summary line, and uncomment the `labels.k8s-pod/app` filter to target your workload of choice. 
+The resulting log filter should look something like this: -### Logs - -Logs in Grafana Cloud is provided by [Grafana Loki](https://grafana.com/oss/loki/), a logs aggregation system that uses a PromQL-like query language called [LogQL](https://grafana.com/docs/loki/latest/logql/). - -Loki allows you to easily query for logs, filter for fields within structured logs, and even generate metrics from logs. The [official LogQL documentation](https://grafana.com/docs/loki/latest/logql/) provides a complete reference, or you can refer to [this cheatsheet](https://megamorf.gitlab.io/cheat-sheets/loki/) for a brief overview. - -#### Cloud logs - -The Loki instance in Grafana Cloud is currently configured to ingest logs from Sourcegraph.com pushed from [`grafana-agent`'s Loki configuration](https://github.com/sourcegraph/deploy-sourcegraph-cloud/blob/release/configure/grafana-agent/grafana-agent.ConfigMap.yaml#L58). To query these, you can start with a LogQL query like: - -```logql -{deploy="sourcegraph",app="sourcegraph-frontend"} - | logfmt - | lvl="warn" +```none +labels.k8s-pod/app="sourcegraph-frontend" +resource.type="k8s_container" +resource.labels.project_id="sourcegraph-dev" +resource.labels.location="us-central1-f" +resource.labels.cluster_name="cloud" +resource.labels.namespace_name="prod" ``` -#### CI logs - -The `sourcegraph/sourcegraph` CI pipeline also [uploads pipeline logs using `sg` to Loki](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/dev/upload-build-logs.sh). -These uploads only happen for _failed builds_ on `main` - we do not publish data for successful builds or branch builds (for those, you can refer to our [build traces](https://docs.sourcegraph.com/dev/background-information/ci/development#pipeline-command-tracing)). 
-To query logs, you can start with a [LogQL query](#logs) like: +You can also use `kubectl` to work with service log output in the command line - see the [Kubernetes guide](../../process/deployments/kubernetes.md) to get started. -```logql -{app="buildkite",branch="main",state="failed"} - |~ "FAILED:" -``` +## Tracing -Also refer to the [CI dashboard](https://sourcegraph.grafana.net/d/iBBWbxFnk/ci?orgId=1), which is a set of graphs based on the contents of uploaded logs, for more examples—just select a panel and click "Explore" to see the underlying query. +Traces are available in [Cloud Trace](https://console.cloud.google.com/traces/list?project=sourcegraph-dev) and an [in-cluster Jaeger deployment](https://sourcegraph.com/-/debug/jaeger/). +The latter is only accessible with site admin permissions - see [Site-admin access to internal instances](../../../../security/admin-access-internal-instances.md). -A demo is also available that demonstrates one of the most common use cases of this functionality, assessing [flakes](https://docs.sourcegraph.com/dev/background-information/ci#flakes): [how to find out if a build is a recurring flake](https://www.loom.com/share/58cedf44d44c45a292f650ddd3547337). +Trace spans meeting certain criteria are also exported to [Honeycomb](https://ui.honeycomb.io/sourcegraph) via our OpenTelemetry Collector deployment - see [`otel-collector.ConfigMap.yaml`](https://github.com/sourcegraph/deploy-sourcegraph-cloud/blob/release/base/otel-collector/otel-collector.ConfigMap.yaml) for our current configuration. -Additional resources: - -- [CI observability](https://docs.sourcegraph.com/dev/background-information/ci/development#observability) -- [CI playbook](../../process/incidents/playbooks/ci.md) +Also refer to [how to use traces](https://sourcegraph.com/docs/admin/observability/tracing#how-to-use-traces). 
## Cloudflare @@ -64,12 +48,8 @@ This section gives a quick overview of how to access Cloudflare analytics, and h Cloudflare Analytics provides a somewhat [limited](https://developers.cloudflare.com/analytics/graphql-api/limits) API for retrieving monitoring data. Note that you can only retrieve relatively recent data, and have a limited number of operations. -### Tools - Cloudflare recommends using [GraphiQL](https://www.electronjs.org/apps/graphiql), a lightweight electron app, to interface with their API due to its relative ease of use. Configuration instructions are [here](https://developers.cloudflare.com/analytics/graphql-api/getting-started). The auth key and email can be found [here](https://github.com/sourcegraph/infrastructure/blob/main/dns/providers.tf). The tool also helps enumerate the available parameters, and is quite useful for exploring the API. -### Available data - The Cloudflare API mainly contains network layer information about communications to and from the service. The entire list of datasets is enumerated [here](https://developers.cloudflare.com/analytics/graphql-api/features/data-sets). For an example, the number of requests and page views per minute, along with the number of unique accessors can be found with the following query. Note that the results are ordered by `datetimeMinute_ASC`, since the default response ordering does not rely on time. ```{ @@ -91,3 +71,10 @@ viewer { } } ``` + +### Cloudflare logs in Elasticsearch + +Cloudflare logs are streamed to an Elasticsearch deployment managed by the Security team. +Reach out to #discuss-security to provision access. + +See [the Cloudflare logs reference](https://developers.cloudflare.com/logs/reference/) and related pages for documentation on various fields. 
diff --git a/content/departments/engineering/dev/tools/observability/index.md b/content/departments/engineering/dev/tools/observability/index.md index c530b94befc8..98fee93a952f 100644 --- a/content/departments/engineering/dev/tools/observability/index.md +++ b/content/departments/engineering/dev/tools/observability/index.md @@ -10,7 +10,9 @@ For general observability development, please refer to the [observability develo - [Sourcegraph monitoring guide](monitoring.md) - [Monitoring pillars](monitoring_pillars.md) - [Monitoring architecture](./monitoring_architecture.md) +- **Managed Services** (e.g. accounts.sourcegraph.com, telemetry-gateway.sourcegraph.com, etc.): refer to [Managed Services infrastructure (go/msp-ops)](../../../managed-services/index.md) +- **Cody Gateway**: refer to [Cody Gateway (go/cody-gateway)](../../../teams/cody/cody-gateway/index.md) -### Learning more +## Learning more Are you interested in observability? Check out the [recommended learning resources](learning_resources.md) to pick up what modern observability is and its benefits. diff --git a/content/departments/engineering/dev/tools/observability/monitoring.md b/content/departments/engineering/dev/tools/observability/monitoring.md index 4ea9d3d3d16c..e698bbde8b6c 100644 --- a/content/departments/engineering/dev/tools/observability/monitoring.md +++ b/content/departments/engineering/dev/tools/observability/monitoring.md @@ -94,7 +94,7 @@ To learn more, reference the [dashboard generator documentation](https://github. Once the dashboard is ready to be shipped to customers, we will need to port it to the [monitoring generator](https://docs.sourcegraph.com/dev/background-information/observability/monitoring-generator) to be included in our next Sourcegraph release. Custom dashboards cannot be added to the `sourcegraph/grafana` except through the generator. 
-You can use a [local Grafana](#connecting-grafana-to-a-remote-prometheus-instance) or the Cloud Grafana to create a new dashboard and once its ready, export it by following these steps: +You can use a [local Grafana](https://sourcegraph.com/docs/dev/how-to/monitoring_local_dev#grafana) or the Cloud Grafana to create a new dashboard and, once it's ready, export it by following these steps: - Open "Dashboard Settings" (top right cog). - Select "JSON Model".