From 2ff341e5843d7efa9eab3e473c2e82ad9b4ba280 Mon Sep 17 00:00:00 2001
From: Joe Chen
Date: Fri, 19 Jan 2024 09:58:18 -0500
Subject: [PATCH] sams: initial page content

---
 .../core-services/managed-services/index.md   |   2 +
 .../teams/core-services/sams/index.md         | 125 ++++++++++++++++++
 2 files changed, 127 insertions(+)
 create mode 100644 content/departments/engineering/teams/core-services/sams/index.md

diff --git a/content/departments/engineering/teams/core-services/managed-services/index.md b/content/departments/engineering/teams/core-services/managed-services/index.md
index 8e76c9040a86..a89c66c4a8a5 100644
--- a/content/departments/engineering/teams/core-services/managed-services/index.md
+++ b/content/departments/engineering/teams/core-services/managed-services/index.md
@@ -13,6 +13,8 @@ Guidance for MSP incidents is available in [Managed Services incident response](
 
 ## [Telemetry Gateway](./telemetry-gateway.md)
 
+## [Sourcegraph Accounts Management System](../sams/index.md)
+
 ## Sourcergaph.com Google OIDC
 
 The GCP project that manages our [Google OIDC authentication integration](https://console.cloud.google.com/apis/credentials/oauthclient/394401733494-3ekkk0qr3qvg7b3l1imqcgsh3ej710eq.apps.googleusercontent.com?project=sourcegraph-com-ggl-oidc) ("Sign in with Google") for Sourcergraph.com. We put the integration in a standalone project for a dedicated public-facing [OAuth consent screen](https://console.cloud.google.com/apis/credentials/consent?project=sourcegraph-com-ggl-oidc).
diff --git a/content/departments/engineering/teams/core-services/sams/index.md b/content/departments/engineering/teams/core-services/sams/index.md
new file mode 100644
index 000000000000..4fc3b6261cd3
--- /dev/null
+++ b/content/departments/engineering/teams/core-services/sams/index.md
@@ -0,0 +1,125 @@
+# Sourcegraph Accounts Management System (SAMS)
+
+[Sourcegraph Accounts Management System (SAMS)](https://docs.google.com/document/d/16F6uvfM9EknpcuAQQ8kIPOZ9gHo0Lx4lgprw_5sWJEs/edit) is the centralized accounts system for all Sourcegraph-operated systems. It provides:
+
+- A Single Sign-On (SSO) experience for users of those systems, and a user ID that can be referenced across systems.
+- Out-of-the-box machine-to-machine authentication and authorization capabilities.
+
+It is compliant with the [OAuth 2](https://oauth.net/2/) and [OIDC](https://openid.net/) protocols but only exposes a subset of the full capabilities for security reasons. In particular, only the following flows are allowed:
+
+- [Authorization Code Flow](https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow)
+- [Refresh Token Flow](https://cloudentity.com/developers/basics/oauth-grant-types/refresh-token-flow/)
+- [Client Credentials Flow](https://auth0.com/docs/get-started/authentication-and-authorization-flow/client-credentials-flow)
+
+The [OpenID Discovery](https://accounts.sourcegraph.com/.well-known/openid-configuration) endpoint lays out all the protocol details that a Relying Party / Service Provider needs to know to integrate with SAMS.
+
+## System characteristics
+
+TODO: access token expiry
+
+## Service images
+
+Images are published to a private image repository, [`us-central1-docker.pkg.dev/sourcegraph-dev/sams/accounts-server`](https://console.cloud.google.com/artifacts/docker/sourcegraph-dev/us-central1/sams/accounts-server?project=sourcegraph-dev), on every commit to `main` using the `insiders` tag. To pull down the published images locally, you need to [request access via Entitle](https://app.entitle.io/request?data=eyJkdXJhdGlvbiI6IjEwODAwIiwianVzdGlmaWNhdGlvbiI6IlB1bGwgZG93biBkZXYgaW1hZ2VzIiwicm9sZUlkcyI6W3siaWQiOiJhM2ZmNTQ1ZC0zZGVmLTQxY2ItYjJiNy1lMTM2MDM5Y2YwZGYiLCJ0aHJvdWdoIjoiYTNmZjU0NWQtM2RlZi00MWNiLWIyYjctZTEzNjAzOWNmMGRmIiwidHlwZSI6InJvbGUifV19).
+
+Publishing resources are [provisioned in `sourcegraph/infrastructure`](https://github.com/sourcegraph/infrastructure/tree/main/managed-services/sams-publishing-pipeline).
+
+## Deployments
+
+SAMS runs on [Managed Services Platform](../managed-services/index.md); deployment resources are [provisioned in `sourcegraph/managed-services`](https://github.com/sourcegraph/managed-services/tree/main/services/sams).
+
+- Testing: https://accounts.sgdev.org
+- Production: https://accounts.sourcegraph.com
+
+## Operations
+
+> [!NOTE]
+> To get access to most resources, you’ll need to [request infrastructure access](#infrastructure-access).
+
+Here is a list of useful quick links:
+
+- [Terraform Cloud workspaces](https://app.terraform.io/app/sourcegraph/workspaces?project=prj-7gzvzKCGcKupiA4s)
+- [Cloud Run service (metrics overview)](https://console.cloud.google.com/run/detail/us-central1/pings/metrics?project=pings-prod-2f4f73edf1db)
+- [Service logs](https://cloudlogging.app.goo.gl/JMmBSAbEceh6onpj8)
+- [GCP alerts](https://console.cloud.google.com/monitoring/alerting?project=pings-prod-2f4f73edf1db)
+- [GCP errors](https://console.cloud.google.com/errors?project=pings-prod-2f4f73edf1db)
+- [GCP Cloud Profiler](https://console.cloud.google.com/profiler/pings?project=pings-prod-2f4f73edf1db)
+
+For other infrastructure operations, see [Sourcegraph Accounts infrastructure operations](../../../managed-services/sams.md).
+
+### Infrastructure access
+
+The following Entitle requests are needed to get access to Pings service infrastructure:
+
+- [GCP Project - MSP Service Editor](https://app.entitle.io/request?targetType=resource&duration=43200&justification=TODO&integrationId=134476cb-0bd6-4c6d-a89f-e1550988bdd7&resourceId=d94da8c3-76eb-451a-9cbb-973ac3bc44b1&roleId=8b60a711-976c-4e56-9f8b-cb2c989faca4&grantMethodId=8b60a711-976c-4e56-9f8b-cb2c989faca4) (see [MSP Entitle](./platform.md#entitle))
+
+### Deployment
+
+The Pings service infrastructure is defined in [`sourcegraph/managed-services/services/pings`](https://github.com/sourcegraph/managed-services/tree/main/services/pings) using [Managed Services Platform](./platform.md).
+
+#### Modify deployment manifest
+
+> [!WARNING]
+> Due to the early-stage shape of [Managed Services Platform](./platform.md), we have yet to roll out a standardized playbook. Please reach out to #team-core-services if you need to modify the deployment manifest. The instructions in this section generally assume that the upfront setup is already in place.
+
+To modify the deployment manifest:
+
+1. Update the `service.yaml` file
+1. In the repository root, run `sg msp generate services/pings/service.yaml prod`
+1. Stage the changes and open a pull request
+1. Terraform Cloud rolls out the changes
+
+#### Use a different image tag
+
+To specify a Docker image tag other than the default, update the `service.yaml`:
+
+```diff
+  - id: prod
+    ...
+    deploy:
+      type: manual
++     manual:
++       tag: 218287_2023-05-10_5.0-5bd03cd18e71
+```
+
+#### Re-deploy the same manifest
+
+Go to the ["Deploy revision" page](https://console.cloud.google.com/run/deploy/us-central1/pings?project=pings-prod-2f4f73edf1db) of the Cloud Run service and click **DEPLOY** (bottom of the page) without changing any configuration. A re-deploy also happens whenever a Terraform change is applied to the "cloudrun" stack.
+
+#### Update the service resource allocations
+
+The following section in the [`service.yaml`](https://github.com/sourcegraph/managed-services/blob/main/services/pings/service.yaml#L29-L35) defines the resource allocation for the Cloud Run service instances:
+
+```yaml
+environments:
+  - id: prod
+    instances:
+      resources:
+        cpu: 1 # Per-instance CPU
+        memory: 1Gi # Per-instance memory
+      scaling:
+        maxCount: 3 # Maximum count of instances
+        minCount: 1 # Minimum count of instances, setting it to 1 avoids cold starts
+```
+
+Once updated, follow [Modify deployment manifest](#modify-deployment-manifest) to apply the changes.
+
+### Observability
+
+> [!NOTE]
+> To get access to most resources, you’ll need to [request infrastructure access](#infrastructure-access).
+
+#### Alerting
+
+- [GCP Monitoring Alerting](https://console.cloud.google.com/monitoring/alerting?project=pings-prod-2f4f73edf1db)
+
+All alerts from all environments currently go to #alerts-pings-sourcegraph-com.
+
+#### Metrics
+
+The deployment's [Cloud Run metrics overview page](https://console.cloud.google.com/run/detail/us-central1/pings/metrics?project=pings-prod-2f4f73edf1db) provides basic observability into the service provided out-of-the-box by Cloud Run, such as instance count and resource utilization.
+
+The Pings service also pushes [GCP Custom Metrics](https://console.cloud.google.com/monitoring/dashboards/builder/eda5de3e-2bd2-41ad-afe2-7c7dfaeeebba?project=pings-prod-2f4f73edf1db) via OpenTelemetry metrics.
+
+## Development
+
+The source code and CI are located in the [sourcegraph/sams](https://github.com/sourcegraph/sams) GitHub repository.
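
The new page says the OpenID Discovery endpoint carries everything a Relying Party needs, and that the Client Credentials Flow is one of the allowed flows. The sketch below shows roughly how a machine-to-machine client might combine the two. It is not taken from the SAMS codebase. It assumes the discovery document exposes the standard `token_endpoint` field and that SAMS accepts HTTP Basic client authentication, neither of which this patch confirms. The client ID, secret, and scope in the usage note are hypothetical placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request

# Discovery URL as linked from the page added in this patch.
DISCOVERY_URL = "https://accounts.sourcegraph.com/.well-known/openid-configuration"


def fetch_discovery(url: str = DISCOVERY_URL, timeout: float = 10.0) -> dict:
    """Download and parse the OpenID Discovery document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def build_client_credentials_request(
    discovery: dict, client_id: str, client_secret: str, scopes: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a Client Credentials token request.

    Assumption: the server accepts the client ID and secret via HTTP Basic
    authentication, which is the common default but is not confirmed here.
    """
    # "token_endpoint" is a standard OIDC discovery metadata field.
    token_endpoint = discovery["token_endpoint"]
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": " ".join(scopes)}
    ).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

A caller would pass the result of `fetch_discovery()` together with its own credentials, for example `build_client_credentials_request(fetch_discovery(), "my-client-id", "my-client-secret", ["example-scope"])`, and send the request with `urllib.request.urlopen`. Keeping request construction separate from the network call makes the sketch easy to check offline.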