Commit 3873365

author
gathomas
committed
wip
1 parent 5031b26 commit 3873365

33 files changed

+361
-198
lines changed

README.md

+5-1
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,7 @@
11
# Managed Tenants SRE Docs
22

3-
Documentation for Managed Tenants SRE at Red Hat
3+
Documentation for Managed Tenants SRE at Red Hat.
4+
5+
## Run Locally
6+
7+
`hugo server`

content/en/_index.md

+2-5
@@ -26,15 +26,12 @@ weight: 10
2626
class="mt-5 animate__animated animate__slideInLeft animate__fast animate__delay-1s">
2727
<h1>Developers</h1>
2828
<p>
29-
Automation Tooling, CI/CD, SDKs, reference and example projects, guides and API documentation for developers working on managed services and their dependencies.
29+
Addons, Automation Tooling, CI/CD, SDKs, reference and example projects, guides and API documentation for developers working on managed services and their dependencies.
3030
</p>
31+
<a class="btn btn-lg btn-primary mr-3 mb-4" href="{{< relref "/docs/creating-addons" >}}">Get Started</a>
3132
<a class="btn btn-lg btn-outline-light mr-3 mb-4" href="https://sdk.operatorframework.io/docs/">Operator SDK</a>
3233
<a class="btn btn-lg btn-outline-light mr-3 mb-4" href="https://source.redhat.com/groups/public/openshiftplatformsre">Platform SRE</a>
3334
</div>
3435

35-
<div class="mt-5 animate__animated animate__slideInLeft animate__fast animate__delay-2s">
36-
<h1>Vendor Integrators</h1>
37-
<p>Integrating and offering Red Hat external services within managed OpenShift.</p>
38-
</div>
3936
</div>
4037
{{< /blocks/section >}}

content/en/docs/addons-flow/architecture.md

+9-13
@@ -3,8 +3,6 @@ title: "Addons Flow Architecture"
33
date: 2022-12-15T00:53:51+01:00
44
---
55

6-
# Add-Ons Flow Architecture
7-
86
Add-Ons are Operators. As such, Add-Ons are installed using typical Operator
97
objects, like `Subscription`, `OperatorGroup` and `CatalogSource`.
108

@@ -29,12 +27,12 @@ the corresponding bundles directories (`managed-tenants-bundles/addons/<addon_na
2927
The Managed Tenants CI is in charge of processing the input and deploying all
3028
the artifacts. This image shows the data flows:
3129

32-
![Data Flows](../images/addon-syncset-installation.png)
30+
![Data Flows](/addon-syncset-installation.png)
3331

3432
With that in place, OCM will present an "Add-Ons" tab, listing all the Add-Ons
3533
that your organization has quota for. Example:
3634

37-
![Data Flows](../images/architecture_ocm_ui.png)
35+
![Data Flows](/architecture_ocm_ui.png)
3836

3937
## Installation
4038

@@ -43,34 +41,32 @@ When you click "Install" in the OCM Web UI, under the hood, OCM creates a
4341
in Hive. The `SyncSet` object references the cluster in which the addon was just installed in the
4442
`clusterDeploymentRefs` field.
4543

46-
// TODO: If there is already a SyncSet for an addon is a new one created or is the new cluster name
47-
just appended to the clusterDeploymentRefs list?
48-
49-
![Data Flows](../images/architecture_install_flow.png)
44+
![Data Flows](/architecture_install_flow.png)
5045
[excalidraw](https://excalidraw.com/#room=71d4b0273d4404dbbebf,Pk5KFYj9fFvXSvObY6juCA)
5146

5247
From there, OLM will take over, installing the Operator in the OpenShift
5348
cluster. While OLM is installing the Operator, OCM will keep polling the
5449
telemetry data reported by the cluster, waiting for the `csv_succeeded=1`
5550
metric from that Operator:
5651

57-
![Data Flows](../images/architecture_telemetry_wait.png)
52+
![Data Flows](/architecture_telemetry_wait.png)
5853

5954
At some point, when the Operator is fully installed, OCM will reflect that in
6055
the Web UI:
6156

62-
![Data Flows](../images/architecture_telemetry_done.png)
57+
![Data Flows](/architecture_telemetry_done.png)
6358

6459
## Addon Status Lifecycle
6560

66-
![Addon Status Lifecycle](../images/addon-status-ocm.png)
61+
![Addon Status Lifecycle](/addon-status-ocm.png)
6762

6863
## Deprecated SelectorSyncSet Installation
69-
![Data Flows](../images/architecture_data_flow.png)
64+
65+
![Data Flows](/architecture_data_flow.png)
7066
When you click "Install" in the OCM Web UI, under the hood, OCM will apply
7167
a label to the corresponding `ClusterDeployment` object in Hive.
7268

7369
That label is the same label used in the `SelectorSyncSet` as a `matchLabel`.
7470

7571
With the `ClusterDeployment` label now matching the `SelectorSyncSet` label,
76-
Hive will apply all the objects in the `SelectorSyncSet` to the target cluster:
72+
Hive will apply all the objects in the `SelectorSyncSet` to the target cluster:
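As a rough illustration of the label matching described above, a `SelectorSyncSet` and a labelled `ClusterDeployment` might look like the sketch below. The label key `api.openshift.com/addon-example`, the object names, and the embedded `Namespace` resource are assumptions made for illustration; they are not taken from the managed-tenants repository.

```yaml
# Hypothetical SelectorSyncSet: Hive applies its resources to every
# ClusterDeployment whose labels satisfy clusterDeploymentSelector.
apiVersion: hive.openshift.io/v1
kind: SelectorSyncSet
metadata:
  name: addon-example                          # assumed name
spec:
  clusterDeploymentSelector:
    matchLabels:
      api.openshift.com/addon-example: "true"  # assumed label key
  resourceApplyMode: Sync
  resources:
    - apiVersion: v1
      kind: Namespace
      metadata:
        name: example-addon-namespace          # assumed namespace
---
# ClusterDeployment after OCM has applied the label (other fields omitted).
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: example-cluster                        # assumed name
  namespace: example-cluster-ns                # assumed namespace
  labels:
    api.openshift.com/addon-example: "true"
```

Because the selector and the label carry the same key and value, Hive treats the cluster as a target and syncs the listed resources to it.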

content/en/docs/creating-addons/monitoring/deadmanssnitch_integration.md

+16-13
@@ -5,22 +5,24 @@ date: 2022-12-15T00:53:51+01:00
55
---
66

77
## Overview
8+
89
[Dead Man's Snitch (DMS)](https://deadmanssnitch.com/) is essentially a constantly firing prometheus alert and an external receiver
910
(called a snitch) that will alert should the monitoring stack go down and stop sending alerts.
1011
The generation of "snitch" urls is dynamic on cluster/add-on installed.
1112
It's done via the [DMS operator](https://github.com/openshift/deadmanssnitch-operator),
1213
which runs on hive and is owned by SREP.
1314

1415
## Usage
16+
1517
The Add-On metadata file (`addon.yaml`) allows you to provide a `deadmanssnitch` field (see `deadmansnitch` field in
1618
the Add-On metadata file [schema documentation](https://github.com/mt-sre/managed-tenants-cli/blob/main/docs/tenants/zz_metadata_schema_generated.md)
1719
for more information).
1820
This field allows you to provide the required Dead Man's Snitch integration configuration. A `DeadmansSnitchIntegration`
1921
resource is then created and applied to Hive alongside the Add-On [SelectorSyncSet (SSS)](https://github.com/openshift/hive/blob/master/docs/syncset.md#selectorsyncset-object-definition).
2022

21-
2223
### DeadmansSnitchIntegration Resource
23-
The default DMS configurations which will be created if you specify the bare minimum fields under 'deadmanssnitch'
24+
25+
The default DMS configurations which will be created if you specify the bare minimum fields under 'deadmanssnitch'
2426
field in addon metadata:
2527

2628
```yaml
@@ -30,7 +32,7 @@ field in addon metadata:
3032
name: addon-{{ADDON.metadata['id']}}
3133
namespace: deadmanssnitch-operator
3234
spec:
33-
clusterDeploymentSelector: ## can be overriden by .deadmanssnitch.clusterDeploymentSelector field in addon metadata
35+
clusterDeploymentSelector: ## can be overridden by .deadmanssnitch.clusterDeploymentSelector field in addon metadata
3436
matchExpressions:
3537
- key: {{ADDON.metadata['label']}}
3638
operator: In
@@ -41,13 +43,13 @@ field in addon metadata:
4143
name: deadmanssnitch-api-key
4244
namespace: deadmanssnitch-operator
4345

44-
snitchNamePostFix: {{ADDON.metadata['id']}} ## can be overriden by .deadmanssnitch.snitchNamePostFix field in addon metadata
46+
snitchNamePostFix: {{ADDON.metadata['id']}} ## can be overridden by .deadmanssnitch.snitchNamePostFix field in addon metadata
4547

4648
tags: {{ADDON.metadata['deadmanssnitch']['tags']}} ## Required
4749

4850
targetSecretRef:
49-
name: {{ADDON.metadata['id']}}-deadmanssnitch ## can be overriden by .deadmanssnitch.targetSecretRef.name field in addon metadata
50-
namespace: {{ADDON.metadata['targetNamespace']}} ## can be overriden by .deadmanssnitch.targetSecretRef.namespace field in addon metadata
51+
name: {{ADDON.metadata['id']}}-deadmanssnitch ## can be overridden by .deadmanssnitch.targetSecretRef.name field in addon metadata
52+
namespace: {{ADDON.metadata['targetNamespace']}} ## can be overridden by .deadmanssnitch.targetSecretRef.namespace field in addon metadata
5153
```
5254
5355
### Examples of `deadmanssnitch` field in `addon.yaml`
@@ -96,9 +98,10 @@ deadmanssnitch:
9698
namespace: redhat-rhoami-operator
9799
```
98100

99-
### Generated Secret
100-
A secrete will be generated (by default in the same namespace as your addon) with the `SNITCH_URL`.
101-
Your add-on will need to pick up the generated secret in cluster and inject it into your alertmanager config.
101+
### Generated Secret
102+
103+
A secret will be generated (by default in the same namespace as your addon) with the `SNITCH_URL`.
104+
Your add-on will need to pick up the generated secret in cluster and inject it into your alertmanager config.
102105
Example of in-cluster created secret:
103106

104107
```yaml
@@ -114,6 +117,7 @@ type: Opaque
114117
```
115118

116119
### Alert
120+
117121
Your alertmanager will need a constantly firing alert that is routed to DMS:
118122
Example of an alert that always fires:
119123

@@ -131,6 +135,7 @@ Example of an alert that always fires:
131135
```
132136

133137
## Route
138+
134139
Example of a route that forwards the firing-alert to DMS:
135140

136141
```yaml
@@ -141,6 +146,7 @@ Example of a route that forwards the firing-alert to DMS:
141146
```
142147

143148
## Receiver
149+
144150
Example receiver for DMS:
145151

146152
```yaml
@@ -155,13 +161,10 @@ Before going live with the SRES SRE team, they will need to manually point the `
155161
in the Service Delivery DMS account to their pagerduty service.
156162
{{% /alert %}}
157163

158-
159164
Please log a JIRA with your assigned SRE team to have this completed at least one week before going live with the SRE team.
160165

161-
162-
163166
### Current Example
167+
164168
- [RHOAM addon: DMS CR template](https://gitlab.cee.redhat.com/service/managed-tenants/-/blob/09cf5112e7dc5588c14f158d6490f7f1e7051c6a/addons/managed-api-service-internal/metadata/production/deadmanssnitch.yaml.j2)
165169
- [RHOAM addon: extraResources](https://gitlab.cee.redhat.com/service/managed-tenants/-/blob/09cf5112e7dc5588c14f158d6490f7f1e7051c6a/addons/managed-api-service-internal/metadata/production/addon.yaml#L40) field in `addon.yaml`
166170
- [RHODS addon: alertmanager configuration](https://github.com/red-hat-data-services/odh-deployer/blob/cb48c55725fd32fdc89a5ff29517b3f4cc0d1f54/monitoring/prometheus/prometheus.yaml)
167-
// TODO: This
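The alert, route, and receiver examples in this page are only partly visible in the diff. A minimal sketch of how the three pieces usually fit together is shown below. The alert name `DeadMansSnitch`, the group and receiver names, the `severity` label, and the `repeat_interval` value are assumptions; `<SNITCH_URL>` is a placeholder for the value stored in the generated secret.

```yaml
# Prometheus rule file: an alert that always fires, because vector(1)
# always returns a sample.
groups:
  - name: deadmanssnitch.rules        # assumed group name
    rules:
      - alert: DeadMansSnitch         # assumed alert name
        expr: vector(1)
        labels:
          severity: none              # assumed label
        annotations:
          summary: Heartbeat alert that should always be firing.
---
# Alertmanager configuration: route the heartbeat alert to the DMS receiver.
route:
  receiver: default                   # assumed fallback receiver
  routes:
    - match:
        alertname: DeadMansSnitch
      receiver: deadmanssnitch
      repeat_interval: 5m             # assumed; must be shorter than the snitch interval
receivers:
  - name: default
  - name: deadmanssnitch
    webhook_configs:
      - url: "<SNITCH_URL>"           # placeholder, read from the generated secret
```

If the monitoring stack stops, the webhook calls stop as well, and the snitch raises the alarm on the DMS side.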

content/en/docs/creating-addons/monitoring/slo_dashboards.md

+11-14
@@ -3,20 +3,15 @@ title: "SLO Dashboards"
33
date: 2022-12-15T00:53:51+01:00
44
---
55

6-
// TODO: This might go in a SLI/SLO directory
7-
86
Development teams are requested to co-maintain, with the MT-SRE Team, SLO Dashboards for the Addons
9-
they develop. This document explains how to boostrap the dashboard creation and deployment.
10-
11-
This document does not discuss the content of the dashboard, the topic will be approached in a
12-
follow-up documentation artifact. // TODO: WHERE???
7+
they develop. This document explains how to bootstrap the dashboard creation and deployment.
138

149
## First Dashboard
1510

1611
* Fork/clone the [managed-tenants-slos repository](https://gitlab.cee.redhat.com/service/managed-tenants-slos).
1712
* Create the following directory structure:
1813

19-
```
14+
```yaml
2015
├── <addon-name>
2116
│ ├── dashboards
2217
│ │ └── <addon-name>-slo-dashboard.configmap.yaml
@@ -185,7 +180,7 @@ data:
185180

186181
## Dashboard Deployment
187182

188-
Merging of the above merge request is a prerequisite for this step.
183+
Merging of the above merge request is a prerequisite for this step.
189184

190185
The dashboard deployment happens through app-interface, using saas-files.
191186

@@ -194,40 +189,42 @@ The dashboard deployment happens through app-interface, using saas-files.
194189

195190
Example Merge Request content to app-interface:
196191

197-
https://gitlab.cee.redhat.com/service/app-interface/-/commit/9306800aabaca18cd034dfb3933a12d29506fa08
192+
<https://gitlab.cee.redhat.com/service/app-interface/-/commit/9306800aabaca18cd034dfb3933a12d29506fa08>
198193

199194
* Ping `@mt-sre-ic` in the `#forum-managed-tenants` Slack channel for approval.
200195
* Merge Requests to app-interface are constantly reviewed/merged by AppSRE. After the MT-SRE approval,
201196
wait until the Merge Request is merged.
202197

203-
# Accessing the Dashboards
198+
## Accessing the Dashboards
204199

205200
Once the app-interface merge request is merged, you will see your ConfigMaps being deployed in
206201
the `#sd-mt-sre-info` Slack channel. For example:
207202

208-
```
203+
```bash
209204
[app-sre-stage-01] ConfigMap odf-ms-cluster-status applied
210205
...
211206
[app-sre-prod-01] ConfigMap odf-ms-cluster-status applied
212207
```
213208

214209
Once the dashboards are deployed, you can see them here:
215210

216-
* STAGE: https://grafana.stage.devshift.net/dashboards/f/aGqy3WB7k/addons
217-
* PRODUCTION: https://grafana.app-sre.devshift.net/dashboards/f/sDiLLtgVz/addons
211+
* STAGE: <https://grafana.stage.devshift.net/dashboards/f/aGqy3WB7k/addons>
212+
* PRODUCTION: <https://grafana.app-sre.devshift.net/dashboards/f/sDiLLtgVz/addons>
218213

219-
# Development Flow
214+
## Development Flow
220215

221216
After all the configuration is in place:
222217

223218
STAGE:
219+
224220
* Dashboards on the STAGE Grafana instance should not be used by external audiences other than
225221
the people developing the dashboards.
226222
* Changes in the `managed-tenants-slos` repository can be merged by the development team with "/lgtm"
227223
comments from those in the OWNERS file.
228224
* After merged, changes are automatically delivered to the STAGE grafana instance.
229225

230226
PRODUCTION:
227+
231228
* The dashboards on the PRODUCTION Grafana are pinpointed to a specific git commit from the managed-tenants-slos
232229
repository in the corresponding saas-file in app-interface.
233230
* After patching the git commit in the saas-file, owners of the saas-file can merge the promotion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
1+
---
2+
title: Testing With OSD-E2E
3+
linkTitle: Testing With OSD-E2E
4+
---
5+
16
## Testing With OSD-E2E
27

38
All Add-Ons must have a reference to a test harness container in a publicly
@@ -7,23 +12,26 @@ test harness image. That image is generated by the OSD e2e process.
712
The test harness will be tested against OCP nightly and OSD next.
813

914
Please refer to the
10-
[OSD-E2E Add-On Documentation](https://github.com/openshift/osde2e/blob/master/docs/Addons.md)
15+
[OSD-E2E Add-On Documentation](https://github.com/openshift/osde2e-example-test-harness/blob/main/README.md)
1116
for more details on how this test harness will be run and how it is expected to
1217
report results.
1318

1419
## Primer into OSD E2E tests and prow jobs
15-
To ensure certain things such as validating that the addon can be easily and successfully installed on a customer’s cluster,
20+
21+
To ensure certain things such as validating that the addon can be easily and successfully installed on a customer’s cluster,
1622
we have prow jobs setup which run e2e tests (one test suite per addon) every 12 hours.
1723
If the e2e tests corresponding to any addon fail, then automated alerts/notifications are sent to the addon team.
18-
Every addon's e2e tests are packaged in an image called “testHarness”, which is built and pushed to [quay.io](https://quay.io)
24+
Every addon's e2e tests are packaged in an image called “testHarness”, which is built and pushed to [quay.io](https://quay.io)
1925
by the team maintaining the addon.
20-
Once the "testHarness" image is built and pushed, the team must register their addon to testHarness image’s e2e tests
21-
by making a PR against this file as it holds the configuration about all the registered e2e tests. For example, this section. ??
26+
Once the "testHarness" image is built and pushed, the team must register their addon to testHarness image’s e2e tests
27+
by making a PR against [this file](https://github.com/openshift/release/blob/master/ci-operator/jobs/openshift/osde2e/openshift-osde2e-main-periodics.yaml).
2228

23-
You can access the portal for prow jobs [here](https://prow.ci.openshift.org). The prow jobs follow the below steps to
29+
You can access the portal for prow jobs [here](https://prow.ci.openshift.org). The prow jobs follow the below steps to
2430
run the e2e tests. For every e2e test defined inside [this file](https://github.com/openshift/release/blob/master/ci-operator/jobs/openshift/osde2e/openshift-osde2e-main-periodics.yaml):
25-
* An OSD cluster is created and the addon, which is being tested, is installed. Openshift API is used to perform these operations via the API definition provided at https://api.openshift.com // TODO: Last sentence not needed?
26-
* The e2e prow job definition, specifically for the addon from [this file](https://github.com/openshift/release/blob/master/ci-operator/jobs/openshift/osde2e/openshift-osde2e-main-periodics.yaml), is parsed and hence, the parameters required to run its e2e tests will be recognized as well.
27-
* The "testHarness" image for the addon is parsed and executed against the parameters fetched from the above step.
28-
* If an MT-SRE team member notices those tests failing, they should notify the respective team to take a look at them and fix them.
2931

32+
* An OSD cluster is created and the addon, which is being tested, is installed. The OpenShift API is used to perform these
33+
operations via the API definition provided at <https://api.openshift.com>.
34+
* The e2e prow job definition, specifically for the addon from [this file](https://github.com/openshift/release/blob/master/ci-operator/jobs/openshift/osde2e/openshift-osde2e-main-periodics.yaml),
35+
is parsed and hence, the parameters required to run its e2e tests will be recognized as well.
36+
* The "testHarness" image for the addon is parsed and executed against the parameters fetched from the above step.
37+
* If an MT-SRE team member notices those tests failing, they should notify the respective team to take a look at them and fix them.
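To make the registration step more concrete, the sketch below shows the general shape of a Prow periodic job entry. Only the 12-hour interval comes from the text above. The job name, image placeholder, and environment variable names are hypothetical and should be checked against the real entries in `openshift-osde2e-main-periodics.yaml`.

```yaml
periodics:
  - name: osde2e-example-addon-stage        # hypothetical job name
    interval: 12h                           # tests run every 12 hours
    agent: kubernetes
    decorate: true
    spec:
      containers:
        - image: <osde2e-image>             # placeholder; copy from an existing job
          env:
            # Hypothetical parameter names: which addon to install and
            # which testHarness image to run against it.
            - name: EXAMPLE_ADDON_ID
              value: example-addon
            - name: EXAMPLE_TEST_HARNESS
              value: quay.io/example-org/example-addon-test-harness
```

Copying an existing addon's job and changing only the addon-specific values is likely the safest starting point.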

content/en/docs/creating-addons/testing/without-ocm.md

+6-1
@@ -1,4 +1,9 @@
1-
## Testing Addons on OCP
1+
---
2+
title: Testing With OCP (Without OCM)
3+
linkTitle: Testing With OCP (Without OCM)
4+
---
5+
6+
## Testing Without OCM
27

38
During the development process, it might be useful (and cheaper) to run your
49
addon on an OCP cluster.

content/en/docs/creating-addons/top-level-operator/customer-notifications.md

+1-8
@@ -4,7 +4,7 @@ date: 2022-12-15T00:53:51+01:00
44
---
55

66
There are multiple ways a user or group can get notified of service events (e.g.
7-
planned maintenance, outages). There are two fields in the addon metadata file
7+
planned maintenance, outages). There are two fields in the addon metadata file
88
(see Add-On metadata file [schema documentation](https://github.com/mt-sre/managed-tenants-cli/blob/main/docs/tenants/zz_metadata_schema_generated.md)
99
for more information) where email addresses can be provided:
1010

@@ -14,12 +14,5 @@ for more information) where email addresses can be provided:
1414
* `addonNotifications`: This is a list of additional email addresses of
1515
employees who would like to receive notifications about a service.
1616

17-
1817
There is also a mailing list that receives notifications for all services managed by Service Delivery.
1918
Subscribe to the sd-notifications mailing list [here](https://post-office.corp.redhat.com/mailman/listinfo/sd-notifications).
20-
21-
22-
23-
// TODO: Remove this line?
24-
Find out more about App-SRE Incident Procedures
25-
[here](https://gitlab.cee.redhat.com/service/app-interface/blob/master/docs/app-sre/AAA.md#incident-procedure).

content/en/docs/creating-addons/top-level-operator/dependencies.md

+2-2
@@ -9,8 +9,7 @@ as signed-off by the Managed Tenants SRE Team.
99
## Dependencies Specification
1010

1111
* Addons must specify dependencies using the OLM dependencies feature,
12-
documented
13-
[here](https://docs.openshift.com/container-platform/4.7/operators/understanding/olm/olm-understanding-dependency-resolution.html)
12+
documented [here](https://docs.openshift.com/container-platform/4.7/operators/understanding/olm/olm-understanding-dependency-resolution.html)
1413
* The dependencies must have the version pin-pointed. Ranges are not allowed.
1514
* The dependencies must come from a *Trusted Catalog*. See the
1615
[Trusted Catalogs](#trusted-catalogs) section for details.
@@ -45,4 +44,5 @@ implemented by CPaaS, or by the Managed Tenants SRE Team.
4544

4645
There's a feature request to the OLM Team to allow specifying the
4746
CatalogSource used for the dependencies:
47+
4848
* [OLM-2249](https://issues.redhat.com/browse/OLM-2249)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
---
2+
title: Internal Documentation
3+
linkTitle: Internal Documentation
4+
weight: 10
5+
description: >
6+
Internal Documentation for the SRE teams.
7+
---
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
---
2+
title: "Getting Access"
3+
linkTitle: "Getting Access"
4+
weight: 20
5+
menu:
6+
main:
7+
weight: 20
8+
---
