transform SLO into SLI (#292)
tobru authored Nov 10, 2023
1 parent 592876b commit 1b0cba9
Showing 2 changed files with 13 additions and 15 deletions.
@@ -1,13 +1,12 @@
= Service Level Objectives
= Service Level Indicator (SLI)
:page-aliases: explanations/slos.adoc

APPUiO Managed OpenShift 4 comes with a collection of https://sre.google/sre-book/service-level-objectives/[service level objectives (SLOs)].
This document defines and explains these SLOs.
An APPUiO Managed cluster should meet these objectives to provide the expected service level to our customers.
APPUiO Managed OpenShift comes with a collection of https://sre.google/sre-book/service-level-objectives/[service level indicators (SLIs)].
This document defines and explains these SLIs.
All of the SLIs are in the scope of the https://products.vshn.ch/service_levels.html["Guaranteed Availability" Service Level].

We use the SLOs and https://sre.google/workbook/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts[Multiwindow, Mulit-Brun-Rate Alerts] as the basis of our on-call alerting.
We use the SLIs and https://sre.google/workbook/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts[Multiwindow, Mulit-Burn-Rate Alerts] as the basis of our on-call alerting.

IMPORTANT: These are internal service level *objectives*, not service level *agreements*.
We don't guarantee to meet these objectives at all times.

== Ingress

@@ -17,7 +16,7 @@ If the workloads running on the cluster aren't accessible, it might as well be d
=== Canary

****
*99.75% of all HTTP probes to a canary application succeed*
*HTTP probes to a canary application*
****

Probes are sent every minute from the ingress operator, inside the cluster, to the external address of the canary target.
@@ -71,11 +70,10 @@ If the API isn't available, users can't change configuration or run new workload

A misbehaving Kubernetes API directly impacts the service level.


=== Request Error Rate

****
*99.9% of all requests to the Kubernetes API server succeed or are invalid*
*Requests to the Kubernetes API server succeed or are invalid*
****

This is measured directly at the API server through the following metrics.
@@ -95,7 +93,7 @@ NOTE: We only look for HTTP 5xx errors, which indicate a server side error, and
=== Uptime

****
*99.9% of all HTTP probes to the Kubernetes API server succeed*
*HTTP probes to the Kubernetes API server succeed*
****

Probes are sent every 10 seconds from a blackbox exporter inside the cluster to the readiness endpoint of the Kubernetes API server.
@@ -112,7 +110,7 @@ This ability is essential and directly impacts the service level.
=== Canary

****
*99.75% of canary pods start successfully*
*Canary pods start successfully*
****

A controller starts a known good canary pod every minute and checks if it successfully started after 3 minutes.
@@ -128,7 +126,7 @@ Any storage issues directly impacts the service level for users.
=== CSI Operations

****
*99.5% of all CSI operations complete successfully*
*CSI operations complete successfully*
****

CSI operations are any interactions of the kubelet or controller-manager with the CSI provider.
@@ -159,7 +157,7 @@ Without it, users can't reliably access their workload and even moderate packet
=== Packet Loss

****
*99.5% of all ICMP pings between canary pods succeed*
*ICMP pings between canary pods succeed*
****

A network canary daemonset starts a canary pod on every node.
2 changes: 1 addition & 1 deletion docs/modules/ROOT/partials/nav.adoc
@@ -165,7 +165,7 @@
* Monitoring
** xref:oc4:ROOT:explanations/cluster_monitoring.adoc[]
** xref:oc4:ROOT:explanations/slos.adoc[]
** xref:oc4:ROOT:explanations/slis.adoc[]
** xref:oc4:ROOT:how-tos/monitoring/global-monitoring.adoc[]
** xref:oc4:ROOT:how-tos/monitoring/handle_alerts.adoc[]
** xref:oc4:ROOT:how-tos/monitoring/remove_rules.adoc[]
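
Note on the change: the commit drops the percentage targets (for example "99.75% of all HTTP probes ... succeed") and keeps only the measured quantity. The sketch below illustrates that distinction. It assumes an SLI is the ratio of good events to all events, and that a burn rate is the observed error ratio divided by the error budget implied by an objective. The function names and sample numbers are hypothetical and aren't taken from the APPUiO monitoring code.

```python
def sli(successes: int, total: int) -> float:
    """Fraction of good events; treated as 1.0 when nothing was measured."""
    if total == 0:
        return 1.0
    return successes / total


def burn_rate(sli_value: float, objective: float) -> float:
    """Observed error ratio relative to the error budget of an objective.

    A burn rate of 1.0 would consume exactly the budget over the full window.
    """
    budget = 1.0 - objective
    if budget <= 0:
        raise ValueError("objective must be below 1.0")
    return (1.0 - sli_value) / budget


# Hypothetical sample: one probe per minute for an hour, 3 of them failed.
canary_sli = sli(successes=57, total=60)
print(f"SLI: {canary_sli:.4f}")
# 0.9975 is the canary target that the old SLO wording used.
print(f"burn rate against 99.75%: {burn_rate(canary_sli, 0.9975):.1f}")
```

The SLI alone says what was measured; only the combination with an objective gives an alertable burn rate, which is why the two can be documented separately.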