
Commit

Add examples with images
estherk15 committed Dec 13, 2023
1 parent 62b88dc commit 36c20d3
Showing 5 changed files with 13 additions and 0 deletions.
@@ -58,6 +58,8 @@ When creating SLOs, you can choose from the following types:
- **Monitor-based SLOs**: can be used for time-based data sets. The SLI is based on the amount of time your system exhibits good behavior divided by the total time. Monitor-based SLOs must be based on a new or existing Datadog monitor, and any adjustments must be made to the underlying monitor (they cannot be made during SLO creation).
- **Time Slice SLOs**: can be used for time-based data sets. The SLI is based on the amount of time your system exhibits good behavior divided by the total time. Time Slice SLOs do not require a Datadog monitor: you can try out different metric filters and thresholds and instantly explore downtime during SLO creation.

For a full comparison, see the [SLO Type Comparison][1] chart.

## Setup

Use Datadog's [Service Level Objectives status page][2] to create new SLOs or to view and manage all your existing SLOs.
@@ -11,6 +11,9 @@ further_reading:
- link: "/service_management/service_level_objectives/monitor/"
  tag: "Documentation"
  text: "Monitor-based SLOs"
- link: "/service_management/service_level_objectives/time_slice/"
  tag: "Documentation"
  text: "Time Slice SLOs"
---

## Overview
@@ -59,12 +59,20 @@ To calculate the uptime percentage for a Time Slice SLO, Datadog cuts the times

For each slice, there is a single value for the timeseries, and the uptime condition (such as `value < 1`) is evaluated for that slice. If the condition is met, the slice is considered uptime; otherwise, it is considered downtime.

{{< img src="service_management/service_level_objectives/time_slice/uptime_latency.png" alt="Time Slice SLO detail panel showing application latency with one uptime violation" style="width:100%;" >}}

In the example above, exactly one point in the timeseries violates the uptime condition (here, the condition is that the p95 latency is less than or equal to 2.5 seconds). With 5-minute time slices, that point accounts for 5 minutes of downtime. The total time period shown is 12 hours (720 minutes), so 715 minutes are considered uptime (720 minutes of total time minus 5 minutes of downtime), and the uptime percentage is 715/720 * 100 ≈ 99.306%.
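The per-slice evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not Datadog's implementation: it assumes the timeseries has already been aggregated to one value per slice (5-minute slices here), and the function name `time_slice_uptime` and the sample latencies are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def time_slice_uptime(
    slice_values: Sequence[float],
    is_up: Callable[[float], bool],
    slice_minutes: int = 5,
) -> Tuple[float, int]:
    """Return (uptime percentage, downtime minutes) for one timeseries.

    Hypothetical helper: each entry in slice_values is the single value
    for one time slice, and a slice counts as uptime when is_up(value) is True.
    """
    if not slice_values:
        raise ValueError("need at least one time slice")
    down_slices = sum(1 for value in slice_values if not is_up(value))
    total_minutes = len(slice_values) * slice_minutes
    downtime_minutes = down_slices * slice_minutes
    uptime_pct = (total_minutes - downtime_minutes) / total_minutes * 100
    return uptime_pct, downtime_minutes


# 12 hours of 5-minute slices = 144 slices of made-up p95 latencies (seconds),
# with exactly one slice above the 2.5 s threshold.
latencies: List[float] = [1.2] * 144
latencies[60] = 3.1
pct, down = time_slice_uptime(latencies, lambda p95: p95 <= 2.5)
print(f"{pct:.3f}% uptime, {down} min downtime")  # 99.306% uptime, 5 min downtime
```

The uptime condition is passed in as a predicate so the same helper works for any threshold or comparison.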

### Groups and overall uptime

Time Slice SLOs allow you to track uptime for individual groups, where groups are defined in the "group by" portion of the metric query.

When groups are present, uptime is calculated for each individual group. Overall uptime, however, works differently. To match the existing behavior of monitor-based SLOs, Time Slice SLOs use the same definition of overall uptime: a time slice counts as overall uptime only when **all** groups have uptime in that slice. Conversely, if **any** group has downtime in a slice, that slice is overall downtime. As a result, overall uptime is always less than or equal to the uptime of each individual group.
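The relationship between group and overall uptime can be written out as follows. The notation is introduced here for illustration only: T is the total time, D_g is the set of downtime slices for group g, |·| is the total duration of a set of slices, and uptime is expressed as a fraction (multiply by 100 for a percentage).

```latex
\begin{aligned}
U_g &= 1 - \frac{|D_g|}{T},
\qquad
U_{\text{overall}} = 1 - \frac{\left|\bigcup_h D_h\right|}{T} \\
D_g \subseteq \bigcup_h D_h
  &\;\Longrightarrow\; U_{\text{overall}} \le U_g \quad \text{for every group } g \\
\left|\bigcup_h D_h\right| \le \sum_h |D_h|
  &\;\Longrightarrow\; U_{\text{overall}} \ge 1 - \frac{\sum_h |D_h|}{T}
\end{aligned}
```

The last bound holds with equality exactly when no two groups are down in the same slice, which is why non-overlapping group downtimes simply add up.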

{{< img src="service_management/service_level_objectives/time_slice/uptime_latency_groups.png" alt="Time Slice SLO detail panel of application latency uptime with groups" style="width:100%;" >}}

In the example above, environment "prod" has 5 minutes of downtime over a 12-hour (720-minute) period, resulting in 715/720 * 100 ≈ 99.306% uptime. Environment "dev" also has 5 minutes of downtime, resulting in the same uptime. Because the two downtime periods do not overlap, overall downtime (when either prod or dev had downtime) is 10 minutes, resulting in (720 - 10)/720 * 100 ≈ 98.611% uptime.
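The per-group and overall calculation can be sketched in Python. This is an illustrative sketch, not Datadog's implementation: it assumes each group's slices have already been evaluated to `True` (uptime) or `False` (downtime) over the same 5-minute slices, and the function name and the positions of the downtime slices are hypothetical.

```python
from typing import Dict, Mapping, Sequence, Tuple


def group_and_overall_uptime(
    slice_ok: Mapping[str, Sequence[bool]],
) -> Tuple[Dict[str, float], float]:
    """Return (per-group uptime %, overall uptime %).

    slice_ok maps each group to its per-slice results (True = uptime).
    All groups must cover the same slices. A slice is overall uptime only
    when every group is up in that slice.
    """
    lengths = {len(results) for results in slice_ok.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("groups must be non-empty and cover the same slices")
    n = lengths.pop()
    per_group = {group: 100 * sum(results) / n for group, results in slice_ok.items()}
    # zip(*...) walks the slices column by column across all groups.
    overall_up = sum(1 for column in zip(*slice_ok.values()) if all(column))
    return per_group, 100 * overall_up / n


# 144 five-minute slices (12 hours); prod and dev are each down for one
# slice, at different times, so their downtime does not overlap.
prod = [True] * 144
dev = [True] * 144
prod[10] = False
dev[100] = False
per_group, overall = group_and_overall_uptime({"prod": prod, "dev": dev})
print({g: round(u, 3) for g, u in per_group.items()}, round(overall, 3))
# {'prod': 99.306, 'dev': 99.306} 98.611
```

If both groups were instead down in the same slice, overall uptime would match the per-group value of about 99.306%, since `all(column)` fails for only one slice.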

### Corrections

Time Slice SLOs count correction periods as uptime in all calculations. Since the total time remains constant, the error budget is always a fixed amount of time as well. This is a significant simplification and improvement over how corrections are handled for monitor-based SLOs.
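To illustrate why the error budget stays fixed, here is a minimal Python sketch, not Datadog's implementation. It assumes per-slice results have already been computed, marks slices that fall inside a correction window with a separate flag, and uses a hypothetical 98% target; the function names are illustrative only.

```python
from typing import Sequence, Tuple


def uptime_with_corrections(
    slice_ok: Sequence[bool],
    in_correction: Sequence[bool],
    slice_minutes: int = 5,
) -> Tuple[float, int]:
    """Return (uptime %, downtime minutes), counting corrected slices as uptime.

    The denominator (total time) is the same with or without corrections.
    """
    if not slice_ok or len(slice_ok) != len(in_correction):
        raise ValueError("need one correction flag per slice, and at least one slice")
    down_slices = sum(
        1 for ok, corrected in zip(slice_ok, in_correction) if not ok and not corrected
    )
    total_minutes = len(slice_ok) * slice_minutes
    downtime_minutes = down_slices * slice_minutes
    return 100 * (total_minutes - downtime_minutes) / total_minutes, downtime_minutes


def error_budget_minutes(target_pct: float, total_minutes: int) -> float:
    """Allowed downtime for a target; it depends only on the total time."""
    return (1 - target_pct / 100) * total_minutes


# 144 five-minute slices (12 hours) with three downtime slices; the first
# one falls inside a made-up correction window, such as planned maintenance.
ok = [True] * 144
for i in (20, 21, 90):
    ok[i] = False
correction = [False] * 144
correction[20] = True

pct, down = uptime_with_corrections(ok, correction)
budget = error_budget_minutes(98.0, 144 * 5)  # hypothetical 98% target
print(f"{pct:.3f}% uptime, {down} min downtime, budget {budget:.1f} min")
# 98.611% uptime, 10 min downtime, budget 14.4 min
```

Because corrected slices are flipped to uptime rather than removed, the 720-minute denominator, and therefore the 14.4-minute budget, is unchanged whether or not corrections exist.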