This repository has been archived by the owner on Aug 14, 2024. It is now read-only.

Commit

feat(sdk): Add backpressure page (#1225)
sl0thentr0py authored Apr 29, 2024
1 parent 3402f82 commit 8248225
Showing 5 changed files with 73 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/components/sidebar.tsx
@@ -256,6 +256,9 @@ export default () => {
<SidebarLink to="/sdk/performance/modules/">
Modules
</SidebarLink>
<SidebarLink to="/sdk/performance/backpressure/">
Backpressure Management
</SidebarLink>
</SidebarLink>
<SidebarLink to="/sdk/research/performance">
Research: Performance Monitoring API
7 changes: 7 additions & 0 deletions src/docs/sdk/features.mdx
@@ -67,6 +67,13 @@ Respect Sentry’s HTTP 429 `Retry-After` header, or, if the SDK supports multip

See <Link to="/sdk/rate-limiting">Rate Limiting</Link> for details.


## Backpressure Management

Backend SDKs (typically used in server applications) should have backpressure management logic that dynamically downsamples transactions when the throughput in the system is too high.

See <Link to="/sdk/performance/backpressure">Backpressure Management</Link> for details.

## In-App frames

Stack parsing can tell which frames should be identified as part of the user’s application (as opposed to part of the language, a library, or a framework), either automatically or by user configuration at startup, often declared as a package/module prefix.
59 changes: 59 additions & 0 deletions src/docs/sdk/performance/backpressure.mdx
@@ -0,0 +1,59 @@
---
title: "Backpressure Management"
---

Backend SDKs that are typically used in server environments are expected to implement a component for backpressure management.

This component periodically inspects the SDK for signs of excessive throughput. When throughput is too high, it dynamically downsamples transactions by temporarily halving the sample rate.
Once the system has recovered to a healthy state, the SDK will revert to the sample rate set by the user.

![Backpressure](backpressure.png)

## Configuration

The SDK should expose a boolean config parameter called `enable_backpressure_handling` that controls whether this logic is active or not.

## Design

The backpressure component has two main responsibilities:

* Periodically schedule a [**health check**](#health-check-monitor) asynchronously and update the SDK's health status accordingly.
* While the status is unhealthy, dynamically halve the effective sample rate for transactions before making the initial sampling decision.

## Health Check Monitor

### Interval

By default, the health check is performed once every **10 seconds**. SDKs may expose this interval as a config parameter.

### Conditions

The health check on most SDKs currently tests the following conditions:
* whether the background worker queue is full
* whether any rate limits are currently active

You can add more conditions of **high throughput** or **wasted work** if available and easily measurable on your platform.
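
The two conditions above can be combined into a single check. The following is a minimal sketch, not the reference implementation: `HealthInputs`, its field names, and the `extra_checks` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class HealthInputs:
    """Hypothetical snapshot of SDK state; the field names are illustrative."""

    worker_queue_full: bool
    rate_limits_active: bool


def is_healthy(
    inputs: HealthInputs,
    extra_checks: Iterable[Callable[[], bool]] = (),
) -> bool:
    """Return True only if none of the unhealthy conditions hold."""
    if inputs.worker_queue_full or inputs.rate_limits_active:
        return False
    # Platform-specific checks (assumed to return True when healthy).
    return all(check() for check in extra_checks)
```

Passing additional callables through `extra_checks` is one way to plug in platform-specific measures of high throughput or wasted work.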

### Implementation

The monitor should run asynchronously.
This can be a new thread if the language supports threads, or a timer such as `setTimeout` in runtimes like Node.js where threads are not the usual tool.

See the [Python implementation](https://github.com/getsentry/sentry-python/blob/d9d87998029fb0ef2bfe933cea0b69bfee60ed51/sentry_sdk/monitor.py#L16-L123) as a reference.
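
For a thread-based platform, the monitor loop might look roughly like the sketch below. It is a simplified illustration rather than a copy of the linked code: the class name `Monitor`, the `check_health` callback, and the `interval` parameter are assumptions made for this example.

```python
import threading
from typing import Callable, Optional


class Monitor:
    """Runs a health check on a background thread at a fixed interval."""

    def __init__(self, check_health: Callable[[], bool], interval: float = 10.0):
        self._check_health = check_health  # assumed to return True when healthy
        self._interval = interval
        self._healthy = True
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        if self._thread is None:
            # A daemon thread does not keep the host application alive on exit.
            self._thread = threading.Thread(
                target=self._run, name="backpressure-monitor", daemon=True
            )
            self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._healthy = bool(self._check_health())

    def is_healthy(self) -> bool:
        return self._healthy

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Waiting on an event instead of sleeping lets `stop()` interrupt the loop immediately rather than after the next interval elapses.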

## Downsampling

The monitor should update its internal health status and expose a `downsample_factor` that doubles every 10 seconds for as long as the system remains unhealthy.
Typically the factor is doubled at most 10 times, because by then the effective sample rate is already very small.

This creates an exponential backoff behavior and reduces load in the transaction pipeline.
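
The update rule can be sketched as follows, assuming the factor is stored directly as a divisor that starts at 1 and resets once the system is healthy again. The names `MAX_DOUBLINGS` and `next_downsample_factor` are hypothetical; an SDK could equally store the number of doublings as an exponent.

```python
MAX_DOUBLINGS = 10  # cap described above; the constant name is illustrative


def next_downsample_factor(current: int, healthy: bool) -> int:
    """Compute the factor to use after one health check."""
    if healthy:
        # Recovered: revert to the user-configured sample rate.
        return 1
    # Still unhealthy: double, but never beyond 2 ** MAX_DOUBLINGS.
    return min(current * 2, 2 ** MAX_DOUBLINGS)
```

With a 10-second interval and a continuously unhealthy system, the factor reaches its cap of 1024 after ten checks.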

In your SDK's `set_initial_sampling_decision`, which is called as part of the `start_transaction` API, apply this `downsample_factor` right before making the random-number-based sampling decision.

See the [Python implementation](https://github.com/getsentry/sentry-python/blob/d9d87998029fb0ef2bfe933cea0b69bfee60ed51/sentry_sdk/tracing.py#L888-L889) as a reference.
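
As a rough illustration of where the factor is applied, the sketch below divides the configured rate by a divisor-style `downsample_factor` (1 meaning no downsampling) before drawing the random number. The function name `make_sampling_decision` and the injectable `rng` parameter are assumptions for this example, not part of any SDK API.

```python
import random
from typing import Callable


def make_sampling_decision(
    sample_rate: float,
    downsample_factor: int,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if the transaction should be sampled."""
    # Reduce the user-configured rate while the system is under pressure.
    effective_rate = sample_rate / downsample_factor
    # rng() yields a float in [0.0, 1.0); sample when it falls below the rate.
    return rng() < effective_rate
```

Making `rng` injectable keeps the decision deterministic in tests while defaulting to `random.random` in normal use.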

### Client Report

If possible, in `transaction.finish`, also record a client report with reason `backpressure` instead of `sample_rate` when the transaction is dropped so that we can track these backpressure outcome statistics.

See the [Python implementation](https://github.com/getsentry/sentry-python/blob/d9d87998029fb0ef2bfe933cea0b69bfee60ed51/sentry_sdk/tracing.py#L705-L711) as a reference.
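
Choosing the discard reason can be sketched as below. This assumes a divisor-style `downsample_factor` (greater than 1 while downsampling is active) and uses a plain `Counter` as a stand-in for the SDK's client report storage; the function name and the `(reason, category)` key layout are illustrative only.

```python
from collections import Counter


def record_dropped_transaction(reports: Counter, downsample_factor: int) -> str:
    """Count an unsampled transaction under the appropriate discard reason."""
    # Attribute the drop to backpressure only while downsampling is active.
    reason = "backpressure" if downsample_factor > 1 else "sample_rate"
    reports[(reason, "transaction")] += 1
    return reason
```
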
Binary file added src/docs/sdk/performance/backpressure.png
4 changes: 4 additions & 0 deletions src/docs/sdk/performance/index.mdx
@@ -186,6 +186,10 @@ Depending on the platform, other default data may be included. (For example, for

A transaction's sampling decision should be passed to all of its children, including across service boundaries. This can be accomplished in the `startChild` method for same-service children and using the `sentry-trace` header for children in a different service.

### Backpressure

If the SDK supports backpressure handling, the overall sampling rate needs to be divided by the `downsamplingFactor` from the backpressure monitor. See [the backpressure spec](/sdk/performance/backpressure/#downsampling) for more details.

## Header `sentry-trace`

The header is used for trace propagation. SDKs use the header to continue traces from upstream services (incoming HTTP requests), and to propagate tracing information to downstream services (outgoing HTTP requests).
