Resiliency policies updates #4490

Merged
merged 10 commits into from
Feb 1, 2025
Original file line number Diff line number Diff line change
@@ -309,6 +309,8 @@ context.AddMetadata("dapr-stream", "true");

### Streaming gRPCs and Resiliency

> Currently, resiliency policies are not supported for service invocation via gRPC.

When proxying streaming gRPCs, due to their long-lived nature, [resiliency]({{< ref "resiliency-overview.md" >}}) policies are applied on the "initial handshake" only. As a consequence:

- If the stream is interrupted after the initial handshake, it will not be automatically re-established by Dapr. Your application will be notified that the stream has ended, and will need to recreate it.
330 changes: 0 additions & 330 deletions daprdocs/content/en/operations/resiliency/policies.md

This file was deleted.

@@ -0,0 +1,9 @@
---
type: docs
title: "Resiliency policies"
linkTitle: "Policies"
weight: 200
description: "Configure resiliency policies for timeouts, retries, and circuit breakers"
---

Define timeouts, retries, and circuit breaker policies under `policies`. Each policy is given a name so you can refer to it from the [`targets` section in the resiliency spec]({{< ref targets.md >}}).
@@ -0,0 +1,49 @@
---
type: docs
title: "Circuit breaker resiliency policies"
linkTitle: "Circuit breakers"
weight: 30
description: "Configure resiliency policies for circuit breakers"
---

Circuit breaker policies are used when other applications, services, or components are experiencing elevated failure rates. Circuit breakers reduce load by monitoring requests and shutting off all traffic to the impacted service when certain criteria are met.

After a certain number of requests fail, circuit breakers "trip" or open to prevent cascading failures. By doing this, circuit breakers give the service time to recover from its outage instead of flooding it with events.

The circuit breaker can also enter a "half-open" state, allowing partial traffic through to see if the system has healed.

Once requests resume being successful, the circuit breaker returns to the "closed" state and allows traffic to fully resume.

## Circuit breaker policy format

```yaml
spec:
  policies:
    circuitBreakers:
      pubsubCB:
        maxRequests: 1
        interval: 8s
        timeout: 45s
        trip: consecutiveFailures > 8
```
## Spec metadata

| Circuit breaker option | Description |
| ---------------------- | ----------- |
| `maxRequests` | The maximum number of requests allowed to pass through when the circuit breaker is half-open (recovering from failure). Defaults to `1`. |
| `interval` | The cyclical period of time used by the circuit breaker to clear its internal counts. If set to 0 seconds, this never clears. Defaults to `0s`. |
| `timeout` | The period of the open state (directly after failure) until the circuit breaker switches to half-open. Defaults to `60s`. |
| `trip` | A [Common Expression Language (CEL)](https://github.com/google/cel-spec) statement that is evaluated by the circuit breaker. When the statement evaluates to true, the circuit breaker trips and becomes open. Defaults to `consecutiveFailures > 5`. Other possible values are `requests` and `totalFailures`, where `requests` represents the number of either successful or failed calls before the circuit opens and `totalFailures` represents the total (not necessarily consecutive) number of failed attempts before the circuit opens. Examples: `requests > 5` and `totalFailures > 3`. |

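The closed, open, and half-open states described above can be sketched as a small state machine. This is a minimal illustration and not Dapr's implementation: the class and parameter names are hypothetical, the `trip` expression is modeled only as a consecutive-failure threshold rather than a CEL statement, `interval` is not modeled, and a single successful probe closes the breaker.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Toy three-state breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_requests: int = 1, timeout: float = 60.0,
                 trip_after: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests  # probes admitted while half-open
        self.timeout = timeout            # seconds the breaker stays open
        self.trip_after = trip_after      # models `consecutiveFailures > trip_after`
        self.clock = clock                # injectable so tests can control time
        self.state = "closed"
        self.consecutive_failures = 0
        self.opened_at = 0.0
        self.half_open_admitted = 0

    def allow(self) -> bool:
        """Return True if a request may pass through right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout:
                return False
            # The open period has elapsed, so start probing.
            self.state = "half-open"
            self.half_open_admitted = 0
        if self.state == "half-open":
            if self.half_open_admitted >= self.max_requests:
                return False
            self.half_open_admitted += 1
        return True

    def record(self, success: bool) -> None:
        """Report the outcome of a request that was allowed through."""
        if success:
            self.consecutive_failures = 0
            self.state = "closed"  # simplification: one good probe closes it
            return
        self.consecutive_failures += 1
        if self.state == "half-open" or self.consecutive_failures > self.trip_after:
            self.state = "open"
            self.opened_at = self.clock()
```

With `trip_after=8` and `timeout=45.0`, this mirrors the `pubsubCB` values shown earlier: the ninth consecutive failure opens the breaker, and one probe is admitted after 45 seconds.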
## Next steps
- [Learn more about default resiliency policies]({{< ref default-policies.md >}})
- Learn more about:
  - [Retry policies]({{< ref retries-overview.md >}})
  - [Timeout policies]({{< ref timeouts.md >}})

## Related links

Try out one of the Resiliency quickstarts:
- [Resiliency: Service-to-service]({{< ref resiliency-serviceinvo-quickstart.md >}})
- [Resiliency: State Management]({{< ref resiliency-state-quickstart.md >}})
@@ -0,0 +1,173 @@
---
type: docs
title: "Default resiliency policies"
linkTitle: "Default policies"
weight: 40
description: "Learn more about the default resiliency policies for timeouts, retries, and circuit breakers"
---

In resiliency, you can set default policies, which have a broad scope. This is done through reserved keywords that let Dapr know when to apply the policy. There are 3 default policy types:

- `DefaultRetryPolicy`
- `DefaultTimeoutPolicy`
- `DefaultCircuitBreakerPolicy`

If these policies are defined, they are used for every operation to a service, application, or component. You can also make them more specific by appending additional keywords. The specific policies follow the patterns `Default%sRetryPolicy`, `Default%sTimeoutPolicy`, and `Default%sCircuitBreakerPolicy`, where `%s` is replaced by the target of the policy.

Below is a table of all possible default policy keywords and how they translate into a policy name.

| Keyword | Target Operation | Example Policy Name |
| -------------------------------- | ---------------------------------------------------- | ----------------------------------------------------------- |
| `App` | Service invocation. | `DefaultAppRetryPolicy` |
| `Actor` | Actor invocation. | `DefaultActorTimeoutPolicy` |
| `Component` | All component operations. | `DefaultComponentCircuitBreakerPolicy` |
| `ComponentInbound` | All inbound component operations. | `DefaultComponentInboundRetryPolicy` |
| `ComponentOutbound` | All outbound component operations. | `DefaultComponentOutboundTimeoutPolicy` |
| `StatestoreComponentOutbound` | All statestore component operations. | `DefaultStatestoreComponentOutboundCircuitBreakerPolicy` |
| `PubsubComponentOutbound`        | All outbound pubsub (publish) component operations.  | `DefaultPubsubComponentOutboundRetryPolicy`                 |
| `PubsubComponentInbound` | All inbound pubsub (subscribe) component operations. | `DefaultPubsubComponentInboundTimeoutPolicy` |
| `BindingComponentOutbound` | All outbound binding (invoke) component operations. | `DefaultBindingComponentOutboundCircuitBreakerPolicy` |
| `BindingComponentInbound` | All inbound binding (read) component operations. | `DefaultBindingComponentInboundRetryPolicy` |
| `SecretstoreComponentOutbound`   | All secretstore component operations.                | `DefaultSecretstoreComponentOutboundTimeoutPolicy`          |
| `ConfigurationComponentOutbound` | All configuration component operations. | `DefaultConfigurationComponentOutboundCircuitBreakerPolicy` |
| `LockComponentOutbound` | All lock component operations. | `DefaultLockComponentOutboundRetryPolicy` |

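The naming pattern in the table can be expressed as a small helper. This is only an illustration of how a keyword and a policy type combine into a reserved name; the function and constant names are hypothetical and are not part of Dapr.

```python
POLICY_KINDS = ("Retry", "Timeout", "CircuitBreaker")

# Keywords from the table above; the empty string is the global default.
TARGET_KEYWORDS = (
    "", "App", "Actor", "Component", "ComponentInbound", "ComponentOutbound",
    "StatestoreComponentOutbound", "PubsubComponentOutbound",
    "PubsubComponentInbound", "BindingComponentOutbound",
    "BindingComponentInbound", "SecretstoreComponentOutbound",
    "ConfigurationComponentOutbound", "LockComponentOutbound",
)


def default_policy_name(kind: str, keyword: str = "") -> str:
    """Build a reserved default policy name such as DefaultAppRetryPolicy."""
    if kind not in POLICY_KINDS:
        raise ValueError(f"unknown policy kind: {kind!r}")
    if keyword not in TARGET_KEYWORDS:
        raise ValueError(f"unknown target keyword: {keyword!r}")
    return f"Default{keyword}{kind}Policy"
```

For example, `default_policy_name("Timeout", "Actor")` yields `DefaultActorTimeoutPolicy`, matching the second row of the table.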
## Policy hierarchy resolution

Default policies are applied if the operation being executed matches the policy type and if there is no more specific policy targeting it. For each target type (app, actor, and component), the policy with the highest priority is a Named Policy, one that targets that construct specifically.

If none exists, the policies are applied from most specific to most broad.

## How default policies and built-in retries work together

In the case of the [built-in retries]({{< ref override-default-retries.md >}}), default policies do not stop the built-in retry policies from running. Both are used together but only under specific circumstances.

For service and actor invocation, the built-in retries deal specifically with issues connecting to the remote sidecar (when needed). As these are important to the stability of the Dapr runtime, they are not disabled **unless** a named policy is specifically referenced for an operation. In some instances, there may be additional retries from both the built-in retry and the default retry policy, but this prevents an overly weak default policy from reducing the sidecar's availability/success rate.

Policy resolution hierarchy for applications, from most specific to most broad:

1. Named Policies in App Targets
2. Default App Policies / Built-In Service Retries
3. Default Policies / Built-In Service Retries

Policy resolution hierarchy for actors, from most specific to most broad:

1. Named Policies in Actor Targets
2. Default Actor Policies / Built-In Actor Retries
3. Default Policies / Built-In Actor Retries

Policy resolution hierarchy for components, from most specific to most broad:

1. Named Policies in Component Targets
2. Default Component Type + Component Direction Policies / Built-In Actor Reminder Retries (if applicable)
3. Default Component Direction Policies / Built-In Actor Reminder Retries (if applicable)
4. Default Component Policies / Built-In Actor Reminder Retries (if applicable)
5. Default Policies / Built-In Actor Reminder Retries (if applicable)

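The three hierarchies above amount to a most-specific-first lookup. The sketch below illustrates that ordering under simplifying assumptions: the function names and data shapes are hypothetical, built-in retries are not modeled, and this is not how the Dapr runtime is implemented.

```python
from typing import Dict, List, Optional, Set, Tuple


def candidate_defaults(kind: str, target_type: str,
                       component_type: str = "", direction: str = "") -> List[str]:
    """Default policy names to try for a target, most specific first."""
    if target_type == "app":
        keywords = ["App", ""]
    elif target_type == "actor":
        keywords = ["Actor", ""]
    elif target_type == "component":
        keywords = [f"{component_type}Component{direction}",
                    f"Component{direction}", "Component", ""]
    else:
        raise ValueError(f"unknown target type: {target_type!r}")
    return [f"Default{keyword}{kind}Policy" for keyword in keywords]


def resolve(kind: str, target_type: str, target_name: str,
            defined: Set[str], named_targets: Dict[Tuple[str, str], str],
            component_type: str = "", direction: str = "") -> Optional[str]:
    """Return the policy name that applies to a target, or None."""
    # A named policy on the target always wins.
    named = named_targets.get((target_type, target_name))
    if named is not None:
        return named
    # Otherwise fall back through the defaults, narrowest scope first.
    for name in candidate_defaults(kind, target_type, component_type, direction):
        if name in defined:
            return name
    return None


defined = {"DefaultRetryPolicy", "DefaultAppRetryPolicy",
           "DefaultComponentInboundRetryPolicy"}
named = {("app", "appA"): "fastRetries"}

print(resolve("Retry", "app", "appA", defined, named))  # fastRetries
print(resolve("Retry", "app", "appC", defined, named))  # DefaultAppRetryPolicy
print(resolve("Retry", "component", "pubsub", defined, named,
              "Pubsub", "Inbound"))  # DefaultComponentInboundRetryPolicy
```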
As an example, take the following solution consisting of three applications, three components and two actor types:

Applications:

- AppA
- AppB
- AppC

Components:

- Redis Pubsub: pubsub
- Redis statestore: statestore
- CosmosDB Statestore: actorstore

Actors:

- EventActor
- SummaryActor

Below is a resiliency spec that uses both default and named policies and applies them to the targets.

```yaml
spec:
  policies:
    retries:
      # Global Retry Policy
      DefaultRetryPolicy:
        policy: constant
        duration: 1s
        maxRetries: 3

      # Global Retry Policy for Apps
      DefaultAppRetryPolicy:
        policy: constant
        duration: 100ms
        maxRetries: 5

      # Global Retry Policy for Actors
      DefaultActorRetryPolicy:
        policy: exponential
        maxInterval: 15s
        maxRetries: 10

      # Global Retry Policy for Inbound Component operations
      DefaultComponentInboundRetryPolicy:
        policy: constant
        duration: 5s
        maxRetries: 5

      # Global Retry Policy for Statestores
      DefaultStatestoreComponentOutboundRetryPolicy:
        policy: exponential
        maxInterval: 60s
        maxRetries: -1

      # Named policy
      fastRetries:
        policy: constant
        duration: 10ms
        maxRetries: 3

      # Named policy
      retryForever:
        policy: exponential
        maxInterval: 10s
        maxRetries: -1

  targets:
    apps:
      appA:
        retry: fastRetries

      appB:
        retry: retryForever

    actors:
      EventActor:
        retry: retryForever

    components:
      actorstore:
        retry: fastRetries
```
The table below is a breakdown of which policies are applied when attempting to call the various targets in this solution.

| Target             | Policy Used                                       |
| ------------------ | ------------------------------------------------- |
| AppA               | fastRetries                                       |
| AppB               | retryForever                                      |
| AppC               | DefaultAppRetryPolicy / DaprBuiltInServiceRetries |
| pubsub - Publish   | DefaultRetryPolicy                                |
| pubsub - Subscribe | DefaultComponentInboundRetryPolicy                |
| statestore         | DefaultStatestoreComponentOutboundRetryPolicy     |
| actorstore         | fastRetries                                       |
| EventActor         | retryForever                                      |
| SummaryActor       | DefaultActorRetryPolicy                           |
## Next steps
[Learn how to override default retry policies.]({{< ref override-default-retries.md >}})
## Related links
Try out one of the Resiliency quickstarts:
- [Resiliency: Service-to-service]({{< ref resiliency-serviceinvo-quickstart.md >}})
- [Resiliency: State Management]({{< ref resiliency-state-quickstart.md >}})
@@ -0,0 +1,7 @@
---
type: docs
title: "Retry and back-off resiliency policies"
linkTitle: "Retries"
weight: 20
description: "Configure resiliency policies for retries and back-offs"
---
@@ -0,0 +1,51 @@
---
type: docs
title: "Override default retry resiliency policies"
linkTitle: "Override default retries"
weight: 20
description: "Learn how to override the default retry resiliency policies for specific APIs"
---

Dapr provides [default retries]({{< ref default-policies.md >}}) for any unsuccessful request, such as failures and transient errors. Within a resiliency spec, you have the option to override Dapr's default retry logic by defining policies with reserved, named keywords. For example, defining a policy with the name `DaprBuiltInServiceRetries` overrides the default retries for failures between sidecars via service-to-service requests. Policy overrides are not applied to specific targets.

> Note: Although you can override default values with more robust retries, you cannot override them with lesser values than the provided defaults, or completely remove default retries. This prevents unexpected downtime.

Below is a table that describes Dapr's default retries and the policy keywords to override them:

| Capability | Override Keyword | Default Retry Behavior | Description |
| ------------------ | ------------------------- | ------------------------------ | ----------------------------------------------------------------------------------------------------------- |
| Service Invocation | DaprBuiltInServiceRetries | Per call retries are performed with a backoff interval of 1 second, up to a threshold of 3 times. | Sidecar-to-sidecar requests (a service invocation method call) that fail and result in a gRPC code `Unavailable` or `Unauthenticated` |
| Actors | DaprBuiltInActorRetries | Per call retries are performed with a backoff interval of 1 second, up to a threshold of 3 times. | Sidecar-to-sidecar requests (an actor method call) that fail and result in a gRPC code `Unavailable` or `Unauthenticated` |
| Actor Reminders | DaprBuiltInActorReminderRetries | Per call retries are performed with an exponential backoff with an initial interval of 500ms, up to a maximum of 60s for a duration of 15mins | Requests that fail to persist an actor reminder to a state store |
| Initialization Retries | DaprBuiltInInitializationRetries | Per call retries are performed 3 times with an exponential backoff, an initial interval of 500ms and for a duration of 10s | Failures when making a request to an application to retrieve a given spec. For example, failure to retrieve a subscription, component or resiliency specification |


The resiliency spec example below shows overriding the default retries for _all_ service invocation requests by using the reserved, named keyword `DaprBuiltInServiceRetries`.

Also defined is a retry policy called `retryForever` that is only applied to the appB target. appB uses the `retryForever` retry policy, while failed service invocation requests to all other applications use the overridden `DaprBuiltInServiceRetries` default policy.

```yaml
spec:
  policies:
    retries:
      DaprBuiltInServiceRetries: # Overrides default retry behavior for service-to-service calls
        policy: constant
        duration: 5s
        maxRetries: 10

      retryForever: # A user defined retry policy replaces default retries. Targets rely solely on the applied policy.
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely

  targets:
    apps:
      appB: # app-id of the target service
        retry: retryForever
```
## Related links
Try out one of the Resiliency quickstarts:
- [Resiliency: Service-to-service]({{< ref resiliency-serviceinvo-quickstart.md >}})
- [Resiliency: State Management]({{< ref resiliency-state-quickstart.md >}})
@@ -0,0 +1,153 @@
---
type: docs
title: "Retry resiliency policies"
linkTitle: "Overview"
weight: 10
description: "Configure resiliency policies for retries"
---

Requests can fail due to transient errors, like encountering network congestion, reroutes to overloaded instances, and more. Sometimes, requests can fail due to other resiliency policies set in place, like triggering a defined timeout or circuit breaker policy.

In these cases, configuring `retries` can either:
- Send the same request to a different instance, or
- Retry sending the request after the condition has cleared.

Retries and timeouts work together, with timeouts ensuring your system fails fast when needed, and retries recovering from temporary glitches.

Dapr provides [default resiliency policies]({{< ref default-policies.md >}}), which you can [override with user-defined retry policies]({{< ref override-default-retries.md >}}).

{{% alert title="Important" color="warning" %}}
Each [pub/sub component]({{< ref supported-pubsub >}}) has its own built-in retry behaviors. Explicitly applying a Dapr resiliency policy doesn't override these implicit retry policies. Rather, the resiliency policy augments the built-in retry, which can cause repetitive clustering of messages.
{{% /alert %}}

## Retry policy format

**Example 1**

```yaml
spec:
  policies:
    # Retries are named templates for retry configurations and are instantiated for life of the operation.
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10

      retryForever:
        policy: exponential
        maxInterval: 15s
        maxRetries: -1 # Retry indefinitely
```

**Example 2**

```yaml
spec:
  policies:
    retries:
      retry5xxOnly:
        policy: constant
        duration: 5s
        maxRetries: 3
        matching:
          httpStatusCodes: "429,500-599" # retry the HTTP status codes in this range. All others are not retried.
          gRPCStatusCodes: "1-4,8-11,13,14" # retry gRPC status codes in these ranges and separate single codes.
```
## Spec metadata

The following retry options are configurable:

| Retry option | Description |
| ------------ | ----------- |
| `policy` | Determines the back-off and retry interval strategy. Valid values are `constant` and `exponential`.<br/>Defaults to `constant`. |
| `duration` | Determines the time interval between retries. Only applies to the `constant` policy.<br/>Valid values are of the form `200ms`, `15s`, `2m`, etc.<br/> Defaults to `5s`.|
| `maxInterval` | Determines the maximum interval between retries to which the [`exponential` back-off policy](#exponential-back-off-policy) can grow.<br/>Additional retries always occur after a duration of `maxInterval`. Defaults to `60s`. Valid values are of the form `5s`, `1m`, `1m30s`, etc. |
| `maxRetries` | The maximum number of retries to attempt. <br/>`-1` denotes an unlimited number of retries, while `0` means the request will not be retried (essentially behaving as if the retry policy were not set).<br/>Defaults to `-1`. |
| `matching.httpStatusCodes` | Optional: a comma-separated string of [HTTP status codes or code ranges to retry](#retry-status-codes). Status codes not listed are not retried.<br/>Valid values: 100-599, [Reference](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status)<br/>Format: `<code>` or range `<start>-<end>`<br/>Example: "429,501-503"<br/>Default: empty string `""` or field is not set. Retries on all HTTP errors. |
| `matching.gRPCStatusCodes` | Optional: a comma-separated string of [gRPC status codes or code ranges to retry](#retry-status-codes). Status codes not listed are not retried.<br/>Valid values: 0-16, [Reference](https://grpc.io/docs/guides/status-codes/)<br/>Format: `<code>` or range `<start>-<end>`<br/>Example: "4,8,14"<br/>Default: empty string `""` or field is not set. Retries on all gRPC errors. |

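To make the `duration` and `maxRetries` semantics concrete, here is a minimal sketch of a constant-interval retry loop. The function name and its parameters are hypothetical and only mirror the option names in the table; it does not reflect how Dapr implements retries internally.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], duration: float = 5.0,
                      max_retries: int = -1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying failures at a constant interval.

    max_retries = -1 retries without limit; 0 makes the first failure final.
    """
    retries = 0
    while True:
        try:
            return operation()
        except Exception:
            # Give up once the retry budget is spent (never when unlimited).
            if max_retries != -1 and retries >= max_retries:
                raise
            retries += 1
            sleep(duration)  # constant policy: same wait before every retry
```

Passing `sleep` as a parameter is a testing convenience in this sketch, so the waits can be recorded instead of actually pausing.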

## Exponential back-off policy

The exponential back-off window uses the following formula:

```
BackOffDuration = PreviousBackOffDuration * (Random value from 0.5 to 1.5) * 1.5
if BackOffDuration > maxInterval {
  BackOffDuration = maxInterval
}
```
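
As a rough illustration, the formula can be written as a generator of successive wait times. The `initial` parameter and the injectable `rand` function are assumptions made for this sketch; the formula above does not state the starting interval.

```python
import random
from typing import Callable, Iterator


def backoff_intervals(initial: float, max_interval: float,
                      rand: Callable[[float, float], float] = random.uniform
                      ) -> Iterator[float]:
    """Yield successive back-off durations in seconds, capped at max_interval."""
    current = min(initial, max_interval)
    while True:
        yield current
        # Previous duration * random factor in [0.5, 1.5] * 1.5, then cap.
        current = min(current * rand(0.5, 1.5) * 1.5, max_interval)
```

With the random factor pinned to 1.0, a 0.5 second start and a 2 second cap produce 0.5, 0.75, 1.125, 1.6875, and then 2.0 for every later retry.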

## Retry status codes

When applications span multiple services, especially in dynamic environments like Kubernetes, services can disappear for all kinds of reasons and network calls can start hanging. Status codes provide a glimpse into your operations and where they may have failed in production.

### HTTP

The following table includes some examples of HTTP status codes you may receive and whether you should or should not retry certain operations.

| HTTP Status Code | Retry Recommended? | Description |
| ------------------------- | ---------------------- | ---------------------------- |
| 404 Not Found | ❌ No | The resource doesn't exist. |
| 400 Bad Request | ❌ No | Your request is invalid. |
| 401 Unauthorized | ❌ No | Try getting new credentials. |
| 408 Request Timeout | ✅ Yes | The server timed out waiting for the request. |
| 429 Too Many Requests | ✅ Yes | (Respect the `Retry-After` header, if present). |
| 500 Internal Server Error | ✅ Yes | The server encountered an unexpected condition. |
| 502 Bad Gateway | ✅ Yes | A gateway or proxy received an invalid response. |
| 503 Service Unavailable | ✅ Yes | Service might recover. |
| 504 Gateway Timeout | ✅ Yes | Temporary network issue. |

### gRPC

The following table includes some examples of gRPC status codes you may receive and whether you should or should not retry certain operations.

| gRPC Status Code | Retry Recommended? | Description |
| ------------------------- | ----------------------- | ---------------------------- |
| Code 1 CANCELLED | ❌ No | N/A |
| Code 3 INVALID_ARGUMENT | ❌ No | N/A |
| Code 4 DEADLINE_EXCEEDED | ✅ Yes | Retry with backoff |
| Code 5 NOT_FOUND | ❌ No | N/A |
| Code 8 RESOURCE_EXHAUSTED | ✅ Yes | Retry with backoff |
| Code 14 UNAVAILABLE | ✅ Yes | Retry with backoff |

### Retry filter based on status codes

The retry filter enables granular control over retry policies by allowing users to specify HTTP and gRPC status codes or ranges for which retries should apply.

```yml
spec:
  policies:
    retries:
      retry5xxOnly:
        # ...
        matching:
          httpStatusCodes: "429,500-599" # retry the HTTP status codes in this range. All others are not retried.
          gRPCStatusCodes: "4,8-11,13,14" # retry gRPC status codes in these ranges and separate single codes.
```

{{% alert title="Note" color="primary" %}}
Field values for status codes must follow the format specified above. An incorrectly formatted value produces an error log ("Could not read resiliency policy"), and the `daprd` startup sequence still proceeds.
{{% /alert %}}

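The matching format (single codes and `<start>-<end>` ranges separated by commas) can be illustrated with a small parser. This is a sketch of the documented format only; the function names are hypothetical, and Dapr's own parsing and validation may differ.

```python
from typing import List, Tuple


def parse_codes(spec: str) -> List[Tuple[int, int]]:
    """Parse a string like "429,500-599" into inclusive (start, end) ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, sep, end = part.partition("-")
        low = int(start)                  # raises ValueError on malformed input
        high = int(end) if sep else low   # a single code is a one-element range
        if low > high:
            raise ValueError(f"invalid range: {part!r}")
        ranges.append((low, high))
    return ranges


def should_retry(error_code: int, spec: str = "") -> bool:
    """Decide whether an error code matches the filter.

    An empty spec means no filter, so every error is retried.
    """
    ranges = parse_codes(spec)
    if not ranges:
        return True
    return any(low <= error_code <= high for low, high in ranges)
```

For instance, with the filter `"429,500-599"`, `should_retry(503, ...)` is true and `should_retry(404, ...)` is false.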
## Demo

Watch a demo presented during [Diagrid's Dapr v1.15 celebration](https://www.diagrid.io/videos/dapr-1-15-deep-dive) to see how to set retry status code filters using Diagrid Conductor
Suggested change
Watch a demo presented during [Diagrid's Dapr v1.15 celebration](https://www.diagrid.io/videos/dapr-1-15-deep-dive) to see how to set retry status code filters using Diagrid Conductor
Watch a demo presented during [Diagrid's Dapr v1.15 celebration](https://www.diagrid.io/videos/dapr-1-15-deep-dive) to see how to set retry status code filters.


<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/NTnwoDhHIcQ?si=8k1IhRazjyrIJE3P&amp;start=4565" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

## Next steps

- [Learn how to override default retry policies for specific APIs.]({{< ref override-default-retries.md >}})
- [Learn how to target your retry policies from the resiliency spec.]({{< ref targets.md >}})
- Learn more about:
  - [Timeout policies]({{< ref timeouts.md >}})
  - [Circuit breaker policies]({{< ref circuit-breakers.md >}})

## Related links

Try out one of the Resiliency quickstarts:
- [Resiliency: Service-to-service]({{< ref resiliency-serviceinvo-quickstart.md >}})
- [Resiliency: State Management]({{< ref resiliency-state-quickstart.md >}})
50 changes: 50 additions & 0 deletions daprdocs/content/en/operations/resiliency/policies/timeouts.md
@@ -0,0 +1,50 @@
---
type: docs
title: "Timeout resiliency policies"
linkTitle: "Timeouts"
weight: 10
description: "Configure resiliency policies for timeouts"
---

Network calls can fail for many reasons, causing your application to wait indefinitely for responses. By setting a timeout duration, you can cut off those unresponsive services, freeing up resources to handle new requests.

Timeouts are optional policies that can be used to early-terminate long-running operations. Set a realistic timeout duration that reflects actual response times in production. If you've exceeded a timeout duration:

- The operation in progress is terminated (if possible).
- An error is returned.

## Timeout policy format

```yaml
spec:
  policies:
    # Timeouts are simple named durations.
    timeouts:
      timeoutName: timeout1
      general: 5s
      important: 60s
      largeResponse: 10s
```
### Spec metadata

| Field | Details | Example |
| ----- | ------- | ------- |
| timeoutName | Name of the timeout policy | `timeout1` |
| general | Time duration for timeouts marked as "general". Uses Go's [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) format. No set maximum value. | `15s`, `2m`, `1h30m` |
| important | Time duration for timeouts marked as "important". Uses Go's [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) format. No set maximum value. | `15s`, `2m`, `1h30m` |
| largeResponse | Time duration for timeouts awaiting a large response. Uses Go's [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) format. No set maximum value. | `15s`, `2m`, `1h30m` |

> If you don't specify a timeout value, the policy does not enforce a time and defaults to whatever you set up per the request client.

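The behavior described above (terminate the operation in progress if possible, then return an error) can be sketched with Python's `asyncio.wait_for`, which cancels the awaited operation when the deadline passes. The helper and the simulated slow service are hypothetical and only illustrate the idea from the caller's side; they are not Dapr APIs.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def call_with_timeout(operation: Awaitable[T], seconds: float) -> T:
    """Await operation; on timeout it is cancelled and an error is raised."""
    return await asyncio.wait_for(operation, timeout=seconds)


async def slow_service() -> str:
    await asyncio.sleep(1.0)  # stands in for an unresponsive dependency
    return "late response"


async def demo() -> str:
    try:
        return await call_with_timeout(slow_service(), seconds=0.05)
    except asyncio.TimeoutError:
        return "timed out"
```

Running `asyncio.run(demo())` returns `"timed out"` because the simulated service takes longer than the 50 millisecond budget.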
## Next steps

- [Learn more about default resiliency policies]({{< ref default-policies.md >}})
- Learn more about:
  - [Retry policies]({{< ref retries-overview.md >}})
  - [Circuit breaker policies]({{< ref circuit-breakers.md >}})

## Related links

Try out one of the Resiliency quickstarts:
- [Resiliency: Service-to-service]({{< ref resiliency-serviceinvo-quickstart.md >}})
- [Resiliency: State Management]({{< ref resiliency-state-quickstart.md >}})
58 changes: 39 additions & 19 deletions daprdocs/content/en/operations/resiliency/resiliency-overview.md
@@ -6,25 +6,32 @@ weight: 100
description: "Configure Dapr retries, timeouts, and circuit breakers"
---

Dapr provides a capability for defining and applying fault tolerance resiliency policies via a [resiliency spec]({{< ref "resiliency-overview.md#complete-example-policy" >}}). Resiliency specs are saved in the same location as components specs and are applied when the Dapr sidecar starts. The sidecar determines how to apply resiliency policies to your Dapr API calls. In self-hosted mode, the resiliency spec must be named `resiliency.yaml`. In Kubernetes Dapr finds the named resiliency specs used by your application. Within the resiliency spec, you can define policies for popular resiliency patterns, such as:

- [Timeouts]({{< ref "policies.md#timeouts" >}})
- [Retries/back-offs]({{< ref "policies.md#retries" >}})
- [Circuit breakers]({{< ref "policies.md#circuit-breakers" >}})

Policies can then be applied to [targets]({{< ref "targets.md" >}}), which include:

- [Apps]({{< ref "targets.md#apps" >}}) via service invocation
- [Components]({{< ref "targets.md#components" >}})
- [Actors]({{< ref "targets.md#actors" >}})

Additionally, resiliency policies can be [scoped to specific apps]({{< ref "component-scopes.md#application-access-to-components-with-scopes" >}}).

## Demo video
Dapr provides the capability for defining and applying fault tolerance resiliency policies via a [resiliency spec]({{< ref "resiliency-overview.md#complete-example-policy" >}}). Resiliency specs are saved in the same location as components specs and are applied when the Dapr sidecar starts. The sidecar determines how to apply resiliency policies to your Dapr API calls.
- **In self-hosted mode:** The resiliency spec must be named `resiliency.yaml`.
- **In Kubernetes:** Dapr finds the named resiliency specs used by your application.

## Policies

You can configure Dapr resiliency policies with the following parts:
- Metadata defining where the policy applies (like namespace and scope)
- Policies specifying the resiliency name and behaviors, like:
  - [Timeouts]({{< ref timeouts.md >}})
  - [Retries]({{< ref retries-overview.md >}})
  - [Circuit breakers]({{< ref circuit-breakers.md >}})
- Targets determining which interactions these policies act on, including:
  - [Apps]({{< ref "targets.md#apps" >}}) via service invocation
  - [Components]({{< ref "targets.md#components" >}})
  - [Actors]({{< ref "targets.md#actors" >}})

Once defined, you can apply this configuration to your local Dapr components directory, or to your Kubernetes cluster using:

```bash
kubectl apply -f <resiliency-spec-name>.yaml
```

Learn more about [how to write resilient microservices with Dapr](https://youtu.be/uC-4Q5KFq98?si=JSUlCtcUNZLBM9rW).
Additionally, you can scope resiliency policies [to specific apps]({{< ref "component-scopes.md#application-access-to-components-with-scopes" >}}).

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/uC-4Q5KFq98?si=JSUlCtcUNZLBM9rW" title="YouTube video player" style="padding-bottom:25px;" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
> See [known limitations](#limitations).
## Resiliency policy structure

@@ -166,19 +173,32 @@ spec:
circuitBreaker: pubsubCB
```
## Related links
## Limitations
- **Service invocation via gRPC:** Currently, resiliency policies are not supported for service invocation via gRPC.
## Demos
Watch this video for how to use [resiliency](https://www.youtube.com/watch?t=184&v=7D6HOU3Ms6g&feature=youtu.be):
<div class="embed-responsive embed-responsive-16by9">
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/7D6HOU3Ms6g?start=184" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>
Learn more about [how to write resilient microservices with Dapr](https://youtu.be/uC-4Q5KFq98?si=JSUlCtcUNZLBM9rW).
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/uC-4Q5KFq98?si=JSUlCtcUNZLBM9rW" title="YouTube video player" style="padding-bottom:25px;" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
## Next steps
Learn more about resiliency policies and targets:
- [Policies]({{< ref "policies.md" >}})
- Policies
  - [Timeouts]({{< ref "timeouts.md" >}})
  - [Retries]({{< ref "retries-overview.md" >}})
  - [Circuit breakers]({{< ref circuit-breakers.md >}})
- [Targets]({{< ref "targets.md" >}})
## Related links
Try out one of the Resiliency quickstarts:
- [Resiliency: Service-to-service]({{< ref resiliency-serviceinvo-quickstart.md >}})
- [Resiliency: State Management]({{< ref resiliency-state-quickstart.md >}})
Binary file modified daprdocs/static/images/resiliency_inbound.png
Binary file modified daprdocs/static/images/resiliency_outbound.png
Binary file modified daprdocs/static/images/resiliency_pubsub.png
Binary file modified daprdocs/static/images/resiliency_svc_invocation.png