eviction concepts and how-to section
Arvind Thirumurugan committed Jan 10, 2025
1 parent 3479ddb commit 45cf143
Showing 3 changed files with 245 additions and 1 deletion.
47 changes: 47 additions & 0 deletions docs/concepts/EvictionAndDisruptionBudget/README.md
@@ -0,0 +1,47 @@
# Eviction & Placement Disruption Budget

This document explains the concepts of `Eviction` and `Placement Disruption Budget` in the context of the fleet.

## Overview

`Eviction` is the act of removing resources from a target cluster after a resource placement object on the hub cluster has propagated them there.

The `Placement Disruption Budget` object protects against voluntary disruption. In the fleet, eviction is currently the only voluntary disruption allowed.

## Eviction

An eviction object is used to remove resources from a member cluster after they have been propagated from the hub cluster.

Creating an eviction object indicates that resources need to be removed from a member cluster. An eviction is attempted only once: after the resources are successfully removed, or if the eviction cannot be executed, the eviction object won't be reconciled again.

To successfully evict resources from a cluster, the user needs to specify:

- The name of the `ClusterResourcePlacement` object which propagated resources to the target cluster.
- The name of the target cluster from which the resources need to be evicted.
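
Put together, those two inputs might look like the manifest below. This is a minimal sketch: the `apiVersion` and the `placementName` / `clusterName` fields follow the eviction example in the how-to guide added by this same commit, while `example-eviction`, `example-crp` and `member-cluster-1` are placeholder names.

```yaml
# Sketch only: the object names below are placeholders, not values from a real fleet.
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourcePlacementEviction
metadata:
  name: example-eviction
spec:
  # Name of the ClusterResourcePlacement that propagated the resources.
  placementName: example-crp
  # Name of the member cluster to evict the resources from.
  clusterName: member-cluster-1
```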

When specifying the `ClusterResourcePlacement` object in the eviction's spec, the user needs to consider the following cases:

- For a `PickFixed` CRP, eviction is not allowed. Users who want to remove resources from a cluster can instead remove the cluster name from the `ClusterResourcePlacement` spec or pick a different cluster.
- For `PickAll` and `PickN` CRPs, eviction is allowed because users cannot deterministically pick or unpick a cluster under these placement strategies; the choice is up to the scheduler.
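
For the `PickFixed` case, removing a cluster is an edit to the placement itself rather than an eviction. The sketch below assumes the `policy.clusterNames` field and the `v1beta1` API version of `ClusterResourcePlacement`, neither of which appears in this document, so verify both against the CRDs installed on your hub cluster. All object names are placeholders.

```yaml
# Sketch only: apiVersion and clusterNames are assumptions; names are placeholders.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: example-fixed-crp
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickFixed
    # To stop placing resources on member-cluster-2, delete its entry
    # from this list instead of creating an eviction.
    clusterNames:
      - member-cluster-1
      - member-cluster-2
```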

> **Note:** After an eviction is executed, there is no guarantee that the scheduler won't pick the cluster again to propagate resources for a `ClusterResourcePlacement`.
> To prevent the scheduler from picking the cluster again, the user needs to add a taint to the cluster.

## ClusterResourcePlacementDisruptionBudget

The `ClusterResourcePlacementDisruptionBudget` is used to protect resources propagated by a `ClusterResourcePlacement` to a target cluster from voluntary disruption, i.e., `eviction`.

> **Note:** When specifying a `ClusterResourcePlacementDisruptionBudget`, its name should be the same as the name of the `ClusterResourcePlacement` that it's trying to protect.

Users can specify only one of the following two fields in the `ClusterResourcePlacementDisruptionBudget` spec, since they are mutually exclusive:

- `MaxUnavailable` - specifies the maximum number of clusters in which a placement can be unavailable due to voluntary disruptions.
- `MinAvailable` - specifies the minimum number of clusters in which placements are available despite voluntary disruptions.
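
A budget that sets one of these fields might look like the sketch below. The lowerCamelCase field spelling (`minAvailable`) and the `apiVersion` are assumptions modelled on the eviction object's API group, so check them against the installed CRDs. The name `example-crp` is a placeholder for the placement being protected.

```yaml
# Sketch only: field spelling and apiVersion are assumptions; the name is a placeholder.
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourcePlacementDisruptionBudget
metadata:
  # Must match the name of the ClusterResourcePlacement being protected.
  name: example-crp
spec:
  # Set either minAvailable or maxUnavailable, never both.
  minAvailable: 1
```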

> **Note:** For both `MaxUnavailable` and `MinAvailable`, involuntary disruptions are not subject to the disruption budget but will still count against it.

When specifying a disruption budget for a particular `ClusterResourcePlacement`, the user needs to consider the following cases:

- For a `PickFixed` CRP, an attempted eviction results in an invalid eviction error message in the eviction status, whether or not a `ClusterResourcePlacementDisruptionBudget` is specified.
- For a `PickAll` CRP, if a `ClusterResourcePlacementDisruptionBudget` is specified and the `MaxUnavailable` field is set, the user will receive a misconfigured placement disruption budget error message in the eviction status.
- For a `PickN` CRP, if a `ClusterResourcePlacementDisruptionBudget` is specified, the user can set either the `MaxUnavailable` or the `MinAvailable` field, but not both, since the fields are mutually exclusive.
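
As an illustration of the `PickN` case, the sketch below pairs a placement that asks for three clusters with a budget that tolerates one unavailable placement. The `numberOfClusters` field, the lowerCamelCase `maxUnavailable` spelling and both `apiVersion` values are assumptions not confirmed by this document, and all names are placeholders.

```yaml
# Sketch only: field names and API versions are assumptions; names are placeholders.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: example-pickn-crp
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 3
---
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourcePlacementDisruptionBudget
metadata:
  # Same name as the placement it protects.
  name: example-pickn-crp
spec:
  # At most one placement may be unavailable due to voluntary disruption.
  maxUnavailable: 1
```

Under the definitions above, a single eviction should fit within this budget when all three placements are available, while a further eviction would be expected to be blocked for as long as one placement remains unavailable.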
8 changes: 7 additions & 1 deletion docs/howtos/README.md
@@ -56,4 +56,10 @@ domains:

This how-to guide explains the specifics of the Fleet `ResourceOverride` API, including its
resource selectors, policy, and more. `ResourceOverride` is a Fleet API that allows you to
modify or override specific attributes across namespaced resources.

* [Using the Fleet `ClusterResourcePlacementEviction` and `ClusterResourcePlacementDisruptionBudget` APIs](eviction-placement-disruption-budget.md)

This how-to guide explains the specifics of the Fleet `ClusterResourcePlacementEviction` and
`ClusterResourcePlacementDisruptionBudget` APIs, including how to evict resources from a
cluster and protect resources from voluntary disruption.
191 changes: 191 additions & 0 deletions docs/howtos/eviction-placement-disruption-budget.md
@@ -0,0 +1,191 @@
# How-to Guide: Evicting resources from member clusters using ClusterResourcePlacementEviction and protecting resources on member clusters from voluntary disruption using ClusterResourcePlacementDisruptionBudget

This how-to guide discusses how to create ClusterResourcePlacementEviction objects and ClusterResourcePlacementDisruptionBudget objects to evict resources from member clusters and protect resources on member clusters from voluntary disruption, respectively.

## Evicting Resources from Member Clusters using ClusterResourcePlacementEviction

The ClusterResourcePlacementEviction object is used to remove resources from a member cluster once the resources have already been propagated from the hub cluster.

To successfully evict resources from a cluster, the user needs to specify:
- The name of the ClusterResourcePlacement object which propagated resources to the target cluster
- The name of the target cluster from which we need to evict resources.

In this example, we will create a ClusterResourcePlacement object with PickAll placement policy to propagate resources to a MemberCluster, add a taint to the member cluster
and then create a ClusterResourcePlacementEviction object to evict resources from the MemberCluster.

We will first create a namespace that we will propagate to the member cluster:

```
kubectl create ns test-ns
```

Then we will apply a `ClusterResourcePlacement` with the following spec:

```yaml
resourceSelectors:
  - group: ""
    kind: Namespace
    version: v1
    name: test-ns
policy:
  placementType: PickAll
```
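
The spec above is only a fragment. A complete manifest might look like the sketch below. The name `crp-1` matches the `placementName` used by the eviction later in this guide, while the `apiVersion` is an assumption that should be checked against the CRDs installed on your hub cluster.

```yaml
# Sketch only: the apiVersion is an assumption; crp-1 is the name the eviction below refers to.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: crp-1
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickAll
```
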
The CRP status after applying should look something like this:
```yaml
status:
  conditions:
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: found all cluster needed as specified by the scheduling policy, found
      1 cluster(s)
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: All 1 cluster(s) start rolling out the latest resource
    observedGeneration: 2
    reason: RolloutStarted
    status: "True"
    type: ClusterResourcePlacementRolloutStarted
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: No override rules are configured for the selected resources
    observedGeneration: 2
    reason: NoOverrideSpecified
    status: "True"
    type: ClusterResourcePlacementOverridden
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: Works(s) are succcesfully created or updated in 1 target cluster(s)'
      namespaces
    observedGeneration: 2
    reason: WorkSynchronized
    status: "True"
    type: ClusterResourcePlacementWorkSynchronized
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: The selected resources are successfully applied to 1 cluster(s)
    observedGeneration: 2
    reason: ApplySucceeded
    status: "True"
    type: ClusterResourcePlacementApplied
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: The selected resources in 1 cluster(s) are available now
    observedGeneration: 2
    reason: ResourceAvailable
    status: "True"
    type: ClusterResourcePlacementAvailable
  observedResourceIndex: "0"
  placementStatuses:
  - clusterName: kind-cluster-1
    conditions:
    - lastTransitionTime: "2025-01-10T00:58:31Z"
      message: 'Successfully scheduled resources for placement in "kind-cluster-1"
        (affinity score: 0, topology spread score: 0): picked by scheduling policy'
      observedGeneration: 2
      reason: Scheduled
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2025-01-10T00:58:31Z"
      message: Detected the new changes on the resources and started the rollout process
      observedGeneration: 2
      reason: RolloutStarted
      status: "True"
      type: RolloutStarted
    - lastTransitionTime: "2025-01-10T00:58:31Z"
      message: No override rules are configured for the selected resources
      observedGeneration: 2
      reason: NoOverrideSpecified
      status: "True"
      type: Overridden
    - lastTransitionTime: "2025-01-10T00:58:31Z"
      message: All of the works are synchronized to the latest
      observedGeneration: 2
      reason: AllWorkSynced
      status: "True"
      type: WorkSynchronized
    - lastTransitionTime: "2025-01-10T00:58:31Z"
      message: All corresponding work objects are applied
      observedGeneration: 2
      reason: AllWorkHaveBeenApplied
      status: "True"
      type: Applied
    - lastTransitionTime: "2025-01-10T00:58:31Z"
      message: All corresponding work objects are available
      observedGeneration: 2
      reason: AllWorkAreAvailable
      status: "True"
      type: Available
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1
```
Let's now add a taint to the member cluster so that the scheduler does not pick this cluster again once we evict resources from it.
Modify the `MemberCluster` object to add a taint:
```yaml
spec:
  heartbeatPeriodSeconds: 60
  identity:
    kind: ServiceAccount
    name: fleet-member-agent-cluster-1
    namespace: fleet-system
  taints:
  - effect: NoSchedule
    key: test-key
    value: test-value
```
Now we will create a `ClusterResourcePlacementEviction` object to evict resources from the member cluster:

```yaml
apiVersion: placement.kubernetes-fleet.io/v1alpha1
kind: ClusterResourcePlacementEviction
metadata:
  name: test-eviction
spec:
  placementName: crp-1
  clusterName: kind-cluster-1
```

The eviction status lets us know whether the eviction was successful:

```yaml
status:
  conditions:
  - lastTransitionTime: "2025-01-10T01:29:27Z"
    message: Eviction is valid
    observedGeneration: 1
    reason: ClusterResourcePlacementEvictionValid
    status: "True"
    type: Valid
  - lastTransitionTime: "2025-01-10T01:29:27Z"
    message: Eviction is allowed, no ClusterResourcePlacementDisruptionBudget specified
    observedGeneration: 1
    reason: ClusterResourcePlacementEvictionExecuted
    status: "True"
    type: Executed
```

Since the eviction was successful, the resources should have been removed from the cluster. Let's take a look at the CRP object's status to confirm:

```yaml
status:
  conditions:
  - lastTransitionTime: "2025-01-10T00:58:31Z"
    message: found all cluster needed as specified by the scheduling policy, found
      0 cluster(s)
    observedGeneration: 2
    reason: SchedulingPolicyFulfilled
    status: "True"
    type: ClusterResourcePlacementScheduled
  observedResourceIndex: "0"
  selectedResources:
  - kind: Namespace
    name: test-ns
    version: v1
```

The status now reports `0 cluster(s)` and no longer lists any placement statuses, which shows that the resources have been removed from the cluster. The only reason the scheduler doesn't re-pick the cluster is the taint we added.
