proposal: introducing a scheduling strategy to customize cluster propagation priorities
Signed-off-by: chaosi-zju <[email protected]>
1 parent: 32c2ef7; commit: 6921114
Showing 1 changed file with 283 additions and 0 deletions.
docs/proposals/scheduling/aggregate-by-priority/aggregate-by-priority.md
---
title: Introducing a scheduling strategy to customize cluster propagation priorities
authors:
- "@chaosi-zju"
reviewers:
- "@TBD"
approvers:
- "@RainbowMango"

creation-date: 2024-09-07
---
# Introducing a scheduling strategy to customize cluster propagation priorities

## Summary

This proposal introduces a new user scenario for scheduling, where users have preferences for clusters with
varying priorities and want replicas assigned to their preferred clusters.

To support this scenario, we propose a new scheduling strategy that allows users to customize the priority of clusters,
ensuring that replicas are preferentially allocated to higher-priority clusters.
## Motivation

Karmada currently supports three `Divided` type scheduling strategies:
`Dynamic Weighted`, `Static Weight` and `Aggregated`.
However, these strategies still do not satisfy certain user scenarios.

<!-- For example, suppose a user has multiple clusters and wants replicas allocated to the preferred clusters first, falling back to the other clusters only when the preferred ones lack resources. -->

For example, a user has multiple clusters and wants to use some preferred clusters first.
The remaining clusters are used only when the preferred clusters cannot hold all the replicas.

From the perspective of a deployment, the behavior should be as follows:
* On first propagation, allocate replicas to the preferred clusters as much as possible.
* When scaling up, add replicas in the preferred clusters first (existing replicas remain unchanged).
* When scaling down, remove replicas from the non-preferred clusters first.
### Goals

<!-- This proposal aims to provide a scheduling strategy that lets users customize cluster priorities, so that replicas are allocated to higher-priority clusters first. -->

* Provide a scheduling strategy that allows users to customize the priority of clusters,
  ensuring that replicas are preferentially allocated to higher-priority clusters.
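### Non-Goals

<!--
Users have no special requirements for how replicas are allocated among clusters of the same priority, as long as the result is reasonable.
This proposal allocates among same-priority clusters with the aggregated strategy (dividing replicas into as few clusters as possible) and does not branch out into other, more complex scenarios.
-->

* Users have no specific requirements for how replicas are allocated among clusters with the same priority, as long as the result is reasonable.
  This proposal applies the `Aggregated` strategy in that case, dividing replicas into as few clusters as possible, and does not cover more complex scenarios.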
## Proposal

### User Stories (Optional)

#### Story 1

As a cluster administrator, I manage two clusters: a cheap private cloud cluster and an expensive public cloud cluster.

I want users' replicas to be placed in the cheaper cluster first, both on first propagation and when scaling up.
The expensive cluster should be used only when the cheaper cluster has insufficient capacity, so that I can reduce costs.
### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations
## Design Details

<!--
This section proposes a new scheduling strategy named PriorityAggregated. Under this strategy:
* Users can specify cluster priorities in the Policy.
* For clusters with different priorities, replicas are allocated to higher-priority clusters first; only when those clusters lack resources are replicas allocated to lower-priority clusters.
* For clusters with the same priority, replicas are divided into as few clusters as possible.
-->

This section introduces a new scheduling strategy called `PriorityAggregated`:

* Users can specify the priority of clusters in the policy.
* Replicas are allocated to higher-priority clusters first;
  only when those clusters have insufficient resources are replicas assigned to lower-priority clusters.
* For clusters with the same priority, replicas are divided into as few clusters as possible.
#### API changes

Add a new `ReplicaDivisionPreference` value named `PriorityAggregated`:

```go
// ReplicaDivisionPreference describes options of how replicas can be scheduled.
type ReplicaDivisionPreference string

const (
	ReplicaDivisionPreferenceAggregated ReplicaDivisionPreference = "Aggregated"
	ReplicaDivisionPreferenceWeighted   ReplicaDivisionPreference = "Weighted"
+	// ReplicaDivisionPreferencePriorityAggregated assigns replicas to higher-priority clusters first,
+	// then to lower-priority clusters if resources are insufficient.
+	ReplicaDivisionPreferencePriorityAggregated ReplicaDivisionPreference = "PriorityAggregated"
)
```
Add a new field `priorityPreference` to the `ReplicaSchedulingStrategy` of the PropagationPolicy, as follows:

```go
// ReplicaSchedulingStrategy represents the assignment strategy of replicas.
type ReplicaSchedulingStrategy struct {
	// ReplicaDivisionPreference determines how the replicas is divided
	// when ReplicaSchedulingType is "Divided". Valid options are Aggregated and Weighted.
	// "Aggregated" divides replicas into clusters as few as possible,
	// while respecting clusters' resource availabilities during the division.
	// "Weighted" divides replicas by weight according to WeightPreference.
+	// "PriorityAggregated" assigns replicas to higher-priority clusters first,
+	// then to lower-priority clusters if resources are insufficient.
-	// +kubebuilder:validation:Enum=Aggregated;Weighted
+	// +kubebuilder:validation:Enum=Aggregated;Weighted;PriorityAggregated
	// +optional
	ReplicaDivisionPreference ReplicaDivisionPreference `json:"replicaDivisionPreference,omitempty"`

	// WeightPreference describes weight for each cluster or for each group of cluster
	// If ReplicaDivisionPreference is set to "Weighted", and WeightPreference is not set, scheduler will weight all clusters the same.
	// +optional
	WeightPreference *ClusterPreferences `json:"weightPreference,omitempty"`

+	// PriorityPreference describes the allocation priority for each cluster or for each group of clusters.
+	// If ReplicaDivisionPreference is set to "PriorityAggregated", and PriorityPreference is not set, scheduler will prioritize all clusters equally.
+	// +optional
+	PriorityPreference *PriorityPreference `json:"priorityPreference,omitempty"`
}
```
The newly added type `PriorityPreference` is defined as follows:

```go
// PriorityPreference describes the allocation priority for each cluster or for each group of clusters.
type PriorityPreference struct {
	// PreferredPriorityList defines the clusters' preferred allocation priority.
	// +optional
	PreferredPriorityList []ClusterPreferredPriority `json:"preferredPriorityList,omitempty"`
}

// ClusterPreferredPriority defines the clusters' preferred allocation priority.
type ClusterPreferredPriority struct {
	// TargetCluster describes the filter to select clusters.
	// +required
	TargetCluster ClusterAffinity `json:"targetCluster"`

	// Priority expresses the priority of the cluster(s) specified by 'TargetCluster'.
	// Larger numbers mean higher priority.
	// +kubebuilder:validation:Minimum=1
	// +required
	Priority int64 `json:"priority"`
}
```
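To illustrate how a scheduler might read this list, the following is a minimal sketch that resolves the priority of a single cluster. It is not part of the proposal: `priorityOf` is a hypothetical helper, `ClusterAffinity` is reduced to its `ClusterNames` filter, and two behaviors are assumptions made only for this example: a cluster matched by no entry gets priority `0` (below the validated minimum of `1`), and a cluster matched by several entries takes the highest one.

```go
package main

import "fmt"

// ClusterAffinity is a simplified stand-in for Karmada's ClusterAffinity,
// keeping only the ClusterNames filter used in this proposal's examples.
type ClusterAffinity struct {
	ClusterNames []string
}

// ClusterPreferredPriority mirrors the type proposed above (JSON tags omitted).
type ClusterPreferredPriority struct {
	TargetCluster ClusterAffinity
	Priority      int64
}

// priorityOf is a hypothetical helper. It returns the highest priority among
// the entries that name the cluster, or 0 (an assumed default) if none does.
func priorityOf(list []ClusterPreferredPriority, cluster string) int64 {
	var best int64
	for _, entry := range list {
		for _, name := range entry.TargetCluster.ClusterNames {
			if name == cluster && entry.Priority > best {
				best = entry.Priority
			}
		}
	}
	return best
}

func main() {
	list := []ClusterPreferredPriority{
		{TargetCluster: ClusterAffinity{ClusterNames: []string{"member1", "member2"}}, Priority: 2},
		{TargetCluster: ClusterAffinity{ClusterNames: []string{"member3", "member4"}}, Priority: 1},
	}
	for _, cluster := range []string{"member1", "member3", "member5"} {
		fmt.Printf("%s -> priority %d\n", cluster, priorityOf(list, cluster))
	}
}
```

How unmatched or multiply matched clusters should really be handled is left open by the proposal; the defaults above are placeholders.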
#### Usage Example

Suppose the user has four clusters (member1/member2/member3/member4)
and wants replicas to be aggregated in member1/member2 first. The policy can then be defined as follows:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: default-pp
spec:
  #...
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
        - member3
        - member4
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: PriorityAggregated
      priorityPreference:
        preferredPriorityList:
          - targetCluster:
              clusterNames:
                - member1
                - member2
            priority: 2
          - targetCluster:
              clusterNames:
                - member3
                - member4
            priority: 1
```

As shown, the user specifies the cluster aggregation priority through the `spec.placement.replicaScheduling.priorityPreference`
field. The member1/member2 clusters have a higher priority and are aggregated first.
#### Use case and its behavior

Based on the usage example above, assume the max available replicas for each cluster is `10`.

Under the `PriorityAggregated` scheduling strategy, the allocation results for different numbers of desired replicas are as follows:

| cluster \ desired replicas | 8 | 16 | 28 | 36 |
|----------------------------|---|----|----|----|
| member1 (high priority)    | 8 | 8  | 10 | 10 |
| member2 (high priority)    | 0 | 8  | 10 | 10 |
| member3 (low priority)     | 0 | 0  | 8  | 8  |
| member4 (low priority)     | 0 | 0  | 0  | 8  |
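
The table can be reproduced with a short sketch. This is one possible reading of the expected behavior, not the proposed implementation: the `Cluster` type and both functions are hypothetical, and the rule used inside a priority tier (pick as few clusters as possible, then divide in proportion to their available replicas) is an assumption inferred from the `8`/`8` split in the `16` column.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical, simplified view of a member cluster.
type Cluster struct {
	Name      string
	Priority  int64 // larger means higher priority
	Available int64 // max available replicas
}

// aggregate picks as few clusters as possible (largest Available first) to hold
// n replicas, then divides n among them in proportion to Available.
// The caller guarantees 0 < n <= total Available of the tier.
func aggregate(tier []Cluster, n int64, out map[string]int64) {
	sort.SliceStable(tier, func(i, j int) bool { return tier[i].Available > tier[j].Available })
	var picked []Cluster
	var capacity int64
	for _, c := range tier {
		if capacity >= n {
			break
		}
		picked = append(picked, c)
		capacity += c.Available
	}
	var assigned int64
	for _, c := range picked {
		share := n * c.Available / capacity // rounded down
		out[c.Name] = share
		assigned += share
	}
	// Hand out rounding leftovers one by one without exceeding Available.
	for i := 0; assigned < n; i = (i + 1) % len(picked) {
		if out[picked[i].Name] < picked[i].Available {
			out[picked[i].Name]++
			assigned++
		}
	}
}

// priorityAggregated fills priority tiers from highest to lowest. Replicas that
// exceed the total capacity of all clusters are left unassigned in this sketch.
func priorityAggregated(clusters []Cluster, desired int64) map[string]int64 {
	out := make(map[string]int64, len(clusters))
	tiers := map[int64][]Cluster{}
	var priorities []int64
	for _, c := range clusters {
		out[c.Name] = 0
		if _, seen := tiers[c.Priority]; !seen {
			priorities = append(priorities, c.Priority)
		}
		tiers[c.Priority] = append(tiers[c.Priority], c)
	}
	sort.Slice(priorities, func(i, j int) bool { return priorities[i] > priorities[j] })

	remaining := desired
	for _, p := range priorities {
		if remaining <= 0 {
			break
		}
		var capacity int64
		for _, c := range tiers[p] {
			capacity += c.Available
		}
		n := remaining
		if capacity < n {
			n = capacity
		}
		if n > 0 {
			aggregate(tiers[p], n, out)
		}
		remaining -= n
	}
	return out
}

func main() {
	clusters := []Cluster{
		{Name: "member1", Priority: 2, Available: 10},
		{Name: "member2", Priority: 2, Available: 10},
		{Name: "member3", Priority: 1, Available: 10},
		{Name: "member4", Priority: 1, Available: 10},
	}
	for _, desired := range []int64{8, 16, 28, 36} {
		got := priorityAggregated(clusters, desired)
		fmt.Println(desired, got["member1"], got["member2"], got["member3"], got["member4"])
	}
}
```

Each printed line corresponds to one column of the table: the desired replica count followed by the replicas assigned to member1 through member4.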
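#### Controller logic changes

<!--
The existing implementation of the Aggregated strategy has some details worth reconsidering. For example, when replicas are assigned to two clusters,
filling the first cluster and assigning the remainder to the second would better match the Aggregated semantics;
in practice, however, the replicas are divided across the two clusters according to their max available replicas.

If we change the Aggregated implementation to produce the first result, the new PriorityAggregated strategy becomes simple to implement:
reuse the Aggregated code directly and only sort the clusters by the specified priority before entering the Aggregated allocation function.
PriorityAggregated is then a special, customized Aggregated strategy.

-->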
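The idea in the comment above (sort clusters by priority, then hand them to an Aggregated function that fills clusters in order) can be sketched as follows. Both functions and the `Cluster` type are hypothetical; `fillInOrder` only stands in for the fill-first variant of Aggregated that the comment discusses and is not Karmada's current implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical, simplified view of a member cluster.
type Cluster struct {
	Name      string
	Priority  int64 // larger means higher priority
	Available int64 // max available replicas
}

// sortByPriority returns a copy ordered from highest to lowest priority.
// The sort is stable, so clusters with equal priority keep their input order.
func sortByPriority(clusters []Cluster) []Cluster {
	sorted := append([]Cluster(nil), clusters...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	return sorted
}

// fillInOrder stands in for a fill-first Aggregated function: it fills each
// cluster up to its capacity before moving on to the next one.
func fillInOrder(ordered []Cluster, desired int64) []int64 {
	result := make([]int64, len(ordered))
	for i, c := range ordered {
		if desired <= 0 {
			break
		}
		n := c.Available
		if desired < n {
			n = desired
		}
		result[i] = n
		desired -= n
	}
	return result
}

func main() {
	clusters := []Cluster{
		{Name: "member3", Priority: 1, Available: 10},
		{Name: "member1", Priority: 2, Available: 10},
		{Name: "member4", Priority: 1, Available: 10},
		{Name: "member2", Priority: 2, Available: 10},
	}
	ordered := sortByPriority(clusters)
	for i, n := range fillInOrder(ordered, 16) {
		fmt.Printf("%s: %d\n", ordered[i].Name, n)
	}
}
```

Under this fill-first behavior, 16 desired replicas land as 10 on member1 and 6 on member2, rather than the 8 and 8 shown in the table above, which is the behavioral difference the comment points out.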
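## Alternatives

<!--
Treat cluster priority as a higher-level dimension than the scheduling strategy, so that cluster priority can be combined with the static weight, dynamic weight, aggregated and other scheduling strategies:
* For clusters with the same priority, distribute according to the original scheduling strategy; the logic is unchanged.
* For clusters with different priorities, always allocate to higher-priority clusters first, whatever the scheduling strategy.
-->

Cluster priority could be treated as a higher-level dimension than the scheduling strategy and combined with the static weight, dynamic weight and aggregated strategies:
* Clusters with the same priority follow the original scheduling strategy.
* Higher-priority clusters are allocated first, regardless of the strategy.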
### Implicitly declare cluster priority

<!-- Use multiple scheduling groups to express cluster priority implicitly. In the example below, member1/member2 have a higher priority: even when the second scheduling group is selected, replicas are still allocated to member1/member2 first. -->

```yaml
clusterAffinities:
  - affinityName: primary-clusters
    clusterNames:
      - member1
      - member2
  - affinityName: backup-clusters
    clusterNames:
      - member1
      - member2
      - member3
      - member4
replicaScheduling:
  replicaSchedulingType: Divided
  replicaDivisionPreference: Aggregated
```
### Explicitly declare cluster priority

<!-- Add a new field to declare cluster priority explicitly. -->

```yaml
clusterAffinity:
  clusterNames:
    - member1
    - member2
    - member3
    - member4
clusterPriority:
  preferredPriorityList:
    - targetCluster:
        clusterNames:
          - member1
          - member2
      priority: 2
    - targetCluster:
        clusterNames:
          - member3
          - member4
      priority: 1
replicaScheduling:
  replicaSchedulingType: Divided
  replicaDivisionPreference: Aggregated
```
Discussion result: treating `cluster priority` as a separate dimension would have a significant impact on the current scheduling strategies,
so this alternative is not being considered for now.