Typha autoscaler's autoscaling profile to be configurable #3095

Open · consideRatio opened this issue Jan 9, 2024 · 0 comments
Expected Behavior

I expect that the logic that maps cluster node count to typha replicas wouldn't be hardcoded, as it currently is here:

// GetExpectedTyphaScale will return the number of Typhas needed for the number of nodes.
//
//    Nodes    Replicas
//    1-2      1
//    3-4      2
//    <200     3
//    >400     4
//    >600     5
//    >800     6
//    >1000    7
//    ...
//    >2000    12
//    ...
//    >3600    20
func GetExpectedTyphaScale(nodes int) int {
    var maxNodesPerTypha int = 200
    // This gives a count of how many 200s so we need 1+ this number to get at least
    // 1 typha for every 200 nodes.
    typhas := (nodes / maxNodesPerTypha) + 1
    // We add one more to ensure there is always 1 extra for high availability purposes.
    typhas += 1
    // We have a couple special cases for small clusters. We want to ensure that we run one fewer
    // Typha instances than there are nodes, so that there is room for rescheduling. We also want
    // to ensure we have at least two, where possible, so that we have redundancy.
    if nodes <= 2 {
        // For one and two node clusters, we only need a single typha.
        typhas = 1
    } else if nodes <= 4 {
        // For three and four node clusters, we can run an additional typha.
        typhas = 2
    } else if typhas < 3 {
        // For clusters with more than 4 nodes, make sure we have a minimum of three for redundancy.
        typhas = 3
    }
    return typhas
}
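
To make the mapping concrete, here is a small usage sketch (assuming GetExpectedTyphaScale is in scope) showing the values the code above returns for a few node counts:

package main

import "fmt"

func main() {
    // Sample outputs of GetExpectedTyphaScale, computed from the code above.
    fmt.Println(GetExpectedTyphaScale(2))   // 1: one- and two-node clusters
    fmt.Println(GetExpectedTyphaScale(4))   // 2: three- and four-node clusters
    fmt.Println(GetExpectedTyphaScale(5))   // 3: bumped to the minimum of three
    fmt.Println(GetExpectedTyphaScale(399)) // 3: 399/200 + 1 + 1
    fmt.Println(GetExpectedTyphaScale(450)) // 4: 450/200 + 1 + 1
}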

Practically, it could be made configurable with a nodesToReplicas ladder, as is done in GKE's managed Calico deployment via a ConfigMap:

# ...
data:
  ladder: |-
    {
      "coresToReplicas": [],
      "nodesToReplicas":
      [
        [1, 1],
        [2, 2],
        [100, 3],
        [250, 4],
        [500, 5],
        [1000, 6],
        [1500, 7],
        [2000, 8]
      ]
    }

Current Behavior

The autoscaling profile is fixed and can't be influenced.

Possible Solution

Provide a nodesToReplicas configuration for the typha autoscaler, nested somewhere under the Installation resource, where the default value of this configuration mimics the current implementation.
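
As a rough sketch of what this could look like (the names and structure below are hypothetical, not the operator's actual API): the autoscaler could walk a sorted [minNodes, replicas] ladder, with a default ladder chosen to reproduce the hardcoded mapping above.

// Hypothetical sketch: a configurable nodesToReplicas ladder whose default
// reproduces the current GetExpectedTyphaScale behaviour.
type ladderStep struct {
    MinNodes int // smallest node count at which this step applies
    Replicas int // typha replicas to run at or above MinNodes
}

// Default ladder mimicking the hardcoded mapping; past 400 nodes it would
// continue adding one replica per additional 200 nodes (600, 800, 1000, ...).
var defaultNodesToReplicas = []ladderStep{
    {1, 1},
    {3, 2},
    {5, 3},
    {400, 4},
    {600, 5},
}

// typhaReplicasForNodes returns the replica count of the highest ladder step
// whose MinNodes is <= nodes. The ladder is assumed sorted by MinNodes.
func typhaReplicasForNodes(nodes int, ladder []ladderStep) int {
    replicas := 1
    for _, step := range ladder {
        if nodes >= step.MinNodes {
            replicas = step.Replicas
        }
    }
    return replicas
}

A cluster admin could then override just the small-cluster steps (for example capping replicas at 1 or 2) without changing how larger clusters scale.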

Context

I'd like this feature to avoid forcing additional nodes onto a small cluster just to house these pods, which can't be scheduled next to each other; that incurs cloud cost, hogs available compute in the cloud, and wastes energy.

In a small k8s cluster with for example just four nodes, where three of them are reserved for other things and only one is available to run calico-typha, 2 of the 3 calico-typha pods will fail to schedule (# node(s) didn't have free ports for the requested pod ports). When this happens, a cluster-autoscaler could end up creating additional nodes even though the admin has determined that just one or two calico-typha pods would have sufficed.

Your Environment

  • AKS 1.28.3 using tigera operator tigera/operator:v1.28.13
  • We have "core nodes" and "user nodes", where we typically have just one or possibly two core nodes on which workloads like calico-typha should run, and often a few additional "user nodes" where such workloads are forbidden to run via taints.