Consolidation does not happen even when there is cheaper combination of instances available #1962

Open
codeeong opened this issue Feb 5, 2025 · 4 comments
Labels
  • consolidation
  • kind/bug: Categorizes issue or PR as related to a bug.
  • performance: Issues relating to performance (memory usage, cpu usage, timing)
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments


codeeong commented Feb 5, 2025

Description

Observed Behavior:
For context, we wanted to leave the cost-effectiveness decision to Karpenter, so we allowed a variety of instance types (c5a, c6a, m6a, m5a, c7a, r6a, r5a, r4, in large/xlarge/2xlarge sizes), thinking the different CPU:memory combinations would let Karpenter make the best usage-to-cost decisions on our behalf.

However, on the nodes Karpenter chose for us, memory utilization is good at around 90%, while CPU utilization is very low (around 50%).
For example, we have many c5a.xlarge instances (in the same AZ) using less than 50% CPU. Two of these could be consolidated into a single, cheaper m6a.xlarge, which has the same CPU and double the memory of one c5a.xlarge. But the event on the node says:
Normal Unconsolidatable 4m36s (x47 over 15h) karpenter Can't replace with a cheaper node

Our CPU usage by instance type looks like this:
[screenshot: CPU usage per instance type]

This ends up being more expensive than our original pre-provisioned node pool, which ran at around 60-65% utilization for both CPU and memory.
To alleviate the issue, we have removed certain instance types from the list in our NodePool configuration. However, we are curious whether this is the expected behavior, because if so, users still have to work out which specific subset of instance types fits their cluster's resource needs before they can rely on Karpenter to minimise costs.

Expected Behavior:
We expect to see multi-node consolidation, which is defined as:

Multi Node Consolidation - Try to delete two or more nodes in parallel, possibly launching a single replacement whose price is lower than that of all nodes being removed

For instance, we would expect to see 2 c5a.xlarge instances consolidated into 1 m6a.xlarge, as the CPU and memory would fit on that instance and it would cost less (see the sketch below).
[screenshot]
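To make the arithmetic concrete, here is a minimal Go sketch of the fit-and-price check we would expect such a replacement to pass. This is illustrative only, not Karpenter's code: the vCPU/memory figures are the published EC2 specs for these types, while the hourly prices are placeholder values and should be swapped for real ap-southeast-1 on-demand pricing.

```go
package main

import "fmt"

// instanceType holds the capacity and an hourly price for an EC2 instance type.
// The vCPU/memory values are the published EC2 specs; the prices are
// placeholders, NOT real ap-southeast-1 on-demand rates.
type instanceType struct {
	name       string
	vcpu       float64 // vCPUs
	memoryGiB  float64 // GiB
	pricePerHr float64 // USD per hour (placeholder)
}

func main() {
	c5aXL := instanceType{name: "c5a.xlarge", vcpu: 4, memoryGiB: 8, pricePerHr: 0.15}  // placeholder price
	m6aXL := instanceType{name: "m6a.xlarge", vcpu: 4, memoryGiB: 16, pricePerHr: 0.17} // placeholder price

	// Assume the pods on each c5a.xlarge request roughly 50% of its CPU and 90% of
	// its memory, matching the utilization described above (system-reserved and
	// DaemonSet overhead are ignored for simplicity).
	combinedCPU := 2 * (0.5 * c5aXL.vcpu)
	combinedMem := 2 * (0.9 * c5aXL.memoryGiB)

	fits := combinedCPU <= m6aXL.vcpu && combinedMem <= m6aXL.memoryGiB
	cheaper := m6aXL.pricePerHr < 2*c5aXL.pricePerHr

	fmt.Printf("combined requests: %.1f vCPU, %.1f GiB\n", combinedCPU, combinedMem)
	fmt.Printf("fits on one %s: %t\n", m6aXL.name, fits)
	fmt.Printf("one %s cheaper than two %s: %t\n", m6aXL.name, c5aXL.name, cheaper)
}
```

Under these assumptions both checks pass: the combined pod requests fit on a single m6a.xlarge, and that replacement costs less than the two c5a.xlarge nodes it would remove, which is the consolidation we expected to see.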

Reproduction Steps (Please include YAML):
nodepool config:

  "object": {
      "apiVersion": "karpenter.sh/v1",
      "kind": "NodePool",
      "metadata": {
        "annotations": {
          "karpenter.sh/nodepool-hash": "10589712261218411145",
          "karpenter.sh/nodepool-hash-version": "v3"
        },
        "creationTimestamp": null,
        "deletionGracePeriodSeconds": null,
        "deletionTimestamp": null,
        "finalizers": null,
        "generateName": null,
        "generation": null,
        "labels": null,
        "managedFields": null,
        "name": "node-pool-1",
        "namespace": null,
        "ownerReferences": null,
        "resourceVersion": null,
        "selfLink": null,
        "uid": null
      },
      "spec": {
        "disruption": {
          "budgets": [
            {
              "duration": null,
              "nodes": "5%",
              "reasons": null,
              "schedule": null
            }
          ],
          "consolidateAfter": "30s",
          "consolidationPolicy": "WhenEmptyOrUnderutilized"
        },
        "limits": {
          "cpu": "140",
          "memory": "1000Gi"
        },
        "template": {
          "metadata": {
            "annotations": null,
            "labels": null
          },
          "spec": {
            "expireAfter": "Never",
            "nodeClassRef": {
              "group": "karpenter.k8s.aws",
              "kind": "EC2NodeClass",
              "name": "node-pool-1"
            },
            "requirements": [
              {
                "key": "node.kubernetes.io/instance-type",
                "minValues": null,
                "operator": "In",
                "values": [
                  "c5a.xlarge",
                  "c5a.2xlarge",
                  "c6a.xlarge",
                  "c6a.2xlarge",
                  "c7a.xlarge",
                  "c7a.2xlarge",
                  "m5a.xlarge",
                  "m5a.2xlarge",
                  "m6a.xlarge",
                  "m6a.2xlarge",
                  "r4.xlarge",
                  "r4.2xlarge",
                  "r5a.xlarge",
                  "r5a.2xlarge",
                  "r6a.xlarge",
                  "r6a.2xlarge"
                ]
              },
              {
                "key": "karpenter.sh/capacity-type",
                "minValues": null,
                "operator": "NotIn",
                "values": [
                  "spot"
                ]
              },
              {
                "key": "eks.amazonaws.com/capacityType",
                "minValues": null,
                "operator": "In",
                "values": [
                  "ON_DEMAND"
                ]
              },
              {
                "key": "topology.kubernetes.io/zone",
                "minValues": null,
                "operator": "In",
                "values": [
                  "ap-southeast-1a",
                  "ap-southeast-1b",
                  "ap-southeast-1c"
                ]
              }
            ],
            "startupTaints": null,
            "taints": null,
            "terminationGracePeriod": null
          }
        },
        "weight": null
      }
    },
    "timeouts": [],
    "wait": [],
    "wait_for": null
  }

Versions:

  • Chart Version: v1.1.1
  • Kubernetes Version (kubectl version): v1.30
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
codeeong added the kind/bug label on Feb 5, 2025
k8s-ci-robot added the needs-triage label on Feb 5, 2025
jonathan-innis (Member) commented:

I'm imagining that this has to do with multi-node consolidation not being able to find the combination of two instances that could be consolidated. In general, it's tough for us to try all the combinations, though we could probably improve the overall heuristic that we use to consider which nodes we could combine.
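To illustrate the selection problem being described, below is a simplified Go sketch of a prefix-based candidate-selection heuristic (assumed purely for illustration; it is not the actual Karpenter implementation): candidates are ordered by a disruption-cost metric and only prefixes of that ordering are handed to the scheduling simulation, so a particular pair of nodes that would consolidate well together may never be evaluated as a pair.

```go
package consolidation

import "sort"

// candidate is a node eligible for consolidation.
type candidate struct {
	name           string
	disruptionCost float64 // lower means cheaper to disrupt
}

// simulateFunc stands in for the scheduling simulation: it reports whether
// all pods on the given candidates fit on at most one cheaper replacement.
type simulateFunc func(cands []candidate) bool

// firstNConsolidation sketches a prefix-based heuristic: sort candidates by
// disruption cost, then binary-search for the largest prefix that the
// simulation says can be consolidated in a single action. Node pairs that
// never appear together inside a simulated prefix are never tried together.
func firstNConsolidation(cands []candidate, simulate simulateFunc) []candidate {
	sort.Slice(cands, func(i, j int) bool {
		return cands[i].disruptionCost < cands[j].disruptionCost
	})
	lo, hi, best := 1, len(cands), 0
	for lo <= hi {
		mid := (lo + hi) / 2
		if simulate(cands[:mid]) {
			best = mid // this prefix works; try a bigger one
			lo = mid + 1
		} else {
			hi = mid - 1
		}
	}
	return cands[:best]
}
```

If candidate selection works roughly like this, the two half-empty c5a.xlarge nodes are only ever simulated together when they both land in the same prefix, which would explain why the cheaper m6a.xlarge replacement is never proposed.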

jonathan-innis (Member) commented:

This is effectively an issue about getting a better heuristic for multi-node consolidation selection before we actually perform the scheduling simulation.
cc: @rschalo

jonathan-innis added the consolidation and performance labels on Feb 6, 2025
jonathan-innis (Member) commented:

/triage accepted

k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Feb 6, 2025
jonathan-innis (Member) commented:

/priority important-longterm

k8s-ci-robot added the priority/important-longterm label on Feb 13, 2025