Karpenter Support #65

Open
pragmaticivan opened this issue Apr 1, 2024 · 6 comments

Comments

@pragmaticivan

Hi, will this project work with Karpenter + EKS managed pools in the same cluster?

@teevans

teevans commented Apr 1, 2024

Hey there! Yes! This should work just fine.

@pragmaticivan
Author

> This is done by cordoning all nodes in the cluster (other than our new g1-small node), and then reducing the node pool sizes to 0.

Mind sharing how that would work?

I know it could scale the pool that runs the Karpenter controllers down to 0, but I don't see how it could delete the Karpenter-managed nodes, since they are not managed by EKS.

@pragmaticivan
Author

Here's what I'm roughly expecting:

Cluster ABC in EKS

1 Node Pool for Karpenter (1 node)
1 Node Pool for CriticalAddons (CoreDNS) (1 node)
Multiple EC2 instances managed by Karpenter (no EKS pools here)

From 8pm-5am I want:

  1. ClusterTurnDown will create a new tiny pool for its controller.
  2. ClusterTurnDown will cordon the 2 EKS pools available
  3. ClusterTurnDown should pause Karpenter controller
  4. ClusterTurnDown should terminate all nodes managed by Karpenter
  5. ClusterTurnDown should resize the 2 EKS pools to 0
  6. Profit
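
To make the requested schedule concrete, here is a minimal sketch of the 8pm-5am window check and the ordered steps from the list above. It is not how Cluster Turndown is implemented; `in_turndown_window`, `PLAN`, and the constants are hypothetical names for illustration, and nothing here calls Karpenter, EKS, or AWS. The one subtlety it shows is that the window wraps past midnight, so a plain `start <= now < end` comparison is not enough.

```python
from datetime import time

# Hypothetical schedule taken from the comment above: down from 20:00 to 05:00.
TURNDOWN_START = time(20, 0)
TURNDOWN_END = time(5, 0)

# The requested steps, in order. These are descriptive labels only.
PLAN = [
    "create a small node pool for the turndown controller",
    "cordon the two EKS node pools",
    "pause the Karpenter controller",
    "terminate all Karpenter-managed nodes",
    "resize the two EKS node pools to 0",
]


def in_turndown_window(now: time,
                       start: time = TURNDOWN_START,
                       end: time = TURNDOWN_END) -> bool:
    """Return True if `now` is inside [start, end), allowing a wrap past midnight."""
    if start <= end:
        # Same-day window, e.g. 09:00 -> 17:00.
        return start <= now < end
    # Window crosses midnight, e.g. 20:00 -> 05:00.
    return now >= start or now < end
```

For example, `in_turndown_window(time(23, 0))` and `in_turndown_window(time(4, 59))` are both inside the window, while `in_turndown_window(time(12, 0))` is not.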

@michaelmdresser
Contributor

I don't believe that Cluster Turndown has been tested with EKS+Karpenter. The general turndown implementation can be found here:

I suspect that Cluster Turndown would end up fighting Karpenter a bit because it currently does not interact with Karpenter, meaning it won't be able to pause the Karpenter controller as you suggest.

@vinicius-loureiro-lacerda

One thing I think is important to mention when using Karpenter + Cluster-Turndown is that Cluster-Turndown apparently needs to be scheduled on EKS Node Groups and not on Karpenter Node Pools; otherwise it throws the errors below:

2024-04-09T17:58:47Z INF Determined to be running in a cluster. Using in-cluster K8s config.
2024-04-09T17:58:47Z DBG Recommendation service IsAvailable() GET finished endpoint=http://kubecost-release-cost-analyzer.kubecost:9090/model/savings/requestSizingV2 status=500
2024-04-09T17:58:47Z INF Kubescaler run started
2024-04-09T17:58:47Z INF Starting main loop
2024-04-09T17:58:47Z INF Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
2024-04-09T17:58:47Z WRN Failed to load valid access key from secret. Err=Failed to locate service account file: /var/keys/service-key.json
2024-04-09T17:58:47Z DBG No workloads have autoscaling enabled. Sleeping for a while.
2024-04-09T17:58:47Z INF Could not find NodeGroup '' in Cluster '<CLUSTER_NAME>' error="InvalidParameter: 1 validation error(s) found.\n- minimum field size of 1, DescribeNodegroupInput.NodegroupName.\n"
2024-04-09T17:58:47Z ERR Failed to initialize cluster provider. Components like Turndown, Continuous Cluster Sizing, and 1-click Cluster Sizing will not initialize. error="initializing cluster data: Failed to locate Clusters which have node groups containing the current instance: <INSTANCE_ID>"
2024-04-09T17:58:47Z DBG Cluster Info service IsAvailable() GET finished endpoint=http://kubecost-release-cost-analyzer.kubecost:9090/model/savings/abandonedWorkloads status=200

You guys can correct me if I'm wrong, but I couldn't manage to make it run on Karpenter-created nodes because Karpenter doesn't create EKS Node Groups.

I can also send the configuration we're using if needed, but at least for now the behavior here was:
-> Running on EKS Node Group nodes: ✅
-> Running on Karpenter nodes: ❌
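
The empty NodeGroup name in the log above is consistent with the node having no EKS node group label. As a rough sketch of that distinction, the helper below classifies a node from its labels. It assumes that EKS managed node group nodes carry `eks.amazonaws.com/nodegroup` and that Karpenter nodes carry `karpenter.sh/nodepool` (or `karpenter.sh/provisioner-name` on older Karpenter versions). `classify_node` is a hypothetical name, and whether Cluster Turndown reads these labels this way is an assumption, not something confirmed in this thread.

```python
from typing import Mapping

# Assumed label keys (see the note above); adjust for your cluster.
EKS_NODEGROUP_LABEL = "eks.amazonaws.com/nodegroup"
KARPENTER_LABELS = ("karpenter.sh/nodepool", "karpenter.sh/provisioner-name")


def classify_node(labels: Mapping[str, str]) -> str:
    """Classify a node as 'eks-nodegroup', 'karpenter', or 'unknown' from its labels."""
    if labels.get(EKS_NODEGROUP_LABEL):
        return "eks-nodegroup"
    if any(labels.get(key) for key in KARPENTER_LABELS):
        return "karpenter"
    return "unknown"
```

A node whose labels include only `karpenter.sh/nodepool` would come back as `"karpenter"`, which matches the failing case reported above.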

@Smana

Smana commented Jul 3, 2024

@vinicius-loureiro-lacerda Hello, I'm not sure I understand. Does it mean that if Cluster Turndown runs on an EKS node group, it is able to shut down even Karpenter nodes (e.g. all nodepools)?
If so, why is this issue still open? :)

5 participants