Is there support to deploy Karpenter on existing clusters? #59
@alexo1088 You can run the self-hosted version on existing clusters, but be aware that the surface area we have tested is limited to the cluster configuration created by the Makefile's create commands. If you are using a different CNI configuration, or have add-ons we haven't fully explored, you may see some adverse behavior. So the short answer to the question: for the Node Auto Provisioning (NAP) preview, we currently do not support enabling it on existing clusters, but we plan to enable that in the near future. Feel free to share more about your cluster configuration here, and I can advise you further.
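For context, the tested configuration referenced above is the one produced by the repo's Makefile flow, roughly like this (a sketch; the `az-all` target name is per the repo README at the time and may have changed):

```bash
# From a clone of Azure/karpenter-provider-azure:
# provisions an AKS cluster with the tested configuration,
# then builds and deploys the self-hosted Karpenter into it
make az-all
```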
Is there an ETA for when there will be some tested configurations for running Karpenter on an existing cluster?
We have validated that enabling NAP on existing clusters works, as long as they satisfy certain constraints (e.g. Azure Overlay + Cilium CNI; there is more on this on the NAP page). This leaves manual (self-hosted) deployment of Karpenter on an existing cluster (again, a compatible one). While already possible, this currently requires building from source; what's missing for a good experience is a published image and Helm chart. We are working on this and expect it to be available within a week; I will update this issue when done.
@tallaxes I tested enabling NAP on an existing cluster and it worked like a charm. I also wrote a small article about it.
Thanks for all the help on this, everyone! @philwelz Your article is great, but after following it and ensuring that NAP is configured on my cluster, I can't seem to get workloads to trigger scaling through NAP/Karpenter. None of the resources created seem to generate useful logs, so I'm wondering if someone can help me understand what's missing. I'm assuming an actual Provisioner does not need to be created, despite that CRD seemingly being installed after deploying NAP. Here's the output of commands showing NAP has been successfully installed:
The inflate workload is set to run only on the non-system node pool, but it remains unschedulable because no new nodes are being created:
No NodeClaims or events show anything related to Karpenter. Anyone have any ideas on what may be happening? Thank you in advance!
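For anyone debugging similar symptoms, these are the sorts of checks that surface what NAP/Karpenter is (or isn't) doing (a sketch; the `source=karpenter` event filter follows the NAP preview docs, and resource kinds may differ between API versions, e.g. Provisioner vs. NodePool):

```bash
# NodeClaims represent nodes Karpenter has decided to launch
kubectl get nodeclaims

# Karpenter-sourced events usually explain why nothing is scaling
kubectl get events -A --field-selector source=karpenter

# The default NodePool (or Provisioner, on older API versions) should exist
kubectl get nodepools
```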
@alexo1088 Looking on the backend at the cluster scheduling those inflate pods, it looks like Karpenter is crashing due to a failure to resolve the VNet dynamically. Is your cluster using a custom VNet? Currently Karpenter is not compatible with custom VNets.
Thanks @Bryce-Soghigian! I'm actually not very familiar with custom VNets in the context of AKS at all. Quick research seems to indicate that that is connected to the …
This is the error I see for your cluster. Can you check under the MC resource group whether you have a VNet there? It could be a bug: if you didn't configure a custom VNet, it may be retrieving the wrong vnetName as well. In the code we retrieve the cluster VNet via the MC resource group: https://github.com/Azure/karpenter-provider-azure/blob/main/pkg/providers/instance/azure_client.go#L117C1-L118C1
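One way to check this from the CLI (a sketch; the resource group name is a placeholder for your cluster's actual MC_* group):

```bash
# List any VNets in the cluster's managed (MC_*) resource group;
# this is where Karpenter currently expects to resolve the cluster VNet
az network vnet list --resource-group "MC_myRG_myCluster_eastus" --output table
```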
Thanks @Bryce-Soghigian. So, the RG I deploy NAP into is not the MC RG; it's the RG that I used to create the cluster, which is also where the VNet is deployed. There is no VNet associated with that MC RG. Is this expected behavior?
I used this command and it worked for me:
With AZ CLI version:
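For reference, the documented shape of the command to enable NAP on an existing cluster is roughly the following (a sketch assuming the aks-preview CLI extension; names are placeholders, and this is not necessarily the exact command used above):

```bash
# Requires the aks-preview extension and the NAP preview feature flag
az extension add --name aks-preview
az feature register --namespace Microsoft.ContainerService --name NodeAutoProvisioningPreview

# Enable node auto-provisioning; NAP requires Azure CNI Overlay with Cilium
az aks update \
  --resource-group myRG \
  --name myCluster \
  --node-provisioning-mode Auto \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium
```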
Thanks, @Vinaum8. @Bryce-Soghigian Here's what I ran:
Here are some excerpts from the output after the command:
Basically, all resources are being created in the MC RG. This RG has no VNet associated with it, and I don't seem to be able to control which RG the Karpenter resources get deployed into, since it didn't seem to honor the variable I defined when running the first command. Can someone help me understand whether this is a bug that needs to be addressed?
@alexo1088 let me create a separate issue to discuss custom vnet and other networking scenarios |
Issue for custom VNET support: #199
Created a new issue cutting more specific requirements for a first pass; see https://github.com/Azure/karpenter-provider-azure/issues/231. Will have a PR and design for this within the week.
@alexo1088 Did you get it to work? I have the same symptoms (the Helm releases are there, along with the NodeClass, NodePool, etc.), but no node is created when deploying workloads. The VNet is in xxx-aks-networking-rg. When looking at my cluster properties, I can see:
I don't know what is going wrong 😞
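A quick way to confirm whether a cluster is on a custom VNet is to inspect the subnet IDs on its agent pools (a sketch; resource names are placeholders):

```bash
# A vnetSubnetId pointing at a subnet outside the MC_* resource group
# indicates a custom (bring-your-own) VNet
az aks show \
  --resource-group myRG \
  --name myCluster \
  --query "agentPoolProfiles[].vnetSubnetId"
```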
Custom VNets don't work, as explained in that issue I shared above. But I fixed it here: #238. I just need to cut a new version and test things in NAP, then roll out a new release!
Oh ok, my bad. I misread the PR in the first place and thought having VnetSubnetId populated would be enough.
@Bryce-Soghigian Is the fix already live? I'm still facing the same behavior.
@CCOLLOT The fix is rolling out; it has reached a couple of regions. What region are you in? I tested support with this command. Note that to use a custom VNet with NAP, you need to assign an RBAC role as well, along with the feature work to get Karpenter to understand the custom VNet.
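The shape of that role assignment is roughly the following (a sketch; the identity, role, and scope are placeholders, and Network Contributor is an assumption based on the usual bring-your-own-VNet guidance):

```bash
# Grant the cluster's identity access to the custom VNet so NAP/Karpenter
# can attach node NICs to its subnets
az role assignment create \
  --assignee "<cluster-identity-principal-id>" \
  --role "Network Contributor" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>"
```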
I'm running in West Europe.
@CCOLLOT Looks like West Europe is one of the last regions in the release. I can let you know when it reaches there.
Is an existing page relevant?
https://github.com/Azure/karpenter/tree/main/examples
What karpenter features are relevant?
Provisioner
How should the docs be improved?
This page seems to indicate that there is support for existing clusters, but it points to an example that doesn't seem to exist.