Unable to set Auto on nodeProvisioningProfile for existing cluster due to aadProfile #80

Closed
nfsouzaj opened this issue Jan 2, 2024 · 18 comments
Assignees
Labels
area/nap Issues or PRs related to Node Auto Provisioning (NAP) kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@nfsouzaj

nfsouzaj commented Jan 2, 2024

Version

Karpenter Version: v0.0.0

Kubernetes Version: v1.27.3

Expected Behavior

I am trying to switch an existing cluster to node provisioning mode Auto by running the following command:
az aks update --resource-group myGroup --name myAKS --node-provisioning-mode Auto

Actual Behavior

After running the command:
az aks update --resource-group myGroup --name myAKS --node-provisioning-mode Auto
I get the following error:

Argument '--node-provisioning-mode' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
(OperationNotAllowed) .properties.nodeProvisioningProfile.mode cannot be Auto if .properties.aadProfile is specified.
Code: OperationNotAllowed
Message: .properties.nodeProvisioningProfile.mode cannot be Auto if .properties.aadProfile is specified.

Steps to Reproduce the Problem

  1. Create a cluster using Terraform that has one node pool with manual provisioning
  2. Register the NodeAutoProvisioningPreview feature flag following this
  3. Run the CLI command from Expected Behavior to update the provisioning mode to Auto
  4. The CLI complains that the operation is not allowed

Resource Specs and Logs

AKS cluster 1.27.3
CNI: Azure with Overlay
Cilium DataPlane enabled

I am trying to create a cluster using Terraform and then change the node provisioning mode so that I can create node pools with Karpenter. However, the error shown under Actual Behavior occurs.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@nfsouzaj nfsouzaj added the kind/bug Categorizes issue or PR as related to a bug. label Jan 2, 2024
@Bryce-Soghigian
Collaborator

Bryce-Soghigian commented Jan 3, 2024

Currently AKS does not support enabling Karpenter through NAP on existing clusters, so the command needs to be run at cluster creation. Karpenter also will not support service principals and will only allow MSI auth.

@Bryce-Soghigian Bryce-Soghigian self-assigned this Jan 3, 2024
@Bryce-Soghigian
Collaborator

Related to #59

@nfsouzaj
Author

nfsouzaj commented Jan 3, 2024

@Bryce-Soghigian - thanks for your answer.

We are using managed identity, not SP. With regards to running the command while the cluster is being created, I am using terraform. How can I do that?
Regards,

@philwelz

philwelz commented Jan 3, 2024

@nfsouzaj you could use the AzAPI Terraform provider, but according to the docs the latest API version does not support NAP. You will have to wait until a new API version is released. Until then, NAP can only be enabled via azure-cli on new clusters.

@nfsouzaj
Author

nfsouzaj commented Jan 3, 2024

@philwelz thanks for the answer. Is there an ETA?

@nfsouzaj
Author

nfsouzaj commented Jan 8, 2024

@philwelz @Bryce-Soghigian I would appreciate feedback on the ETA at your earliest convenience.

@Bryce-Soghigian
Collaborator

@nfsouzaj the AKS Team doesn't own the AzAPI terraform SDK :( so we will need to contact someone else for an answer there. Let me do some asking around internally and get back to you!

@tallaxes tallaxes added area/nap Issues or PRs related to Node Auto Provisioning (NAP) needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jan 9, 2024
@Bryce-Soghigian
Collaborator

Bryce-Soghigian commented Jan 9, 2024

@nfsouzaj I talked to the azapi Terraform provider team, and it seems the docs that Phil shared are out of date. This is what they said:

The docs are not up to date, but the azapi provider does support the latest API version, 2023-10-02-preview. Here's an example of how to use the node provisioning profile:

resource "azapi_resource" "rg" {
  type     = "Microsoft.Resources/resourceGroups@2020-06-01"
  name     = "henglu0109aks"
  location = "westeurope"
}

resource "azapi_resource" "cluster" {
  # The 2023-10-02-preview API version exposes nodeProvisioningProfile.
  type      = "Microsoft.ContainerService/managedClusters@2023-10-02-preview"
  parent_id = azapi_resource.rg.id
  name      = "henglu0109aks"
  location  = "westeurope"

  # Managed identity (not a service principal) is used for the cluster.
  identity {
    type         = "SystemAssigned"
    identity_ids = []
  }

  body = jsonencode({
    properties = {
      agentPoolProfiles = [
        {
          count  = 1
          mode   = "System"
          name   = "default"
          vmSize = "Standard_DS2_v2"
        },
      ]
      # Setting mode to "Auto" enables Node Auto Provisioning (NAP).
      nodeProvisioningProfile = {
        mode = "Auto"
      }
      dnsPrefix = "henglu0109aks"
    }
  })
}

@Bryce-Soghigian
Collaborator

Going to close this issue, as enabling Karpenter on existing clusters is an unsupported scenario for now.

@tallaxes
Collaborator

tallaxes commented Feb 1, 2024

Reopening, as enabling NAP on existing clusters is supported. We determined that the issue here is that aadProfile is rejected altogether, whereas only certain deprecated configurations should be rejected.

@tallaxes tallaxes reopened this Feb 1, 2024
@tallaxes tallaxes added kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 1, 2024
@nfsouzaj
Author

nfsouzaj commented Mar 5, 2024

@tallaxes hi! Do you have any feedback on this? Also, do you know when the TF provider will support Karpenter?

@tallaxes
Collaborator

tallaxes commented Mar 5, 2024

@nfsouzaj The fix for aadProfile is included in AKS Release 2024-02-07, which should already be rolled out everywhere.

You can already enable NAP via Terraform as shown above. I don't expect any more explicit support for Karpenter/NAP in TF (at least until/unless we decide to expose some AKS API configuration surface), but you should already be able to deploy NodePools and AKSNodeClasses resources via TF's Kubernetes or kubectl provider.
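
As a rough illustration of that last point, the sketch below declares a NodePool and an AKSNodeClass through the `kubernetes_manifest` resource of Terraform's Kubernetes provider. It assumes the provider is already configured against a cluster where NAP is enabled, so that the Karpenter CRDs exist at plan time. The API versions (`karpenter.sh/v1beta1`, `karpenter.azure.com/v1alpha2`), the resource names, and the requirement values are assumptions that may differ for your NAP release.

```hcl
# Hypothetical sketch: assumes the "kubernetes" provider is configured
# against an AKS cluster that already has NAP enabled.

# AKSNodeClass: Azure-specific node settings (assumed API version).
resource "kubernetes_manifest" "aks_node_class" {
  manifest = {
    apiVersion = "karpenter.azure.com/v1alpha2"
    kind       = "AKSNodeClass"
    metadata = {
      name = "default"
    }
    spec = {
      imageFamily = "Ubuntu2204"
    }
  }
}

# NodePool: scheduling constraints for nodes that Karpenter may create.
resource "kubernetes_manifest" "node_pool" {
  manifest = {
    apiVersion = "karpenter.sh/v1beta1"
    kind       = "NodePool"
    metadata = {
      name = "default"
    }
    spec = {
      template = {
        spec = {
          # Points at the AKSNodeClass declared above.
          nodeClassRef = {
            name = "default"
          }
          requirements = [
            {
              key      = "kubernetes.io/os"
              operator = "In"
              values   = ["linux"]
            },
            {
              key      = "karpenter.sh/capacity-type"
              operator = "In"
              values   = ["on-demand"]
            },
          ]
        }
      }
    }
  }

  # Create the node class first so the reference resolves.
  depends_on = [kubernetes_manifest.aks_node_class]
}
```

Because `kubernetes_manifest` needs access to the cluster API during planning, this would typically live in a separate Terraform configuration (or a later apply) from the one that creates the cluster.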

@tallaxes tallaxes closed this as completed Mar 5, 2024
@nfsouzaj
Author

nfsouzaj commented Mar 5, 2024

@tallaxes sorry, I am not following how to enable node auto provisioning in Terraform. I also don't see any scale mode in the portal other than manual and autoscale. It's been really frustrating trying to use this feature.

@tallaxes
Collaborator

tallaxes commented Mar 6, 2024

@nfsouzaj Sorry to hear that. It is a preview feature, so ways of enabling it are still limited. (In particular, there is no Portal UX yet.)

I am not that familiar with the Terraform AzAPI provider, so I won't be able to help much, but the key part in the example above is setting nodeProvisioningProfile = { mode = "Auto" }, which should do the job. What I am not sure about is how it interacts with preview feature enablement ...

So your best bet / easiest way right now is likely to use Azure CLI, following the directions under Node Autoprovisioning (preview). Using CLI to enable NAP on an existing cluster - including one created with Terraform, as you were originally trying - should work as well, now that we fixed the aadProfile bug. If that path still does not work for you - please share the results, and we should be able to help.
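
If staying in Terraform is preferable to the CLI, one possible way to apply the same nodeProvisioningProfile setting to a cluster that already exists is the AzAPI provider's `azapi_update_resource`. This is a sketch, not a verified recipe: `azurerm_kubernetes_cluster.example` is a hypothetical reference to a cluster managed elsewhere in the configuration, the `jsonencode` body follows the style of the azapi example earlier in this thread, and the NodeAutoProvisioningPreview feature flag is assumed to be registered on the subscription already.

```hcl
# Hypothetical sketch: patch an existing AKS cluster to enable NAP.
# Assumes the NodeAutoProvisioningPreview feature flag is already
# registered and that the cluster uses a managed identity.
resource "azapi_update_resource" "enable_nap" {
  # Preview API version that exposes nodeProvisioningProfile.
  type = "Microsoft.ContainerService/managedClusters@2023-10-02-preview"

  # Hypothetical reference to a cluster defined elsewhere.
  resource_id = azurerm_kubernetes_cluster.example.id

  # Only the properties listed here are changed; the rest of the
  # cluster definition is left as it is.
  body = jsonencode({
    properties = {
      nodeProvisioningProfile = {
        mode = "Auto"
      }
    }
  })
}
```

Whether the update is accepted still depends on the service-side validation discussed in this issue, so an OperationNotAllowed response here would point to the same kind of restriction as the CLI error.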

@aslafy-z

aslafy-z commented Mar 6, 2024

The Terraform implementation was not accepted. It appears there are ongoing debates about whether preview features should be incorporated into their provider: hashicorp/terraform-provider-azurerm#25084 (comment)

@nfsouzaj
Author

nfsouzaj commented Mar 6, 2024

Thanks for the feedback @aslafy-z. Disappointing news...

@nfsouzaj
Author

@aslafy-z hi, is there news on when this is coming to tf?

@aslafy-z

@nfsouzaj No news... I think the best thing to do is to ask your Microsoft TAM to push the feature. Hopefully it will come one day.


5 participants