Cluster fails to reconcile with etcdserver: request is too large #4649

Closed
danilo404 opened this issue Mar 15, 2024 · 6 comments
Labels: area/managedclusters, kind/bug, lifecycle/stale, priority/backlog, size/L

Comments


danilo404 commented Mar 15, 2024

/kind bug

What steps did you take and what happened:
A cluster was created with the following AzureManagedControlPlane network configuration

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
spec:
  networkPlugin: azure
  networkPolicy: azure
  virtualNetwork:
    cidrBlock: 10.0.0.0/16
    name: examplecluster
    resourceGroup: examplecluster-rg
    subnet:
      cidrBlock: 10.0.0.0/16

The cluster has many AzureManagedMachinePool objects, each with maxPods between 80 and 150.

Reconciliation for the cluster worked fine for a long time, until it started failing with:

E0315 17:13:54.206966       1 controller.go:329] "msg"="Reconciler error" "error"="updating mynamespace/examplecluster-vnet-examplecluster-subnet resource status: etcdserver: request is too large" "logger"="controllers" "name"="examplecluster-vnet-examplecluster-subnet" "namespace"="examplenamespace" "reconcileID"="..."

What did you expect to happen:
The cluster to continue to be reconciled successfully

Anything else you would like to add:

While investigating the issue, we noticed that the virtualnetworkssubnets.network.azure.com CR is populated with thousands of entries in the ipConfigurations field. The entries appear to be IP reservations for Pods: their count matches the number of Pods per node, repeated for every VM in the VMSS. For example:

apiVersion: network.azure.com/v1api20201101
kind: VirtualNetworksSubnet
...
status:
  conditions:
  ipConfigurations:
  - id: /subscriptions/<subscription>/resourceGroups/<resource group>/PROVIDERS/MICROSOFT.COMPUTE/VIRTUALMACHINESCALESETS/.../ipConfigurations/IPCONFIG1
  - id: /subscriptions/<subscription>/resourceGroups/<resource group>/PROVIDERS/MICROSOFT.COMPUTE/VIRTUALMACHINESCALESETS/.../ipConfigurations/IPCONFIG2

Checking the approximate size of the payload this object would have at run time confirms that it exceeds the AKS etcd request size limit of 2 MB:

az network vnet subnet show --ids <my subnet resource id> > subnet.json
ls -hl subnet.json
-rw-r--r--  1 danilo  staff   2.9M Mar 14 17:21 subnet.json
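
The file size alone does not show how much of the payload comes from ipConfigurations. A minimal sketch of a check on the saved file, assuming the `az` output carries the list in a top-level `ipConfigurations` field; `summarize_subnet` is a hypothetical helper, not part of any Azure tooling:

```python
import json


def summarize_subnet(subnet_json: str) -> dict:
    """Report total payload size and the share taken by ipConfigurations."""
    subnet = json.loads(subnet_json)
    # Assumption: the list lives at the top level of the document.
    # A missing or null field is treated as an empty list.
    ip_configs = subnet.get("ipConfigurations") or []
    return {
        # Size of the document as saved on disk (UTF-8 bytes).
        "total_bytes": len(subnet_json.encode("utf-8")),
        "ip_config_count": len(ip_configs),
        # Approximate size of the list alone, re-serialized compactly.
        "ip_config_bytes": len(
            json.dumps(ip_configs, separators=(",", ":")).encode("utf-8")
        ),
    }
```

It could be called as `summarize_subnet(open("subnet.json", encoding="utf-8").read())` on the file written by the `az` command above.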

We suspect the issue is caused by the large number of ipConfigurations that come from Azure and are added to this field. Their count results from the number of nodes multiplied by maxPods per node. When the cluster scales up beyond a certain number of nodes, this object can no longer be persisted due to its size.

Environment:

  • cluster-api-provider-azure version: v1.13.0
  • Kubernetes version: (use kubectl version): N/A
  • OS (e.g. from /etc/os-release): N/A
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 15, 2024
@dtzar dtzar added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type labels Mar 19, 2024
@dtzar dtzar assigned dtzar and mboersma and unassigned dtzar Mar 25, 2024
@mboersma
Contributor

mboersma commented Mar 25, 2024

@danilo404 I'm investigating this but haven't reproduced it so far.

I started with CAPZ v1.13.0 and created an AKS cluster with similar network config to what you posted, and 10 node pools with 1000 nginx pods. But I'm not seeing any ipConfigurations on the VirtualNetworksSubnet:

% kubectl get amcp -o yaml
apiVersion: v1
items:
- apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  kind: AzureManagedControlPlane
  metadata:
    name: aks-etcd-1234
  spec:
    networkPlugin: azure
    networkPolicy: azure
    version: v1.28.3
    virtualNetwork:
      cidrBlock: 10.0.0.0/16
      name: aks-etcd-1234-vnet
      resourceGroup: aks-etcd-1234
      subnet:
        cidrBlock: 10.0.0.0/16
        name: aks-etcd-1234
% kubectl get VirtualNetworksSubnet -o yaml
apiVersion: v1
items:
- apiVersion: network.azure.com/v1api20201101
  kind: VirtualNetworksSubnet
  spec:
    addressPrefix: 10.0.0.0/16
    addressPrefixes:
    - 10.0.0.0/16
    azureName: aks-etcd-1234
    owner:
      name: aks-etcd-1234-vnet
  status:
    addressPrefix: 10.0.0.0/16
    conditions:
    id: /subscriptions/00000000-0000-0000-0000-00000000000/resourceGroups/aks-etcd-1234/providers/Microsoft.Network/virtualNetworks/aks-etcd-1234-vnet/subnets/aks-etcd-1234
    name: aks-etcd-1234
    privateEndpointNetworkPolicies: Disabled
    privateLinkServiceNetworkPolicies: Enabled
    provisioningState: Succeeded
    type: Microsoft.Network/virtualNetworks/subnets
kind: List

Sorry, I must be missing something. Any ideas on what other configuration might be relevant? I assume I wouldn't need to scale up as far as you have just to see the way ipConfigurations is being used.

@danilo404
Author

Hi @mboersma, thank you so much for looking into this. Your assumption is correct: the ipConfigurations field is always present in our case. The virtualnetworkssubnets.network.azure.com object is now smaller and it reconciled successfully, but we can still see many items in ipConfigurations. I believe it will break again as soon as the number of nodes scales up.
Our CR looks almost exactly like yours, except for the ipConfigurations field, and we also have some serviceEndpoints. Here are more complete versions:

kubectl get virtualnetworkssubnets.network.azure.com example-cluster-vnet-example-cluster-subnet -o yaml
---
apiVersion: network.azure.com/v1api20201101
kind: VirtualNetworksSubnet
metadata:
  creationTimestamp: "2024-02-01T16:32:39Z"
  finalizers:
  - serviceoperator.azure.com/finalizer
  generation: 1
  labels:
    serviceoperator.azure.com/owner-group-kind: VirtualNetwork.network.azure.com
    serviceoperator.azure.com/owner-name: example-cluster-vnet
    sigs.k8s.io_cluster-api-provider-azure_owned: example-cluster-aks
  name: example-cluster-vnet-example-cluster-subnet
  ownerReferences:
  - apiVersion: network.azure.com/v1api20201101storage
    kind: VirtualNetwork
    name: example-cluster-vnet
    uid: ...
  resourceVersion: "1193221614"
  uid: ...
spec:
  addressPrefix: 10.0.0.0/16
  addressPrefixes:
  - 10.0.0.0/16
  azureName: example-cluster-subnet
  owner:
    name: example-cluster-vnet
  serviceEndpoints:
  - locations:
    - '*'
    service: Microsoft.Sql
  - locations:
    - '*'
    service: Microsoft.KeyVault
  - locations:
    - '*'
    service: Microsoft.Storage
  - locations:
    - '*'
    service: Microsoft.AzureCosmosDB
  - locations:
    - '*'
    service: Microsoft.ServiceBus
  - locations:
    - '*'
    service: Microsoft.EventHub
status:
  addressPrefix: 10.0.0.0/16
  conditions:
  - lastTransitionTime: "2024-03-19T17:39:06Z"
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: Ready
  etag: ...
  id: ....
  ipConfigurations:
  - id: /subscriptions/<subscription id>/resourceGroups/<resource group name>/PROVIDERS/MICROSOFT.COMPUTE/VIRTUALMACHINESCALESETS/AKS-AIMODELSCPU-17351824-VMSS/VIRTUALMACHINES/0/NETWORKINTERFACES/AKS-AIMODELSCPU-17351824-VMSS/ipConfigurations/IPCONFIG1
    [... thousands of similar entries ...]
  name: example-cluster-subnet
  privateEndpointNetworkPolicies: Disabled
  privateLinkServiceNetworkPolicies: Enabled
  provisioningState: Succeeded
  serviceEndpoints:
  - locations:
    - northeurope
    provisioningState: Succeeded
    service: Microsoft.Sql
  - locations:
    - '*'
    provisioningState: Succeeded
    service: Microsoft.KeyVault
  - locations:
    - northeurope
    - westeurope
    provisioningState: Succeeded
    service: Microsoft.Storage
  - locations:
    - '*'
    provisioningState: Succeeded
    service: Microsoft.AzureCosmosDB
  - locations:
    - '*'
    provisioningState: Succeeded
    service: Microsoft.ServiceBus
  - locations:
    - '*'
    provisioningState: Succeeded
    service: Microsoft.EventHub
  type: Microsoft.Network/virtualNetworks/subnets

kubectl get amcp example-cluster -o yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
metadata:
  annotations:
    clusterctl.cluster.x-k8s.io/block-move: "true"
  creationTimestamp: "2023-12-06T13:58:26Z"
  finalizers:
  - azuremanagedcontrolplane.infrastructure.cluster.x-k8s.io
  generation: 6
  name: example-cluster-aks
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: example-cluster-aks
  resourceVersion: "1193254100"
spec:
  aadProfile:
    adminGroupObjectIDs:
    - ...
    managed: true
  addonProfiles:
  - config:
      logAnalyticsWorkspaceResourceID: ...
    enabled: true
    name: omsagent
  apiServerAccessProfile:
    authorizedIPRanges:
    - [46 entries]
  controlPlaneEndpoint:
    host: ...
    port: 443
  dnsPrefix: example-cluster-aks
  identity:
    type: SystemAssigned
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureClusterIdentity
    name: example-cluster-aks-identity
  kubeletUserAssignedIdentity: ...
  loadBalancerSKU: Standard
  location: northeurope
  networkPlugin: azure
  networkPolicy: azure
  nodeResourceGroupName: MC_example-cluster-rg_example-cluster-aks_northeurope
  oidcIssuerProfile:
    enabled: true
  resourceGroupName: example-cluster-rg
  sku:
    tier: Free
  sshPublicKey: ....
  version: v1.27.9
  virtualNetwork:
    cidrBlock: 10.0.0.0/16
    name: example-cluster-vnet
    resourceGroup: example-cluster-rg
    subnet:
      cidrBlock: 10.0.0.0/16
      name: example-cluster-subnet
      serviceEndpoints:
      - locations:
        - '*'
        service: Microsoft.Sql
      - locations:
        - '*'
        service: Microsoft.KeyVault
      - locations:
        - '*'
        service: Microsoft.Storage
      - locations:
        - '*'
        service: Microsoft.AzureCosmosDB
      - locations:
        - '*'
        service: Microsoft.ServiceBus
      - locations:
        - '*'
        service: Microsoft.EventHub
status:
  conditions:
  - lastTransitionTime: "2024-03-19T17:43:15Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-03-19T17:43:15Z"
    status: "True"
    type: ManagedClusterRunning
  - lastTransitionTime: "2024-03-14T14:23:51Z"
    status: "True"
    type: ResourceGroupReady
  - lastTransitionTime: "2024-03-19T17:39:30Z"
    status: "True"
    type: SubnetsReady
  - lastTransitionTime: "2024-03-14T14:24:04Z"
    status: "True"
    type: VNetReady
  initialized: true
  oidcIssuerProfile:
    issuerURL: ...
  ready: true
  version: v1.27.9
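
To see which node pools contribute the most entries, the IDs under status.ipConfigurations of the VirtualNetworksSubnet could be grouped by scale set name. A minimal sketch, assuming every ID follows the shape of the sample entry above; `count_by_vmss` is a hypothetical helper, not part of CAPZ or ASO:

```python
import re
from collections import Counter

# Pattern derived from the sample ID in this thread. Matching is
# case-insensitive because the sample mixes upper- and lower-case segments.
_VMSS_RE = re.compile(
    r"/virtualMachineScaleSets/([^/]+)/virtualMachines/", re.IGNORECASE
)


def count_by_vmss(ids):
    """Count ipConfigurations IDs per scale set name."""
    counts = Counter()
    for resource_id in ids:
        match = _VMSS_RE.search(resource_id)
        # IDs that do not reference a scale set VM are grouped separately.
        counts[match.group(1) if match else "<unmatched>"] += 1
    return counts
```

Given the CR loaded as a dict (for example from `kubectl get ... -o json`), the input would be `[c["id"] for c in cr["status"]["ipConfigurations"]]`, and `count_by_vmss(...).most_common()` lists the largest contributors first.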

@dtzar dtzar moved this to In Progress in CAPZ Planning Mar 28, 2024
@dtzar dtzar added this to the v1.15 milestone Apr 4, 2024
@mboersma
Contributor

@danilo404 I'm sorry I haven't added an update here. I tried again but haven't been able to light up the ipConfigurations. Have you found a workaround or do you have any other details I might be missing around the network configuration?

@mboersma mboersma removed this from the v1.15 milestone Apr 25, 2024
@dtzar dtzar added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
@dtzar dtzar moved this from In Progress to Blocked in CAPZ Planning May 16, 2024
@danilo404
Author

Hello @mboersma, sorry for the delay. If I understand the flow correctly, at some point CAPZ creates a VirtualNetworksSubnet ASO object, right? I took a look at the ASO source for this field, and it seems the only way it would not be filled is if it already comes back empty from the API. Is that correct? In that case, are you by any chance using a mock server that doesn't fill in those fields?
In any case, after investigating a bit further, this doesn't look like a bug but like expected behaviour from ASO. The problem seems to be that, without overlay networking, the ASO subnet object quickly grows beyond the AKS default etcd payload size limit. Is my understanding correct?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2024
@github-project-automation github-project-automation bot moved this from Blocked to Done in CAPZ Planning Nov 20, 2024
@danilo404
Author

Fixed in ASO Azure/azure-service-operator#4428
