Cluster fails to reconcile with etcdserver: request is too large
#4649
Comments
@danilo404 I'm investigating this but haven't reproduced it so far. I started with CAPZ v1.13.0 and created an AKS cluster with a network config similar to the one you posted, plus 10 node pools with 1000 nginx pods. But I'm not seeing any
% kubectl get amcp -o yaml
apiVersion: v1
items:
- apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  kind: AzureManagedControlPlane
  metadata:
    name: aks-etcd-1234
  spec:
    networkPlugin: azure
    networkPolicy: azure
    version: v1.28.3
    virtualNetwork:
      cidrBlock: 10.0.0.0/16
      name: aks-etcd-1234-vnet
      resourceGroup: aks-etcd-1234
      subnet:
        cidrBlock: 10.0.0.0/16
        name: aks-etcd-1234
% kubectl get VirtualNetworksSubnet -o yaml
apiVersion: v1
items:
- apiVersion: network.azure.com/v1api20201101
  kind: VirtualNetworksSubnet
  spec:
    addressPrefix: 10.0.0.0/16
    addressPrefixes:
    - 10.0.0.0/16
    azureName: aks-etcd-1234
    owner:
      name: aks-etcd-1234-vnet
  status:
    addressPrefix: 10.0.0.0/16
    conditions:
    id: /subscriptions/00000000-0000-0000-0000-00000000000/resourceGroups/aks-etcd-1234/providers/Microsoft.Network/virtualNetworks/aks-etcd-1234-vnet/subnets/aks-etcd-1234
    name: aks-etcd-1234
    privateEndpointNetworkPolicies: Disabled
    privateLinkServiceNetworkPolicies: Enabled
    provisioningState: Succeeded
    type: Microsoft.Network/virtualNetworks/subnets
kind: List
Sorry, I must be missing something. Any ideas on what other configuration might be relevant? I assume I wouldn't need to scale up as far as you have just to see the way
Hi @mboersma, thank you so much for looking into this. Your assumption is correct, the
@danilo404 I'm sorry I haven't added an update here. I tried again but haven't been able to light up the
Hello @mboersma, sorry for the delay. If I understand the flow correctly, at some point CAPZ creates a
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Fixed in ASO Azure/azure-service-operator#4428
/kind bug
What steps did you take and what happened:
A cluster was created with the following AzureManagedControlPlane network configuration
The cluster has many AzureManagedMachinePool objects, each with maxPods between 80 and 150.
Reconciliation for the cluster worked fine for a long time, until it started failing with:
What did you expect to happen:
The cluster to continue to be reconciled successfully
Anything else you would like to add:
While investigating the issue we noticed that the virtualnetworkssubnets.network.azure.com CR is populated with thousands of entries in the ipConfigurations field. The entries appear to be IP reservations for Pods: their count matches the number of Pods per node, repeated for every VM in the VMSS, for example:
Checking the approximate size of the payload this object would have at run time confirms it's above the AKS etcd request size limit of 2 MB:
We suspect the issue is caused by the large number of ipConfigurations that come from Azure and are added to this field. Their count is the number of nodes multiplied by maxPods per node. When the cluster scales up above a certain number, this object can no longer be persisted due to its size.
Environment:
kubectl version: N/A
/etc/os-release: N/A
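The size reasoning in the issue description (nodes multiplied by maxPods, compared against the 2 MB limit quoted above) can be sketched roughly as follows. This is only an illustrative estimate, not how CAPZ or ASO compute anything: the resource ID layout, the `{"id": ...}` entry shape, and all names (`MC_example-rg`, `aks-pool-vmss`, `ipconfigN`) are assumptions made up for the example, and real entries may be longer or shorter.

```python
import json

# The AKS etcd request size limit quoted in the issue (2 MB), taken as given.
ETCD_REQUEST_LIMIT_BYTES = 2 * 1024 * 1024


def ip_configuration_id(subscription, resource_group, vmss, vm_index, ip_index):
    """Build a hypothetical ipConfiguration resource ID for one pod IP slot.

    The path layout is an assumption for sizing purposes only.
    """
    return (
        f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/virtualMachineScaleSets/{vmss}"
        f"/virtualMachines/{vm_index}/networkInterfaces/{vmss}"
        f"/ipConfigurations/ipconfig{ip_index}"
    )


def estimate_ip_configurations(node_count, max_pods,
                               subscription="00000000-0000-0000-0000-000000000000",
                               resource_group="MC_example-rg",
                               vmss="aks-pool-vmss"):
    """Return (entry_count, serialized_bytes) for a synthetic ipConfigurations list.

    One entry is generated per pod slot per node, mirroring the
    "nodes x maxPods" multiplication described in the issue.
    """
    entries = [
        {"id": ip_configuration_id(subscription, resource_group, vmss, vm, ip)}
        for vm in range(node_count)
        for ip in range(max_pods)
    ]
    payload = json.dumps({"status": {"ipConfigurations": entries}})
    return len(entries), len(payload.encode("utf-8"))


if __name__ == "__main__":
    for nodes in (10, 50, 100, 200):
        count, size = estimate_ip_configurations(nodes, max_pods=100)
        over = "over" if size > ETCD_REQUEST_LIMIT_BYTES else "under"
        print(f"{nodes} nodes x 100 maxPods: {count} entries, {size} bytes ({over} limit)")
```

Under these assumptions each entry costs a few hundred bytes, so the serialized list grows linearly with nodes times maxPods and crosses the limit once the cluster has on the order of ten thousand pod IP slots.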