
Bug: Reconciling VirtualNetworksSubnet fails with "Request entity too large: limit is 3145728" #4428

Closed
danilo404 opened this issue Nov 7, 2024 · 10 comments · Fixed by #4448
Labels
bug 🪲 Something isn't working capz Required for CAPZ ASO adoption high-priority Issues we intend to prioritize (security, outage, blocking bug)

Comments

@danilo404

danilo404 commented Nov 7, 2024

Describe the bug

The bug manifests on our cluster created with the following networking parameters:

az aks show --subscription ExampleSubscription -n example-cluster-name -g example-cluster-name-rg -o table --query networkProfile

NetworkPlugin    NetworkPolicy    NetworkDataplane    ServiceCidr    DnsServiceIp    OutboundType    LoadBalancerSku    PodLinkLocalAccess
---------------  ---------------  ------------------  -------------  --------------  --------------  -----------------  --------------------
azure            azure            azure               10.100.0.0/16  10.100.0.10     loadBalancer    standard           IMDS

And it has 20 Agent Pools, with the following sizes:

 az aks show --subscription ExampleSubscription -n example-cluster-name -g example-cluster-name-rg -o table --query "agentPoolProfiles[].{Count: count, maxCount: maxCount, maxPods: maxPods}"
Count    MaxCount    MaxPods
-------  ----------  ---------
0        3           20
5        7           150
0        3           80
2        5           110
0        50          100
36       60          100
27       100         100
12       20          100
11       33          110
1        4           110
3        8           80
0        0           100
5        10          100
2        7           100
4        30          100
5        30          100
0        3           20
15       30          100
3        3           20
2        7           80
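The pool sizes above roughly explain the subnet's entry count. The sketch below assumes the cluster uses traditional (non-overlay) Azure CNI, where each node consumes one subnet IP for itself plus `maxPods` pre-allocated pod IPs. The pool values are copied from the table. The `pool` type and the `subnetIPs` and `estimate` helpers are hypothetical. The real count also drifts as the autoscaler moves each pool between Count and MaxCount.

```go
package main

import "fmt"

// pool mirrors one row of the agentPoolProfiles table (hypothetical type).
type pool struct {
	count, maxCount, maxPods int
}

// subnetIPs estimates the subnet IPs consumed by `nodes` nodes of one pool,
// assuming one IP per node plus maxPods pre-allocated pod IPs (Azure CNI, non-overlay).
func subnetIPs(nodes, maxPods int) int {
	return nodes * (maxPods + 1)
}

// estimate sums the IP usage at the current counts and at the autoscaler maximums.
func estimate(pools []pool) (current, atMax int) {
	for _, p := range pools {
		current += subnetIPs(p.count, p.maxPods)
		atMax += subnetIPs(p.maxCount, p.maxPods)
	}
	return current, atMax
}

// Values copied row by row from the table: {Count, MaxCount, MaxPods}.
var pools = []pool{
	{0, 3, 20}, {5, 7, 150}, {0, 3, 80}, {2, 5, 110}, {0, 50, 100},
	{36, 60, 100}, {27, 100, 100}, {12, 20, 100}, {11, 33, 110}, {1, 4, 110},
	{3, 8, 80}, {0, 0, 100}, {5, 10, 100}, {2, 7, 100}, {4, 30, 100},
	{5, 30, 100}, {0, 3, 20}, {15, 30, 100}, {3, 3, 20}, {2, 7, 80},
}

func main() {
	current, atMax := estimate(pools)
	fmt.Printf("current counts: ~%d IPs; max counts: ~%d IPs\n", current, atMax)
}
```

With these values the estimate is 13,483 IPs at the current counts. That is in the same ballpark as the 14,006 ipConfigurations measured in this issue. If every pool scaled to its MaxCount, the estimate would be 41,403.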

CAPZ created a VirtualNetworksSubnet ASO CR for that cluster with the following configuration:

az network vnet subnet show --ids "example/subnet/id" -o table --query "{addressPrefix: addressPrefix, privateEndpointNetworkPolicies: privateEndpointNetworkPolicies, privateLinkServiceNetworkPolicies: privateLinkServiceNetworkPolicies}"

AddressPrefix    PrivateEndpointNetworkPolicies    PrivateLinkServiceNetworkPolicies
---------------  --------------------------------  -----------------------------------
10.0.0.0/16      Disabled                          Enabled

When the Agent Pools reach counts close to those above, the subnet object in Azure grows to around 5.6 MB, as it fills up with thousands of entries in the ipConfigurations field:

az network vnet subnet show --ids /subscriptions/.../subnets/example-cluster-subnet > example-cluster-subnet.json
ls -lh example-cluster-subnet.json
-rw-r--r--@ 1 danilo.uipath  staff   5.6M Nov  4 12:37 example-cluster-subnet.json
cat example-cluster-subnet.json| jq '.ipConfigurations | length'
14006
cat example-cluster-subnet.json| jq '.ipConfigurations[0].id | length'
305
cat example-cluster-subnet.json| jq '.ipConfigurations[0].resourceGroup | length'
60
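Here is a back-of-the-envelope check using these numbers. Count only the 305-character `id` of each of the 14,006 entries, and the list alone already exceeds the 3,145,728-byte (3 MiB) limit reported by the API server in this issue. The sketch assumes a hypothetical status element that holds just an `id` string, and it measures the compact JSON encoding with `encoding/json`. The real ASO status type and the resource's other fields would only make the object larger.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ipConfigRef is a hypothetical, minimal stand-in for one status.ipConfigurations entry.
type ipConfigRef struct {
	ID string `json:"id"`
}

// apiServerLimit is the request limit quoted in the API server error (3 MiB).
const apiServerLimit = 3 * 1024 * 1024 // 3145728 bytes

// encodedSize returns the compact JSON size of n entries whose IDs are idLen bytes long.
func encodedSize(n, idLen int) (int, error) {
	id := strings.Repeat("a", idLen) // placeholder ID with the measured length
	refs := make([]ipConfigRef, n)
	for i := range refs {
		refs[i] = ipConfigRef{ID: id}
	}
	b, err := json.Marshal(refs)
	if err != nil {
		return 0, err
	}
	return len(b), nil
}

func main() {
	size, err := encodedSize(14006, 305)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, over limit: %v\n", size, size > apiServerLimit)
}
```

Each entry encodes to 314 bytes (`{"id":"` + 305 characters + `"}`). With the commas and brackets, the list comes to 4,411,891 bytes, which is well over the limit.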

ASO then tries to persist the ipConfigurations into the VirtualNetworksSubnet CR's status, which causes the API server to return:

E1107 10:21:00.621890       1 generic_reconciler.go:143] "msg"="Failed to commit object to etcd" "error"="updating example-ns/example-cluster-name-vnet-example-cluster-name-subnet resource: Request entity too large: limit is 3145728" "logger"="controllers.VirtualNetworksSubnetController" "name"="example-cluster-name-example-cluster-name-subnet" "namespace"="example-ns"

Azure Service Operator Version: v2.8.0

Expected behavior

The VirtualNetworksSubnet should continue reconciling successfully, regardless of how large my Agent Pools scale.

To Reproduce

Create a VirtualNetworksSubnet CR for an Azure Cloud Subnet with a large number of ipConfigurations and wait for the controller to attempt to sync it.

Additional context

This issue relates to another issue in the CAPZ project kubernetes-sigs/cluster-api-provider-azure#4649

@danilo404 danilo404 added the bug 🪲 Something isn't working label Nov 7, 2024
@matthchr matthchr added the capz Required for CAPZ ASO adoption label Nov 11, 2024
@matthchr
Member

Can you share what the spec for the subnet looks like, as managed by CAPZ?

@matthchr
Member

I think the issue we've got here is the fact that there are 14k entries for the ipConfigurations field (which Azure allows), but at some point you cross the Kubernetes boundary for max resource size.

I believe Azure also has a maximum resource size, but I think it's 4 MB, not the 1.5 MB that AFAIK is the default on Kubernetes.

@matthchr matthchr self-assigned this Nov 11, 2024
@matthchr matthchr added the high-priority Issues we intend to prioritize (security, outage, blocking bug) label Nov 11, 2024
@danilo404
Author

> Can you share what the spec for the subnet looks like, as managed by CAPZ?

AMCP resource:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
spec:
  virtualNetwork:
    cidrBlock: 10.0.0.0/16
    name: example-cluster-vnet
    resourceGroup: example-cluster-rg
    subnet:
      cidrBlock: 10.0.0.0/16
      name: example-cluster-subnet
      serviceEndpoints:
        - locations:
            - '*'
          service: Microsoft.Sql
        - locations:
            - '*'
          service: Microsoft.KeyVault
        - locations:
            - '*'
          service: Microsoft.Storage
        - locations:
            - '*'
          service: Microsoft.AzureCosmosDB
        - locations:
            - '*'
          service: Microsoft.ServiceBus
        - locations:
            - '*'
          service: Microsoft.EventHub

And the Subnet it creates:

apiVersion: network.azure.com/v1api20201101
kind: VirtualNetworksSubnet
spec:
  addressPrefix: 10.0.0.0/16
  addressPrefixes:
  - 10.0.0.0/16
  azureName: example-cluster-subnet
  owner:
    name: example-cluster-vnet
  serviceEndpoints:
  - locations:
    - '*'
    service: Microsoft.Sql
  - locations:
    - '*'
    service: Microsoft.KeyVault
  - locations:
    - '*'
    service: Microsoft.Storage
  - locations:
    - '*'
    service: Microsoft.AzureCosmosDB
  - locations:
    - '*'
    service: Microsoft.ServiceBus
  - locations:
    - '*'
    service: Microsoft.EventHub

@matthchr
Member

matthchr commented Nov 12, 2024

I looked at this some more, and I think this comes down to a mismatch between the maximum allowed size of an Azure resource (which I think is somewhere in the 4 MB range) and the maximum allowed size of a Kubernetes resource, which is ~1.5 MB.

Since we fundamentally cannot fit this much data into etcd, there's not much we can do here other than elide .status.ipConfigurations after some maximum length. The only thing that makes me feel any better about that is that it's probably not practical to use a list of 14,000 ipConfiguration ARM IDs for anything anyway.
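Here is a minimal sketch of the eliding idea, written as a generic Go helper (Go 1.18+). The cap `maxStatusIPConfigs`, the helper name `elide`, and the choice to keep the first N entries are all hypothetical. They are not necessarily what ASO's actual fix does. The helper returns how many entries were dropped, so a caller could record that truncation happened.

```go
package main

import "fmt"

// maxStatusIPConfigs is a hypothetical cap on entries kept in status.
const maxStatusIPConfigs = 1000

// elide returns at most limit elements of s, plus the number dropped.
// When it truncates, it copies the kept elements so the result does not
// retain the much larger original backing array.
func elide[T any](s []T, limit int) ([]T, int) {
	if limit < 0 {
		limit = 0
	}
	if len(s) <= limit {
		return s, 0
	}
	kept := make([]T, limit)
	copy(kept, s[:limit])
	return kept, len(s) - limit
}

func main() {
	ids := make([]string, 14006)
	for i := range ids {
		ids[i] = fmt.Sprintf("ipconfig-%d", i)
	}
	kept, dropped := elide(ids, maxStatusIPConfigs)
	fmt.Println(len(kept), dropped) // 1000 13006
}
```

A count-based cap is the simplest option. A byte-based budget would track the actual size limit more closely, but the cap would then depend on how long each ARM ID is.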

@nojnhuh - is CAPZ using .status.ipConfigurations for anything right now?

@matthchr matthchr added this to the v2.12.0 milestone Nov 13, 2024
@nojnhuh
Copy link
Member

nojnhuh commented Nov 13, 2024

> @nojnhuh - is CAPZ using .status.ipConfigurations for anything right now?

It is not, so however you handle that should work for CAPZ.

@danilo404
Author

danilo404 commented Nov 13, 2024

Hey @matthchr, thanks so much for looking into this. Regarding the etcd limit, the problem seems to manifest in different ways depending on the size of the object in Azure. Note that in the original ticket I opened in CAPZ, the error was different and came from etcd:

E0315 17:13:54.206966       1 controller.go:329] "msg"="Reconciler error" "error"="updating mynamespace/examplecluster-vnet-examplecluster-subnet resource status: etcdserver: request is too large" "logger"="controllers" "name"="examplecluster-vnet-examplecluster-subnet" "namespace"="examplenamespace" "reconcileID"="..."

Also note that the Subnet was smaller in that case: when the error was observed, it was around 2.9 MB.

Now the subnet object in Azure has reached around 5.6 MB, and the error seems to come from the Kubernetes API server itself. This limit is hardcoded in more than one place, e.g. here.

I think in this case the object did not reach etcd.
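The two different errors line up with two different limits. The sketch below rests on two assumptions. First, the etcd server runs with its default `--max-request-bytes` of 1,572,864 bytes (1.5 MiB); managed control planes may configure this differently. Second, the API server's request limit is the 3,145,728 bytes quoted in the error earlier in this issue. The `limitHit` helper is hypothetical, and its return values are descriptive labels, not the exact error text. The sizes it takes stand for the serialized Kubernetes request, which is not the same as the size of the `az` JSON output.

```go
package main

import "fmt"

const (
	// etcdDefaultMaxRequestBytes is etcd's default --max-request-bytes (1.5 MiB).
	etcdDefaultMaxRequestBytes = 1572864
	// apiServerMaxRequestBytes is the limit quoted in the API server error (3 MiB).
	apiServerMaxRequestBytes = 3145728
)

// limitHit returns a label for the layer that would reject a write of the
// given size, assuming both limits have the values above. Oversized
// requests are rejected by the API server before they ever reach etcd.
func limitHit(size int) string {
	switch {
	case size > apiServerMaxRequestBytes:
		return "apiserver: request entity too large"
	case size > etcdDefaultMaxRequestBytes:
		return "etcdserver: request is too large"
	default:
		return "ok"
	}
}

func main() {
	for _, size := range []int{1 << 20, 2 << 20, 4 << 20} {
		fmt.Printf("%d bytes -> %s\n", size, limitHit(size))
	}
}
```

This is consistent with what was observed here. The ~2.9 MB subnet produced the etcd error, while the ~5.6 MB subnet was rejected by the API server. The Azure JSON sizes are only a rough proxy for the real request sizes.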

@matthchr
Member

Thanks @danilo404. I suppose a more precise phrasing of the problem is that it's not so much about etcd, but rather that Azure allows larger resources than Kubernetes does. I think once the etcd limit is crossed it won't work in k8s, though I didn't know about the hardcoded apiserver limit that produces a different error if the request gets large enough.

@matthchr
Copy link
Member

As for a plan to fix this: it didn't make 2.11.0 (which has already shipped). I think we can try to get a fix merged before most of us go on holiday, which would let you consume it via the experimental release, but the official release will probably need to wait until next year. There's also the added wrinkle that CAPZ uses a slightly older version of ASO, which may delay uptake in vanilla CAPZ as well.

Unfortunately I don't really see a workaround for this problem other than "keep the cluster small" in the meantime, though possibly this issue isn't actually breaking things severely if CAPZ isn't trying to update the subnet?

Can you share what the impact is to you @danilo404, and if you have any workaround to it currently?

@danilo404
Author

Thanks for the update, @matthchr. We don't have a workaround for this case, but for now the impact is not blocking. Currently, the CAPZ AzureManagedControlPlane reconcile loop tries to sync the subnet's status (even without changes to the spec), so the CAPI/CAPZ Cluster stays in a Failed state in Kubernetes, although the cluster itself in Azure is healthy. In any case, the experimental release would be really useful, because the AMCP in a Failed state causes other headaches, such as Flux orchestration being unable to progress and having to silence related alerts.

@matthchr
Member

OK, the experimental build should have a fix for this now, @danilo404.

Projects
Status: Ready for Release

4 participants