
Incorrect PodCIDR in installations.operator.tigera.io ipPools prevented upgrade #2916

taxilian opened this issue Oct 5, 2023 · 2 comments

taxilian commented Oct 5, 2023

Recently, while upgrading from 3.25 to 3.26, I encountered an error that stalled the upgrade: Could not resolve CalicoNetwork IPPool and kubeadm configuration: IPPool 10.172.0.0/16 is not within the platform's configured pod network CIDR(s) [172.21.64.0/19 2607:fa18:1000:21::10:0/108]

I was quite confused by that, given that I had modified my cluster ages ago to remove the 10.172.0.0/16 IPPool, and I couldn't figure out where the message was coming from -- the old CIDR no longer appears in my kubeadm configuration. Eventually I figured out that it was still defined in the Installation resource, which I had frankly forgotten even existed in my cluster. Once I removed it, things worked.
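For reference, a rough sketch of how the stale entry can be inspected and removed, assuming the Installation resource is named default (the usual name; adjust to match your cluster):

# Show the IPPool CIDRs recorded in the operator's Installation resource
# (cluster-scoped; "default" is the conventional name).
kubectl get installation default -o yaml

# Edit the resource and delete or correct the stale spec.calicoNetwork.ipPools entry.
kubectl edit installation default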

A more complete look at the log messages appears under Current Behavior below.

Expected Behavior

First, I would expect it to look at the IPPool resources defined in my cluster rather than the one in the Installation, given that they conflicted. Second, I have actually used IPPools in my cluster that were outside the defined podCIDR range without any significant issue, so I question whether this should be a hard failure condition at all.

Regardless, @fasaxc asked me to create an issue for this, saying: "Hmm, that seems like a bug, the CIDR in the installation resource is meant for 'start of day' configuration only. I think it's supposed to be ignored later on. Please can you file an issue in the operator repo on github?"

Current Behavior

The upgrade failed to proceed, giving logs like this:

{"level":"error","ts":"2023-10-03T04:24:04Z","logger":"controller_installation","msg":"Error querying installation","Request.Namespace":"","Request.Name":"calico","reason":"ResourceReadError","error":"Could not resolve CalicoNetwork IPPool and kubeadm configuration: IPPool 10.172.0.0/16 is not within the platform's configured pod network CIDR(s) [172.21.64.0/19 2607:fa18:1000:21::10:0/108]","stacktrace":"github.com/tigera/operator/pkg/controller/status.(*statusManager).SetDegraded\n\t/go/src/github.com/tigera/operator/pkg/controller/status/status.go:406\ngithub.com/tigera/operator/pkg/controller/installation.(*ReconcileInstallation).Reconcile\n\t/go/src/github.com/tigera/operator/pkg/controller/installation/core_controller.go:872\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235"}`

Possible Solution

  • If there are IPPool resources defined in the cluster, use those instead of the ones in the Installation resource (see the commands sketched after this list).
  • Even then, keep in mind that there are valid, working use cases where the IPPools are not within the kubeadm-configured ranges, so perhaps the check should simply be skipped on upgrade.
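To compare what is actually defined, something like the following (a sketch, not verified against every Calico version) lists the IPPool resources in the cluster so they can be checked against the Installation resource; the second form assumes the Calico API server is deployed:

# IPPools as stored in the Calico CRDs (present on any Calico install):
kubectl get ippools.crd.projectcalico.org -o yaml

# Or via the projectcalico.org/v3 API, if the Calico API server is deployed:
kubectl get ippools.projectcalico.org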

Steps to Reproduce (for bugs)

  1. Have spec.calicoNetwork.ipPools defined in your operator.tigera.io/v1 kind: Installation resource that are not inside the kubeadm podCIDR (see the illustrative manifest after this list)
  2. Try to upgrade
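For illustration, a minimal manifest of the kind that triggers this; the CIDR is the one from the error above, while the resource name and encapsulation value are just example settings:

# Illustrative only -- field values are examples, not a recommendation.
kubectl apply -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 10.172.0.0/16      # outside the kubeadm podCIDR(s)
        encapsulation: VXLAN     # example value
EOF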

Context

I was able to fix it once I finally understood what was going on, but it was very frustrating =] I only got there by digging through the operator's source code, following the code paths leading to that error, and finding the reference to the Installation resource that I had totally forgotten about.

Your Environment

Bare-metal cluster, Ubuntu 22.04.3 LTS. Upgrading Calico 3.25.? to the latest 3.26.? as of Oct 2, 2023.

tmjd (Member) commented Oct 6, 2023

Thank you for the report.

I believe some work has been started to improve IPPool handling in the operator, but I'm not sure whether anything has been merged or even made it into a PR yet.

I would have expected the tigerastatus resource named calico to provide a message about the IPPool in the Installation conflicting with the kubeadm config, rather than requiring you to dig into the logs or code. tigerastatus is the first line of debugging when there is an issue with the operator.
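A minimal sketch of checking it, assuming kubectl access to the cluster (the Degraded condition's message should carry the reason the operator is unhappy):

# Summary columns (Available / Progressing / Degraded):
kubectl get tigerastatus calico

# Full conditions, including the Degraded message:
kubectl describe tigerastatus calico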

Yeah, unfortunately the operator does still use the IPPool in the Installation. It is true that the IPPool is only created at startup, but when the operator was first written it did not read the IPPool resources, and we haven't gone back and switched it over, so it still relies on the IPPool defined in the Installation resource.

taxilian (Author) commented Oct 9, 2023

Hmm; I do remember looking at the tigerastatus and describing it, but I don't remember what the specific message was.

TBH, since I didn't remember that there was an "Installation" resource, and none of the upgrade documentation mentions it to remind me, I probably wouldn't have realized that Installation was a specific resource type. I really wish it had been called something like "TigeraInstallConfiguration", as that would make sense even without the extra context that is generally not included when a resource is referenced.

All that said, I've got things working so I have no dog in the fight anymore -- just wanted to pass along feedback since I know it's often not clear to developers what is or isn't intuitively obvious to users of the system =]
