Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNI tuning plugin configuration added to Installation CRD #2708

Merged
merged 2 commits into from
Dec 7, 2023

Conversation

Tamas-Biro1
Copy link
Contributor

@Tamas-Biro1 Tamas-Biro1 commented Jun 28, 2023

Description

We would like to configure sysctl keepalive parameters using Calico CNI with Tigera Operator. This PR introduces a sysctlTuning parameter under calicoNetwork in Installation manifest (https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.CalicoNetworkSpec).
Resolves projectcalico/calico#5886.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@CLAassistant
Copy link

CLAassistant commented Jun 28, 2023

CLA assistant check
All committers have signed the CLA.

@Tamas-Biro1 Tamas-Biro1 force-pushed the tuning-plugin-configuration branch 5 times, most recently from 7ed78c9 to 64137b0 Compare June 28, 2023 18:52
@fasaxc
Copy link
Member

fasaxc commented Jun 29, 2023

@Tamas-Biro1 do you mind explaining the use case for this, what sort of settings are you wanting to tune? We feel a little uncomfortable allowing any and all sysctls to be set inside the pod because they have pretty far-reaching powers. Is there a particular subset we could limit it to (for example, the interface-specific sysctls)?

You mentioned the keepalive settings, are those per-interface? Could you just set those system-wide or do you have a particular use case for setting them inside pods only.

@Tamas-Biro1
Copy link
Contributor Author

@Tamas-Biro1 do you mind explaining the use case for this, what sort of settings are you wanting to tune? We feel a little uncomfortable allowing any and all sysctls to be set inside the pod because they have pretty far-reaching powers. Is there a particular subset we could limit it to (for example, the interface-specific sysctls)?

You mentioned the keepalive settings, are those per-interface? Could you just set those system-wide or do you have a particular use case for setting them inside pods only.

@fasaxc We would like to set defaults for keepalive (namespaced sysctl) values for containers. As we see, when containerd environment starts it does not inherit some network kernel parameters from the host instead it picks default values similarly as it described here: https://stackoverflow.com/questions/54552379/tcp-keepalive-time-in-docker-container/54564456#54564456.
For CRI-O there is an option to change default sysctl values in the underlying container orchestrator, see: https://github.com/openshift/machine-config-operator/blob/release-4.11/templates/worker/01-worker-container-runtime/_base/files/crio.yaml
but there is no such option for containerd at the moment (see: containerd/containerd#6647). The Calico CNI tuning option would be the only solution for that (https://www.cni.dev/plugins/current/meta/tuning/) and the unmanaged manifest based Calico install could change these parameters in the past. That is why I came up with this idea for Tigera Operator as well.

Here is a case study why we need this setting:
https://www.ibm.com/cloud/blog/troubleshooting-outgoing-connection-issues-with-ibm-vpc-public-and-service-gateways

@Tamas-Biro1 Tamas-Biro1 changed the title Calico tuning plugin configuration added to Installation CRD CNI tuning plugin configuration added to Installation CRD Aug 17, 2023
@Tamas-Biro1
Copy link
Contributor Author

@fasaxc @caseydavenport can you take a look at this PR?

@tmjd
Copy link
Member

tmjd commented Aug 23, 2023

I guess another option instead of this would be to use a mutating web hook to modify all (non-hostNetworked) pods with a securityContext that sets the sysctls. At least I think that would work from reading https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/. I'm only pointing that out since I came across that as an option, I'm not suggesting it would be a better option.

Part of the reason I was reading about the sysctls was because I was concerned that we would be exposing node level sysctl configuration if we started using the tuning plugin. But from the first sentence in the CNI Tuning plugin it says

This plugin can change some system controls (sysctls) and several interface attributes (promiscuous mode, all-multicast mode, MTU and MAC address) in the network namespace.

and from what I've read it sounds like most of the net.* sysctls are network namespaced. So also along with the allow list you've got I'm starting to feel comfortable with adding this.

I think high level I'm not sure about the location in the Installation resource for the new field. Recently we've been chatting (internally) about OS specific configuration and I think sysctl tuning would fall into that category, so we might need to provide some direction there. Also I think there should be some validation (which I think should be added here ) on the settings so if a user provided an unsupported setting they would be alerted to that instead of silently ignoring the unsupported value.

@Tamas-Biro1
Copy link
Contributor Author

I guess another option instead of this would be to use a mutating web hook to modify all (non-hostNetworked) pods with a securityContext that sets the sysctls. At least I think that would work from reading https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/. I'm only pointing that out since I came across that as an option, I'm not suggesting it would be a better option.

Part of the reason I was reading about the sysctls was because I was concerned that we would be exposing node level sysctl configuration if we started using the tuning plugin. But from the first sentence in the CNI Tuning plugin it says

This plugin can change some system controls (sysctls) and several interface attributes (promiscuous mode, all-multicast mode, MTU and MAC address) in the network namespace.

and from what I've read it sounds like most of the net.* sysctls are network namespaced. So also along with the allow list you've got I'm starting to feel comfortable with adding this.

I think high level I'm not sure about the location in the Installation resource for the new field. Recently we've been chatting (internally) about OS specific configuration and I think sysctl tuning would fall into that category, so we might need to provide some direction there. Also I think there should be some validation (which I think should be added here ) on the settings so if a user provided an unsupported setting they would be alerted to that instead of silently ignoring the unsupported value.

hey @tmjd ,

Thanks for you detailed response. To be honest, webhooks could put more complexity to our system in addition that would be more error prone. So based in this we'd better prefer to stay with sysctl way.

On the other hand I totally agree with your concerns on exposing node level sysctls. So I've made some changes on this PR to restrict setting sysctls that are not allowed and throw an alert to user, as you suggested. Please have a look at it, and also if you have different direction on the location of this setting in Installation, please let me know. Thank you!

@danudey danudey modified the milestones: v1.31.0, v1.31.1, v1.31.2 Aug 31, 2023
@abasitt
Copy link

abasitt commented Oct 3, 2023

+1 please add net.* to the allowed list.

@Tamas-Biro1 Tamas-Biro1 force-pushed the tuning-plugin-configuration branch 3 times, most recently from 0ac43e5 to ba4234c Compare October 4, 2023 11:40
@danudey danudey modified the milestones: v1.31.2, v1.31.3 Oct 12, 2023
@rtheis
Copy link

rtheis commented Nov 7, 2023

@Tamas-Biro1 ping

pkg/controller/installation/validation.go Outdated Show resolved Hide resolved
pkg/controller/migration/convert/network.go Outdated Show resolved Hide resolved
@tmjd
Copy link
Member

tmjd commented Nov 14, 2023

/sem-approve

@tmjd
Copy link
Member

tmjd commented Nov 15, 2023

/sem-approve

pkg/controller/migration/convert/network_test.go Outdated Show resolved Hide resolved
pkg/controller/migration/convert/network_test.go Outdated Show resolved Hide resolved
pkg/controller/installation/validation.go Outdated Show resolved Hide resolved
@tmjd
Copy link
Member

tmjd commented Nov 22, 2023

/sem-approve

pkg/controller/migration/convert/network.go Outdated Show resolved Hide resolved
@fasaxc
Copy link
Member

fasaxc commented Nov 23, 2023

/sem-approve

@fasaxc
Copy link
Member

fasaxc commented Nov 23, 2023

@Tamas-Biro1 there was a failure in a unit test; I think one that you added.

@Tamas-Biro1
Copy link
Contributor Author

@Tamas-Biro1 there was a failure in a unit test; I think one that you added.

thanks, could you help me which unit test? Or could you provide a doc how to run unit tests locally? Thank you!

@fasaxc
Copy link
Member

fasaxc commented Nov 27, 2023

Here's the failure:

�[91m�[1m• Failure [0.001 seconds]�[0m
Convert network tests
�[90m/go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:43�[0m
  handle Calico CNI migration
  �[90m/go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:105�[0m
    Calico CNI config flags
    �[90m/go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:757�[0m
      migrate tuning setting
      �[90m/go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:758�[0m
        �[91m�[1mnot allowed sysctl tuning in config [It]�[0m
        �[90m/go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:823�[0m

        �[91mExpected an error to have occurred.  Got:
            <nil>: nil�[0m

        /go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:861

        �[91mFull Stack Trace�[0m
        github.com/tigera/operator/pkg/controller/migration/convert.glob..func8.3.19.1.2()
        	/go/src/github.com/tigera/operator/pkg/controller/migration/convert/network_test.go:861 +0x43a
        github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0006d55f0?)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0x9d
        github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0x14b1642?)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0xd8
        github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc0006d5808?)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x5b
        github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc000752b40, 0xc0006d5990?, {0x1be86c0, 0xc00031d1c0})
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x2a9
        github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc000752b40, {0x1be86c0, 0xc00031d1c0})
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/spec/spec.go:138 +0xe5
        github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0000fc420, 0xc000752b40)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0xe7
        github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0000fc420)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x1a5
        github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0000fc420)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0xb7
        github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0003418f0, {0x7efc607862e8, 0xc000107860}, {0x19a5667, 0x2e}, {0xc0000e3240, 0x2, 0x2}, {0x1c02da0, 0xc00031d1c0}, ...)
        	/go/pkg/mod/github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x5ca
        github.com/onsi/ginkgo.runSpecsWithCustomReporters({0x1be8000?, 0xc000107860}, {0x19a5667, 0x2e}, {0xc0000e3220, 0x2, 0x198054a?})
        	/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:245 +0x20c
        github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters({0x1be8000, 0xc000107860}, {0x19a5667, 0x2e}, {0xc000072f08, 0x1c69e35?, 0x1})
        	/go/pkg/mod/github.com/onsi/[email protected]/ginkgo_dsl.go:228 +0x19b
        github.com/tigera/operator/pkg/controller/migration/convert_test.TestConverter(0x0?)
        	/go/src/github.com/tigera/operator/pkg/controller/migration/convert/convert_suite_test.go:32 +0x17e
        testing.tRunner(0xc000107860, 0x1a6f3f0)
        	/usr/local/go/src/testing/testing.go:1595 +0xff
        created by testing.(*T).Run in goroutine 1
        	/usr/local/go/src/testing/testing.go:1648 +0x3ad
�[90m------------------------------�[0m

You should be able to run

make ut UT_DIR=./pkg/controller/migration/convert/

to run the test in a container (as it runs in CI). I think go test ./pkg/controller/migration/convert/ will work too if your go version is in sync with ours.

@fasaxc
Copy link
Member

fasaxc commented Nov 28, 2023

/sem-approve

@tmjd
Copy link
Member

tmjd commented Nov 28, 2023

/sem-approve

Copy link
Member

@tmjd tmjd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tmjd tmjd merged commit 507ebc5 into tigera:master Dec 7, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configure cni_network_config 'tuning' parameters
9 participants