Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CORE-8617 BPF Without Disruption #2921

Closed
wants to merge 1 commit into from
Closed

CORE-8617 BPF Without Disruption #2921

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented Oct 10, 2023

Description

Currently, transitioning from iptables to ebpf mode with the operator is disruptive. This is because the operator enables BPF mode using a rolling update of the calico-node daemonset (to add the relevant environment variable). However, clusters with some nodes in iptbales mode and some in BPF mode have broken networking. Therefore, when enabling eBPF mode, all should nodes transition at the same time so that there is minimal iptables/eBPF mode crossover.
Note: [as outlined in epic]
Once roll out is finished, cluster quickly transitions to eBPF mode. Some small disruption may occur as nodes transition within a few seconds of each other.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@marvin-tigera marvin-tigera added this to the v1.32.0 milestone Oct 10, 2023
@ghost ghost changed the title [DO NOT MERGE] CORE-8617 BPF Without Disruption CORE-8617 BPF Without Disruption Oct 17, 2023
@ghost ghost marked this pull request as ready for review October 17, 2023 16:08
@ghost ghost self-requested a review as a code owner October 17, 2023 16:08
@ghost ghost added kind/enhancement New feature or request docs-not-required and removed docs-pr-required labels Oct 17, 2023
@ghost ghost requested review from caseydavenport and tmjd October 19, 2023 14:12
Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StevenTigera Ok, first pass is done!

The majority of the feedback here falls into one of a few categories:

  • Golang coding conventions
  • Better factoring the code for re-use, readability, and ongoing maintenance.
  • Some minor bugs

Let me know if you have any questions at all on this and I'm happy to help work through the comments.

pkg/common/common.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/migration/convert/k8s.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments, but they are mostly small. This is looking much better!

I haven't looked at the test code in detail yet, that will be my next pass.

pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/bpf.go Outdated Show resolved Hide resolved
pkg/controller/installation/core_controller.go Outdated Show resolved Hide resolved
pkg/render/node.go Outdated Show resolved Hide resolved
pkg/render/node.go Outdated Show resolved Hide resolved
pkg/render/node.go Outdated Show resolved Hide resolved
pkg/common/common.go Outdated Show resolved Hide resolved
api/v1/installation_types.go Outdated Show resolved Hide resolved
pkg/common/common.go Outdated Show resolved Hide resolved
pkg/render/node.go Outdated Show resolved Hide resolved
Expect(c.Create(ctx, fc)).NotTo(HaveOccurred())

// Act.
_, err := utils.PatchFelixConfiguration(ctx, r.client, func(fc *crdv1.FelixConfiguration) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think this isn't quite what I was imagining for this test. It's not wrong per-se, but it's doing more than it needs to do.

This test should probably be trimmed down to this:

fc := v3.FelixConfiguration{}
ds := DaemonSet{}
shouldUpdate := r.setDefaultsOnFelixConfiguration(cr, ds, fc, reqLogger)
Expect(shouldUpdate).To(Equal(true))
Expect(fc.Spec.BPFEnabled).To(Equal(true))

Basically, the tests in this file should be as focused on testing the specific functions in bpf.go.

UTs for utils.PatchFelixConfiguration should exist, but I won't ask you to add them in this PR. Really we want a separate suite of tests for that function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once it's turned into that, it becomes much more clear that this test is actually just testing setDefaultsOnFelixConfiguration, which should have its own Context() for a complete suite of tests (but it doesn't right now)

})

func createInstallation(c client.Client, ctx context.Context, dp operator.LinuxDataplaneOption) *operator.Installation {
ca, err := tls.MakeCA("test")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this CA actually needed? This code doesn't have anything to do with the CertificateManagement feature.

Expect(fc.Spec.BPFEnabled).To(BeNil())
})

It("should query calico-node DS in BPF dataplane and if DS status rolling out then verify rollout not complete", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is also largely testing PatchFelixConfiguration - if we just want to test setBPFUpdatesOnFelixConfiguration here, we don't need to create the installation, we don't need to create the DS, we don't need to create the FelixConfig (in fact the fake client isn't needed at all).

Like I implied in the other comment, testing for PatchFelixConfiguration belongs in the utils package. The tests here should primarily be around the functions in this package.

Expect(fc.Annotations[render.BPFOperatorAnnotation]).To(Equal("false"))
})

It("should query calico-node DS in Iptables dataplane and steer Felix Config when bpfEnabled false", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "steer" means in this context as opposed to the other tests.

Expect(fc.Annotations[render.BPFOperatorAnnotation]).To(Equal("true"))
})

It("should query FelixConfig annotation is nil and spec is nil then outcome is valid", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend splitting the different functions under test into separate contexts or describes. "outcome is valid" isn't very clear unless the higher level context explains which function is being test.

Context("updateBPFEnabledALlowed")
Context("setDefaultsOnFelixConfiguration") // This one probably belongs in core_controller_test.go
Context("setBPFUpdatesOnFelixConfiguration")

Copy link
Member

@caseydavenport caseydavenport Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare that to how it's laid out currently, where the Describe() and the Context() are essentially duplicate versions of "test BPF upgrade without disruption"

var _ = Describe("Testing BPF Upgrade without disruption during core-controller installation", func() {

...

	Context("Reconcile tests BPF Upgrade without disruption", func() {

if (fc.Spec.BPFEnabled == nil || !(*fc.Spec.BPFEnabled)) && isRolloutComplete(ds) {
err := setBPFEnabled(fc, bpfEnabledOnInstall)
if err != nil {
reqLogger.Error(err, "Unable to enable eBPF data plane")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think we need to propagate this error back up to utils.PatchFelixConfiguration. It means a slight tweak to the patchFn definition

@@ -347,6 +350,33 @@ func installResourceCRD(c client.Client, mgr manager.Manager, ctx context.Contex
err := c.Create(context.Background(), instance)
Expect(err).NotTo(HaveOccurred())

ns := &corev1.Namespace{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this change. This function is meant to be a helper for installing the Installation CR - why does it also need to create the DaemonSet now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to revert this - it's a blanket change for all of the existing tests that AFAICT isn't really appropriate.

If you want to run an FV for this case, we need to have a separate test case targeted at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants