Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/e2e: Remove kata-containers test #299

Merged

Conversation

stevenhorsman
Copy link
Member

Remove the kata-containers CCv0 e2e tests in the operator as these tests are on a dead branch of a dead repo We also have had issues with them not being suitable as they programmatically change the configuration, which isn't likely to work in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload integrated with things like nydus-snapshotter than the operator is responsible for setting up and configuring as discussed in #239

@stevenhorsman
Copy link
Member Author

/test

@wainersm
Copy link
Member

Thanks @stevenhorsman !

Need to remove the call to clone_kata_tests in c1fd569#diff-33bd1d8cc1a7b67275bc3a8a02470ba638f978798df7501e70ec085649aa79e6R69

@stevenhorsman
Copy link
Member Author

/test

@stevenhorsman
Copy link
Member Author

/test

@wainersm
Copy link
Member

@stevenhorsman removing below code I think we are ready to merge:

--- a/tests/e2e/tests_runner.sh
+++ b/tests/e2e/tests_runner.sh
@@ -39,15 +39,8 @@ usage() {
 # tests for CC without specific hardware support
 run_non_tee_tests() {
        local runtimeclass="$1"
-       local aa_kbc="${2:-"offline_fs_kbc"}"
-
-       # This will be extended further to export differently based on a type of runtimeclass.
-       # At the time of writing, it is assumed that all non-tee tests use offline_fs_kbc.
-       # Discussion: https://github.com/confidential-containers/operator/pull/142#issuecomment-1359349595
-       export AA_KBC="${aa_kbc}"
 
        bats "${script_dir}/operator_tests.bats"
-
 }
 
 # Tests for CC with QEMU on SEV HW
@@ -70,7 +63,7 @@ main() {
        # Run tests.
        case $runtimeclass in
                kata-qemu|kata-clh|kata-clh-tdx)
-                       echo "INFO: Running non-TEE tests for $runtimeclass using OfflineFS KBC"
+                       echo "INFO: Running non-TEE tests for $runtimeclass"
                        run_non_tee_tests "$runtimeclass"
                        ;;
                kata-qemu-se)

@stevenhorsman
Copy link
Member Author

@stevenhorsman removing below code I think we are ready to merge:

Good idea to remove some more code. Updated and added your co-authorship for the suggestions.

@stevenhorsman
Copy link
Member Author

/test

@stevenhorsman
Copy link
Member Author

Working locally now as well:

INFO: Run tests
INFO: Running non-TEE tests for kata-qemu
operator_tests.bats
 ✓ [cc][operator] Test can uninstall the operator
 ✓ [cc][operator] Test can reinstall the operator

2 tests, 0 failures

@stevenhorsman stevenhorsman mentioned this pull request Nov 30, 2023
Copy link
Member

@wainersm wainersm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks once again @stevenhorsman !

Remove the kata-containers `CCv0` e2e tests in the operator
as these tests are on a dead branch of a dead repo
We also have had issues with them not being suitable as they
programmatically change the configuration, which isn't likely to work
in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload
integrated with things like nydus-snapshotter than the operator
is responsible for setting up and configuring as discussed in
confidential-containers#239

Collapse all the runtime classes to do the same tests for now
and then when we add the smoke test we can make that have the
smarts to handle all the platforms

Signed-off-by: stevenhorsman <[email protected]>
Co-authored-by: Wainer dos Santos Moschetta <[email protected]>
@stevenhorsman
Copy link
Member Author

/test

@stevenhorsman
Copy link
Member Author

@BbolroC - hey Choi, we're seeing failures with the s390x tests here (install works, but the uninstall fails during the install step it looks like), do you know if they should be reliable, or is this an known issue?

@stevenhorsman
Copy link
Member Author

FYI the kata-qemu job is disabled: http://jenkins.katacontainers.io/view/all/job/confidential-containers-operator-main-ubuntu-22.04-x86_64-containerd_kata-qemu-PR/, so I will remove that from being required status check

Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@BbolroC
Copy link
Member

BbolroC commented Nov 30, 2023

@BbolroC - hey Choi, we're seeing failures with the s390x tests here (install works, but the uninstall fails during the install step it looks like), do you know if they should be reliable, or is this an known issue?

It should be reliable as we can see a stable test result for the SE-enabled cc-operator nightly test at http://jenkins.katacontainers.io/job/confidential-containers-operator-main-ubuntu-20.04-s390x-SE-daily/. I will re-run the test and see what happened.

@stevenhorsman
Copy link
Member Author

stevenhorsman commented Dec 1, 2023

Just an update here - I've tested the behaviour on a z VSI and see the same result as the tests, where cc-operator-pre-install-daemon pod is not running in the second test, so it seems like this is a valid failure and weirdly if I run the main branch which just has extra tests then it passes:

ok 12 [cc][agent][kubernetes][containerd] Test cannot pull an image from an authenticated registry without credentials
ok 13 [cc][agent][kubernetes][containerd] Test can pull an encrypted image inside the guest with decryption key
ok 14 [cc][agent][kubernetes][containerd] Test cannot pull an encrypted image inside the guest without decryption key
ok 15 [cc][operator] Test can uninstall the operator
ok 16 [cc][operator] Test can reinstall the operator

so it seems to be related to this PR, but I can't explain why. I'll try and work with Choi (who is on vacation today) on Monday to try and get to the bottom of it, unless anyone else has any ideas?

@stevenhorsman
Copy link
Member Author

so it seems to be related to this PR, but I can't explain why. I'll try and work with Choi (who is on vacation today) on Monday to try and get to the bottom of it, unless anyone else has any ideas?

Since switching to the main branch for testing I've done a bunch of debug and everything kept working locally, to the point that I'm back testing against this source branch and it is now passing 😕

@wainersm
Copy link
Member

wainersm commented Dec 1, 2023

hi @stevenhorsman !

so it seems to be related to this PR, but I can't explain why. I'll try and work with Choi (who is on vacation today) on Monday to try and get to the bottom of it, unless anyone else has any ideas?

Since switching to the main branch for testing I've done a bunch of debug and everything kept working locally, to the point that I'm back testing against this source branch and it is now passing 😕

I could reproduce in an x86_64 VM. I'm using the CCv0 branch (i.e. I don't have #298)

Looking at the nodes labels, I see cc-preinstall/done=true which explain (I guess) the preinstall pod not being scheduled:

ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl describe nodes/$(hostname)
Name:               myvm
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    cc-preinstall/done=true
                    katacontainers.io/kata-runtime=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=myvm
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
                    node.kubernetes.io/worker=
...
...
...

@wainersm
Copy link
Member

wainersm commented Dec 1, 2023

@stevenhorsman when I switch the label to false, I get the pre-install pod running (although failing). However, I have no idea yet when running with kata tests it simply works... no idea...

ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl get pods -n confidential-containers-system
NAME                                             READY   STATUS    RESTARTS   AGE
cc-operator-controller-manager-ccbbcfdf7-bq9rx   2/2     Running   0          3h37m
cc-operator-daemon-install-krrng                 1/1     Running   0          3h37m
ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl label nodes $(hostname) cc-preinstall/done=false
error: 'cc-preinstall/done' already has a value (true), and --overwrite is false
ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl label nodes $(hostname) --overwrite cc-preinstall/done=false
node/myvm labeled
ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl get pods -n confidential-containers-system
NAME                                             READY   STATUS    RESTARTS     AGE
cc-operator-controller-manager-ccbbcfdf7-bq9rx   2/2     Running   0            3h38m
cc-operator-daemon-install-krrng                 1/1     Running   0            3h37m
cc-operator-pre-install-daemon-h9csb             0/1     Error     1 (4s ago)   8s
ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl get pods -n confidential-containers-system
NAME                                             READY   STATUS    RESTARTS      AGE
cc-operator-controller-manager-ccbbcfdf7-bq9rx   2/2     Running   0             3h38m
cc-operator-daemon-install-krrng                 1/1     Running   0             3h38m
cc-operator-pre-install-daemon-h9csb             0/1     Error     1 (10s ago)   14s
ubuntu@myvm:~/operator/tests/e2e$ sudo -E kubectl describe pods/cc-operator-pre-install-daemon-h9csb -n confidential-containers-system
Name:         cc-operator-pre-install-daemon-h9csb
Namespace:    confidential-containers-system
Priority:     0
Node:         myvm/192.168.122.219
Start Time:   Fri, 01 Dec 2023 18:39:40 +0000
Labels:       controller-revision-hash=84f7757fb
              name=cc-operator-pre-install-daemon
              pod-template-generation=1
Annotations:  <none>
Status:       Running
IP:           10.244.0.9
IPs:
  IP:           10.244.0.9
Controlled By:  DaemonSet/cc-operator-pre-install-daemon
Containers:
  cc-runtime-pre-install-pod:
    Container ID:  containerd://3efdb93b1284419722b67ec5261ee1eb368433a03d5e8fa1379181ffa2d53ea3
    Image:         localhost:5000/reqs-payload
    Image ID:      localhost:5000/reqs-payload@sha256:1d69e07328b90954fd88d67829a446813371ae49c18bcf285d4a27d31720ca31
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      /opt/confidential-containers-pre-install-artifacts/scripts/pre-install.sh
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 01 Dec 2023 18:40:03 +0000
      Finished:     Fri, 01 Dec 2023 18:40:06 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 01 Dec 2023 18:39:44 +0000
      Finished:     Fri, 01 Dec 2023 18:39:47 +0000
    Ready:          False
    Restart Count:  2
    Environment:
      NODE_NAME:                     (v1:spec.nodeName)
      INSTALL_OFFICIAL_CONTAINERD:  false
    Mounts:
      /etc/containerd/ from containerd-conf (rw)
      /etc/systemd/system/ from etc-systemd-system (rw)
      /opt/confidential-containers/ from confidential-containers-artifacts (rw)
      /usr/local/bin/ from local-bin (rw)
      /var/lib/containerd-nydus/ from containerd-nydus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vxkq5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  confidential-containers-artifacts:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/confidential-containers/
    HostPathType:  DirectoryOrCreate
  etc-systemd-system:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/systemd/system/
    HostPathType:  
  containerd-conf:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/containerd/
    HostPathType:  
  local-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/bin/
    HostPathType:  
  containerd-nydus:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/containerd-nydus/
    HostPathType:  
  kube-api-access-vxkq5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node.kubernetes.io/worker=
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  32s               default-scheduler  Successfully assigned confidential-containers-system/cc-operator-pre-install-daemon-h9csb to myvm
  Normal   Pulled     31s               kubelet            Successfully pulled image "localhost:5000/reqs-payload" in 116.238729ms
  Normal   Pulled     28s               kubelet            Successfully pulled image "localhost:5000/reqs-payload" in 136.977628ms
  Normal   Pulling    9s (x3 over 31s)  kubelet            Pulling image "localhost:5000/reqs-payload"
  Normal   Created    9s (x3 over 31s)  kubelet            Created container cc-runtime-pre-install-pod
  Normal   Started    9s (x3 over 31s)  kubelet            Started container cc-runtime-pre-install-pod
  Normal   Pulled     9s                kubelet            Successfully pulled image "localhost:5000/reqs-payload" in 115.467434ms
  Warning  BackOff    6s (x2 over 24s)  kubelet            Back-off restarting failed container

@wainersm
Copy link
Member

wainersm commented Dec 1, 2023

Both uninstall and reinstall tests (see tests/e2e/operator_tests.sh) use the operator.sh to install/uninstall the operator. The install_operator() method has checks to verify the installation succeed, whereas uninstall_operator() doesn't check or wait the uninstall to finish. So my hypothesis is: on some circumstances (e.g. "slow" VMs) after an uninstall it isn't given enough time for the operator to be completely removed at the time the install takes place.

@wainersm
Copy link
Member

wainersm commented Dec 1, 2023

Both uninstall and reinstall tests (see tests/e2e/operator_tests.sh) use the operator.sh to install/uninstall the operator. The install_operator() method has checks to verify the installation succeed, whereas uninstall_operator() doesn't check or wait the uninstall to finish. So my hypothesis is: on some circumstances (e.g. "slow" VMs) after an uninstall it isn't given enough time for the operator to be completely removed at the time the install takes place.

@stevenhorsman could you please cherry-pick the two commits on main...wainersm:cc-operator:operator_tests_debug ? If they don't fix the issue, they at least make the uninstall test better.

When printing all pods on teardown(), the namespace should be
confidential-containers-system (was missing the -system suffix).

While here, added command to print all DaemonSet (DS) on that namespace
because sometimes the DS is created correctly but pods aren't scheduled
for some unmet conditions (e.g. missing 'node.kubernetes.io/worker'
node's labels).

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Added checkers to ensure the operator was uninstalled correctly.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@stevenhorsman
Copy link
Member Author

/test

@stevenhorsman
Copy link
Member Author

Woo! Great job @wainersm - things are working now!

@stevenhorsman stevenhorsman merged commit 4d65952 into confidential-containers:main Dec 4, 2023
5 checks passed
@stevenhorsman stevenhorsman deleted the remove-kata-tests branch December 4, 2023 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants