Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NO-JIRA: e2e:sriov: deploy sriov devices as part of config suite #981

Closed
wants to merge 3 commits into from

Conversation

Tal-or
Copy link
Contributor

@Tal-or Tal-or commented Mar 4, 2024

As part of test coverage enhancement, we want to integrate
the performance tests (functests) on a cluster which has sriov devices.

For that we want to detect whether the cluster supports sriov and act as follow:
If it's not supported or the test failed to create sriov devices, skips the setup and run test as usual.
This meant to keep the existing behavior intact.

If the cluster supports sriov, create some devices and run all the tests.

As part of test coverage enhancement, we want to integrate
the performance tests (functests) on a cluster which has sriov devices.

For that we want to detect whether the cluster supports sriov and act as follow:
If it's not supported or the test failed to create sriov devices, skips the setup and run test as usual.
This meant to keep the existing behavior intact.

If the cluster supports sriov, create some devices and run all the tests.

Signed-off-by: Talor Itzhak <[email protected]>
We can control the run of the SRI-OV setup using ginkgo labels.

Signed-off-by: Talor Itzhak <[email protected]>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 4, 2024
@openshift-ci-robot
Copy link
Contributor

@Tal-or: This pull request explicitly references no jira issue.

In response to this:

As part of test coverage enhancement, we want to integrate
the performance tests (functests) on a cluster which has sriov devices.

For that we want to detect whether the cluster supports sriov and act as follow:
If it's not supported or the test failed to create sriov devices, skips the setup and run test as usual.
This meant to keep the existing behavior intact.

If the cluster supports sriov, create some devices and run all the tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Tal-or
Copy link
Contributor Author

Tal-or commented Mar 4, 2024

/cc @SchSeba PTAL

@openshift-ci openshift-ci bot requested a review from SchSeba March 4, 2024 16:12
Copy link
Contributor

openshift-ci bot commented Mar 4, 2024

@Tal-or: GitHub didn't allow me to request PR reviews from the following users: PTAL.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @SchSeba PTAL

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Tal-or
Copy link
Contributor Author

Tal-or commented Mar 4, 2024

/test e2e-telco5g-cnftests

Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sold about setting up SR-IOV devices inside the tests (vs in the lane setup stage)

Comment on lines +13 to +14
github.com/k8snetworkplumbingwg/network-attachment-definition-client v1.1.1-0.20201119153432-9d213757d22d
github.com/k8snetworkplumbingwg/sriov-network-operator v1.2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be a issue when we integrate the tests into cnf-tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean cnf-tests image, we do not integrate tests there anymore apart from the latency suite

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, right

@@ -134,6 +135,14 @@ var _ = Describe("[performance][config] Performance configuration", Ordered, fun
format.Object(performanceProfile, 2)
})

It("should create SRI-OV devices if they are exist", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep forgetting 👍

Eventually(func() int64 {
testedNode, err := testclient.K8sClient.CoreV1().Nodes().Get(ctx, node, metav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())
resNum, _ := testedNode.Status.Allocatable[corev1.ResourceName("openshift.io/"+resourceName)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally the discovery function (or one of its helper) should return the full resource name we need to consume. Here we add a component of guesswork

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.
I believe the final resource name should be documented somewhere because there are tests that expects to be passed a value of the device name to be executed (example).
and later then we can use this common value in these tests.

createNetwork(sriovDevice, networkName, namespace, OperatorNamespace, resourceName, metaPluginsConfig)
}

func createNetwork(sriovDevice *sriovv1.InterfaceExt, sriovNetworkName, sriovNetworkNamespace, operatorNamespace, resourceName, metaPluginsConfig string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a 1-line wrapper which basically adapts the function signature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied this part verbatim, we can change it, no problem

Copy link
Contributor

@shajmakh shajmakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!
I added few comments below.


func CreatePolicyAndNetwork(ctx context.Context, sriovInfos *sriovcluster.EnabledNodes, namespace, networkName, resourceName, metaPluginsConfig string) {
GinkgoHelper()
numVfs := 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason for creating this specific number?
we have some tests that require 64 vfs, I believe it would be useful to configure this via env variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I copied the function verbatim from cnf-test.
I don't care how much devices there are, even one is enough because all I wanted to check is that rps mask applies on these devices as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely add an env variable to configure the number, but we need to make sure it doesn't get lost in the documentation :)

Eventually(func() int64 {
testedNode, err := testclient.K8sClient.CoreV1().Nodes().Get(ctx, node, metav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())
resNum, _ := testedNode.Status.Allocatable[corev1.ResourceName("openshift.io/"+resourceName)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.
I believe the final resource name should be documented somewhere because there are tests that expects to be passed a value of the device name to be executed (example).
and later then we can use this common value in these tests.

@@ -19,6 +19,7 @@ import (
olmv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
apiextensionsv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"

sriovtestclient "github.com/k8snetworkplumbingwg/sriov-network-operator/test/util/client"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should be in the above import set?

Comment on lines +10 to +21
sriovk8sv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
sriovv1 "github.com/k8snetworkplumbingwg/sriov-network-operator/api/v1"
sriovcluster "github.com/k8snetworkplumbingwg/sriov-network-operator/test/util/cluster"
sriovnetwork "github.com/k8snetworkplumbingwg/sriov-network-operator/test/util/network"
testutils "github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils"
testclient "github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/client"
"github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/cluster"
testlog "github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/log"
"github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/utils/nodes"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"sigs.k8s.io/controller-runtime/pkg/client"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we add empty lines it will help clarify the different imports parties that are used.

@@ -134,6 +135,14 @@ var _ = Describe("[performance][config] Performance configuration", Ordered, fun
format.Object(performanceProfile, 2)
})

It("should create SRI-OV devices if they are exist", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It() creates the devices if they don't exist, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant if the cluster supports it. I'll change that, thanks

@@ -134,6 +135,14 @@ var _ = Describe("[performance][config] Performance configuration", Ordered, fun
format.Object(performanceProfile, 2)
})

It("should create SRI-OV devices if they are exist", func() {
sriovInfos, err := sriovcluster.DiscoverSriov(testclient.SRIOVClient, sriov.OperatorNamespace)
Copy link
Contributor

@shajmakh shajmakh Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I undertand, devices related tests will run if the operator is installed on the sriov.OperatorNamespace, and installing the operator is not part of the tests setup.
this means we need to setup the operator installation in the d/s ci pipelines. We'll check this accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I also know that on cnf-test they are installing the operator before the test starts:
https://github.com/openshift-kni/cnf-features-deploy/blob/master/hack/setup-build-index-image.sh#L62-L66

@Tal-or
Copy link
Contributor Author

Tal-or commented Mar 5, 2024

/hold
PR design is in ongoing discussion

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 5, 2024
Copy link
Contributor

openshift-ci bot commented Mar 21, 2024

@Tal-or: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-techpreview b0ecfad link true /test e2e-aws-ovn-techpreview
ci/prow/e2e-aws-operator b0ecfad link true /test e2e-aws-operator
ci/prow/e2e-telco5g-cnftests b0ecfad link false /test e2e-telco5g-cnftests
ci/prow/e2e-aws-ovn b0ecfad link true /test e2e-aws-ovn
ci/prow/e2e-gcp-pao b0ecfad link true /test e2e-gcp-pao
ci/prow/e2e-upgrade b0ecfad link true /test e2e-upgrade
ci/prow/vet b0ecfad link true /test vet
ci/prow/e2e-gcp-pao-workloadhints b0ecfad link true /test e2e-gcp-pao-workloadhints
ci/prow/e2e-hypershift b0ecfad link true /test e2e-hypershift
ci/prow/e2e-gcp-pao-updating-profile b0ecfad link true /test e2e-gcp-pao-updating-profile
ci/prow/e2e-hypershift-pao b0ecfad link true /test e2e-hypershift-pao

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2024
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Tal-or
Copy link
Contributor Author

Tal-or commented Apr 1, 2024

/close
superseded by openshift-kni/cnf-features-deploy#1856

@openshift-ci openshift-ci bot closed this Apr 1, 2024
Copy link
Contributor

openshift-ci bot commented Apr 1, 2024

@Tal-or: Closed this PR.

In response to this:

/close
superseded by openshift-kni/cnf-features-deploy#1856

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants