e2e test case "TestFQDNCacheMinTTL" was failed on Windows testbed #6891

Open
wenyingd opened this issue Dec 31, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@wenyingd
Contributor

Describe the bug

We observed that the e2e test case "TestFQDNCacheMinTTL" can block the Windows e2e tests. The "custom-dns-server" Pod defined in the test uses the image "coredns/coredns:1.11.3", which only supports Linux Nodes, but the test case does not set a Node selector for it. If the Pod is scheduled on a Windows Node, the image cannot be pulled (ImagePullBackOff) and the test case fails.

The following logs were observed:

=== RUN   TestFQDNCacheMinTTL/minTTLUnset
2024/12/30 14:06:30 Applying Antrea YAML
2024/12/30 14:06:33 Waiting for all Antrea DaemonSet Pods
2024/12/30 14:06:34 Checking CoreDNS deployment
    fixtures.go:286: Creating 'testfqdncacheminttl-minttlunset-3tz7nfae' K8s Namespace
    antreapolicy_test.go:5533: 
        	Error Trace:	/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/test/e2e/antreapolicy_test.go:5533
        	            				/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/test/e2e/antreapolicy_test.go:5357
        	            				/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/test/e2e/antreapolicy_test.go:5306
        	Error:      	Received unexpected error:
        	            	timed out waiting for the condition, Pod.Status: &PodStatus{Phase:Pending,Conditions:[]PodCondition{PodCondition{Type:PodReadyToStartContainers,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:08:43 +0000 UTC,Reason:,Message:,},PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:08:33 +0000 UTC,Reason:,Message:,},PodCondition{Type:Ready,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:08:33 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [coredns],},PodCondition{Type:ContainersReady,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:08:33 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [coredns],},PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:08:33 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:10.164.243.248,PodIP:192.168.2.3,StartTime:2024-12-30 14:08:33 +0000 UTC,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:coredns,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ImagePullBackOff,Message:Back-off pulling image 
"coredns/coredns:1.11.3",},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:coredns/coredns:1.11.3,ImageID:,ContainerID:,Started:*false,AllocatedResources:ResourceList{},Resources:nil,VolumeMounts:[]VolumeMountStatus{VolumeMountStatus{Name:config-volume,MountPath:/etc/coredns,ReadOnly:false,RecursiveReadOnly:nil,},VolumeMountStatus{Name:kube-api-access-bgt7c,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,ReadOnly:true,RecursiveReadOnly:*Disabled,},},User:nil,AllocatedResourcesStatus:[]ResourceStatus{},},},QOSClass:BestEffort,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{PodIP{IP:192.168.2.3,},},EphemeralContainerStatuses:[]ContainerStatus{},Resize:,ResourceClaimStatuses:[]PodResourceClaimStatus{},HostIPs:[]HostIP{HostIP{IP:10.164.243.248,},},}
        	Test:       	TestFQDNCacheMinTTL/minTTLUnset
    fixtures.go:353: Exporting test logs to '/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/antrea-test-logs/TestFQDNCacheMinTTL_minTTLUnset/beforeTeardown.Dec30-14-10-03'
    fixtures.go:465: Error when exporting kubelet logs: error when running journalctl on Node 'a-tapmw-1', is it available? Error: <nil>
    fixtures.go:524: Deleting 'testfqdncacheminttl-minttlunset-3tz7nfae' K8s Namespace
I1230 14:10:09.389525 2173866 framework.go:860] Deleting Namespace testfqdncacheminttl-minttlunset-3tz7nfae took 5.031681ms
=== RUN   TestFQDNCacheMinTTL/minTTL20s
2024/12/30 14:10:09 Applying Antrea YAML
2024/12/30 14:10:11 Waiting for all Antrea DaemonSet Pods
2024/12/30 14:10:12 Checking CoreDNS deployment
    fixtures.go:286: Creating 'testfqdncacheminttl-minttl20s-3a4k9tkr' K8s Namespace
    antreapolicy_test.go:5533: 
        	Error Trace:	/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/test/e2e/antreapolicy_test.go:5533
        	            				/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/test/e2e/antreapolicy_test.go:5357
        	            				/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/test/e2e/antreapolicy_test.go:5307
        	Error:      	Received unexpected error:
        	            	timed out waiting for the condition, Pod.Status: &PodStatus{Phase:Pending,Conditions:[]PodCondition{PodCondition{Type:PodReadyToStartContainers,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:10:27 +0000 UTC,Reason:,Message:,},PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:10:19 +0000 UTC,Reason:,Message:,},PodCondition{Type:Ready,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:10:19 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [coredns],},PodCondition{Type:ContainersReady,Status:False,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:10:19 +0000 UTC,Reason:ContainersNotReady,Message:containers with unready status: [coredns],},PodCondition{Type:PodScheduled,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2024-12-30 14:10:19 +0000 UTC,Reason:,Message:,},},Message:,Reason:,HostIP:10.164.243.229,PodIP:192.168.3.4,StartTime:2024-12-30 14:10:19 +0000 UTC,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:coredns,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ImagePullBackOff,Message:Back-off pulling image 
"coredns/coredns:1.11.3",},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:coredns/coredns:1.11.3,ImageID:,ContainerID:,Started:*false,AllocatedResources:ResourceList{},Resources:nil,VolumeMounts:[]VolumeMountStatus{VolumeMountStatus{Name:config-volume,MountPath:/etc/coredns,ReadOnly:false,RecursiveReadOnly:nil,},VolumeMountStatus{Name:kube-api-access-bh4rl,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,ReadOnly:true,RecursiveReadOnly:*Disabled,},},User:nil,AllocatedResourcesStatus:[]ResourceStatus{},},},QOSClass:BestEffort,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{PodIP{IP:192.168.3.4,},},EphemeralContainerStatuses:[]ContainerStatus{},Resize:,ResourceClaimStatuses:[]PodResourceClaimStatus{},HostIPs:[]HostIP{HostIP{IP:10.164.243.229,},},}
        	Test:       	TestFQDNCacheMinTTL/minTTL20s
    fixtures.go:353: Exporting test logs to '/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/antrea-test-logs/TestFQDNCacheMinTTL_minTTL20s/beforeTeardown.Dec30-14-11-49'
    fixtures.go:465: Error when exporting kubelet logs: error when running journalctl on Node 'a-tapmw-1', is it available? Error: <nil>
    fixtures.go:524: Deleting 'testfqdncacheminttl-minttl20s-3a4k9tkr' K8s Namespace
I1230 14:11:55.871920 2173866 framework.go:860] Deleting Namespace testfqdncacheminttl-minttl20s-3a4k9tkr took 3.859203ms

links: http://10.164.243.223/view/Windows/job/antrea-windows-e2e-for-pull-request/56/console

To Reproduce

Expected

Actual behavior

Versions:

Additional context

@wenyingd wenyingd added the kind/bug Categorizes issue or PR as related to a bug. label Dec 31, 2024
@devc007

devc007 commented Jan 1, 2025

Hi @wenyingd,
I believe I've found a solution to this issue: explicitly setting a Node selector. I propose adding WithNodeSelector(map[string]string{"kubernetes.io/os": "linux"}) in test/e2e/antreapolicy_test.go between lines 5531 and 5532. I wanted to discuss this approach before submitting a PR. Please let me know your thoughts.
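
For readers unfamiliar with Node selectors, here is a minimal sketch of the matching rule that makes the proposal work: a Pod is only eligible for Nodes whose labels contain every key/value pair in its selector, and an empty selector matches every Node. This is an illustration only, not the Kubernetes scheduler's code or Antrea's test helper; the `node` type, the function names, and the Node names `linux-1` and `win-1` are hypothetical.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	name   string
	labels map[string]string
}

// matchesNodeSelector reports whether the Node labels contain every
// key/value pair of the selector. An empty or nil selector matches any Node.
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// eligibleNodes returns, in input order, the names of the Nodes
// whose labels satisfy the selector.
func eligibleNodes(nodes []node, selector map[string]string) []string {
	var names []string
	for _, n := range nodes {
		if matchesNodeSelector(n.labels, selector) {
			names = append(names, n.name)
		}
	}
	return names
}

func main() {
	nodes := []node{
		{name: "linux-1", labels: map[string]string{"kubernetes.io/os": "linux"}},
		{name: "win-1", labels: map[string]string{"kubernetes.io/os": "windows"}},
	}

	// No selector: both Nodes are eligible, so a Linux-only image
	// may end up on the Windows Node.
	fmt.Println(eligibleNodes(nodes, nil)) // prints [linux-1 win-1]

	// With the proposed selector, only the Linux Node remains eligible.
	linuxOnly := map[string]string{"kubernetes.io/os": "linux"}
	fmt.Println(eligibleNodes(nodes, linuxOnly)) // prints [linux-1]
}
```

Without a selector, the "custom-dns-server" Pod can land on either kind of Node, which is consistent with the ImagePullBackOff seen in the logs above.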

@antoninbas
Contributor

Considering that all of TestAntreaPolicy is skipped when the cluster has at least one Windows Node, it would make sense to do the same for TestFQDNCacheMinTTL:

skipIfHasWindowsNodes(t)

@devc007 Feel free to submit a PR with this change ^. While adding a selector is a possible solution, other issues may prevent this test from running on Windows. In order to stay consistent with other Antrea policy tests, it makes sense to disable the test for now.
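
As a rough illustration of the skip-guard pattern, the sketch below shows how such a helper could be written. It is not Antrea's actual skipIfHasWindowsNodes implementation, which is not shown in this thread. The names `skipper`, `countWindowsNodes`, `skipIfHasWindowsNodesSketch`, and `recorder` are hypothetical, and the Node OS map stands in for whatever cluster information the real test framework holds.

```go
package main

import "fmt"

// skipper is the subset of *testing.T that the guard needs.
// *testing.T satisfies this interface.
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// countWindowsNodes counts the Nodes whose OS is "windows".
// nodeOS maps a (hypothetical) Node name to its operating system.
func countWindowsNodes(nodeOS map[string]string) int {
	count := 0
	for _, os := range nodeOS {
		if os == "windows" {
			count++
		}
	}
	return count
}

// skipIfHasWindowsNodesSketch skips the test when the cluster has at
// least one Windows Node.
func skipIfHasWindowsNodesSketch(t skipper, nodeOS map[string]string) {
	t.Helper()
	if n := countWindowsNodes(nodeOS); n > 0 {
		t.Skipf("Skipping test: cluster has %d Windows Node(s)", n)
	}
}

// recorder is a stand-in for *testing.T so the sketch can run outside
// "go test". Unlike the real Skipf, it does not stop the caller.
type recorder struct {
	skipped bool
	reason  string
}

func (r *recorder) Helper() {}

func (r *recorder) Skipf(format string, args ...any) {
	r.skipped = true
	r.reason = fmt.Sprintf(format, args...)
}

func main() {
	r := &recorder{}
	skipIfHasWindowsNodesSketch(r, map[string]string{"node-a": "linux", "node-b": "windows"})
	fmt.Println(r.skipped, r.reason) // prints true Skipping test: cluster has 1 Windows Node(s)
}
```

Calling the guard as the first statement of the top-level test means every subtest is skipped before any fixtures are created.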

@devc007

devc007 commented Jan 3, 2025

Thank you, @antoninbas. I plan to add skipIfHasWindowsNodes(t) between lines 5317 and 5320. However, before proceeding, I want to raise a concern. One of my recurring challenges has been that most of my PRs get blocked due to CI check failures. When I was contributing to Jenkins, we needed to build the code locally before raising a PR. For this specific case, should I build or test anything locally before submitting the PR?

@antoninbas
Contributor

@devc007 It usually makes sense to run make golangci before pushing anything.

Based on what I am working on, I usually also run the following:

  • unit tests for the packages I am modifying (using the appropriate go test command directly)
  • some specific integration or e2e tests

In this specific case, I would just run make golangci locally before pushing. We will trigger the appropriate Windows CI job manually after you create the PR, to make sure that your change indeed addresses the issue.

"One of my recurring challenges has been that most of my PRs get blocked due to CI check failures."

Which PRs? I don't see any PR from you in this repository.

@devc007

devc007 commented Jan 9, 2025

Please review #6913. Apologies for being offline; I was facing issues with my laptop, which caused a lack of communication. My apologies again, @antoninbas, for not communicating properly earlier. The PR I mentioned is in a different repository, but most PRs there are blocked due to CI check failures. #6913 is my first PR to antrea.

devc007 added a commit to devc007/antrea that referenced this issue Jan 11, 2025
…o#6891 The TestFQDNCacheMinTTL e2e test currently does not support Windows. We skip it if any Node in the test cluster is a Windows Node, which is also consistent with other AntreaPolicy e2e tests.

As @antoninbas suggested, I have relocated the skipX condition to the beginning of TestFQDNCacheMinTTL. Now, the skip condition will execute before any subset of TestFQDNCacheMinTTL, making testWithFQDNCacheMinTTL shorter and possibly more efficient.
devc007 added a commit to devc007/antrea that referenced this issue Jan 13, 2025
Skip TestFQDNCacheMinTTL if cluster has Windows Nodes

The TestFQDNCacheMinTTL e2e test currently does not support Windows.
We skip it if any Node in the test cluster is a Windows Node, which is also
consistent with other AntreaPolicy e2e tests.

Skip TestFQDNCacheMinTTL if cluster has Windows Nodes. Fixes antrea-io#6891

Skip TestFQDNCacheMinTTL if cluster has Windows Nodes. Fixes antrea-io#6891 The TestFQDNCacheMinTTL e2e test currently does not support Windows. We skip it if any Node in the test cluster is a Windows Node, which is also consistent with other AntreaPolicy e2e tests.

As @antoninbas suggested, I have relocated the skipX condition to the beginning of TestFQDNCacheMinTTL. Now, the skip condition will execute before any subset of TestFQDNCacheMinTTL, making testWithFQDNCacheMinTTL shorter and possibly more efficient.

remove an line from antreapolicy_test.go
devc007 added a commit to devc007/antrea that referenced this issue Jan 14, 2025
…o#6891

The TestFQDNCacheMinTTL e2e test currently does not support Windows.
We skip it if any Node in the test cluster is a Windows Node, which is also
consistent with other AntreaPolicy e2e tests.
devc007 added a commit to devc007/antrea that referenced this issue Jan 15, 2025
…o#6891

The TestFQDNCacheMinTTL e2e test currently does not support Windows.
We skip it if any Node in the test cluster is a Windows Node, which is also
consistent with other AntreaPolicy e2e tests.

Signed-off-by: devesh <[email protected]>