
feat: delete acr and recreate if cache rule is wrong #5354

Merged
merged 11 commits on Dec 4, 2024

add debugging statement

665e82c
Azure Pipelines / Agentbaker E2E succeeded Dec 4, 2024 in 12m 25s

Build #20241204.33 had test failures

Tests

  • Failed: 4 (3.39%)
  • Passed: 114 (96.61%)
  • Other: 0 (0.00%)
  • Total: 118

Annotations

Check failure on line 1 in Test_AzureLinuxV2_GPUAzureCNI


@azure-pipelines azure-pipelines / Agentbaker E2E

Test_AzureLinuxV2_GPUAzureCNI

Failed
Raw output
    scenario_helpers_test.go:169: running scenario vhd: "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/AzureLinuxV2gen2/versions/1.1733166545.3214", tags {Name:Test_AzureLinuxV2_GPUAzureCNI ImageName:AzureLinuxV2gen2 OS:azurelinux Arch:amd64 Airgap:false GPU:true WASM:false ServerTLSBootstrapping:false Scriptless:false KubeletCustomConfig:false}
    vmss.go:39: creating VMSS "2024-12-04-7sed-azurelinuxv2gpuazurecni" in resource group "MC_abe2e-westus3_abe2e-azure-network-1bef8_westus3"
    types.go:160: using "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/AzureLinuxV2gen2/versions/1.1733166545.3214" for VHD
    scenario_helpers_test.go:125: vmss 2024-12-04-7sed-azurelinuxv2gpuazurecni creation succeeded
    pollers.go:29: waiting for node 2024-12-04-7sed-azurelinuxv2gpuazurecni to be ready
    pollers.go:57: node 2024-12-04-7sed-azurelinuxv2gpuazurecni000000 is ready
    scenario_helpers_test.go:128: node 2024-12-04-7sed-azurelinuxv2gpuazurecni is ready
    pod.go:65: creating pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-test-pod"
    pod.go:78: pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-test-pod" is ready
    validation.go:21: node health validation: test pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-test-pod" is running on node "2024-12-04-7sed-azurelinuxv2gpuazurecni000000"
    validators.go:271: validating pod using nvidia GPU
    pod.go:63: truncated pod name to "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-enable-nvidia-dev"
    pod.go:65: creating pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-enable-nvidia-dev"
    pod.go:78: pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-enable-nvidia-dev" is ready
    validators.go:296: resource "nvidia.com/gpu" is available
    pod.go:63: truncated pod name to "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po"
    pod.go:65: creating pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po"
    pollers.go:120: -- pod metadata --
    pollers.go:121:    Name: 
                       Namespace: 
                       Node: 
                       Status: 
                       Start Time: <nil>
    pollers.go:126: -- container(s) info --
    pollers.go:130: -- pod events --
    pollers.go:77: time before timeout: 8m43.35630677s
        
    pollers.go:88: pod 2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po not found yet. Err pods "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po" not found
    pollers.go:120: -- pod metadata --
    pollers.go:121:    Name: 
                       Namespace: 
                       Node: 
                       Status: 
                       Start Time: <nil>
    pollers.go:126: -- container(s) info --
    pollers.go:130: -- pod events --
    pollers.go:77: time before timeout: 3m43.358003274s
        
    pollers.go:88: pod 2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po not found yet. Err pods "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po" not found
    pod.go:78: pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po" is ready
    pod.go:79: 
        	Error Trace:	/mnt/vss/_work/1/s/e2e/pod.go:79
        	            				/mnt/vss/_work/1/s/e2e/validators.go:277
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:220
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:177
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:96
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:200
        	Error:      	Received unexpected error:
        	            	context deadline exceeded
        	Test:       	Test_AzureLinuxV2_GPUAzureCNI
        	Messages:   	failed to wait for pod "2024-12-04-7sed-azurelinuxv2gpuazurecni000000-gpu-validation-po" to 

Check failure on line 1 in Test_Ubuntu2204_GPUNC


@azure-pipelines azure-pipelines / Agentbaker E2E

Test_Ubuntu2204_GPUNC

Failed
Raw output
    scenario_helpers_test.go:169: running scenario vhd: "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2204gen2containerd/versions/1.1733181276.698", tags {Name:Test_Ubuntu2204_GPUNC ImageName:2204gen2containerd OS:ubuntu Arch:amd64 Airgap:false GPU:true WASM:false ServerTLSBootstrapping:false Scriptless:false KubeletCustomConfig:false}
    vmss.go:39: creating VMSS "2024-12-04-ogn7-ubuntu2204gpunc" in resource group "MC_abe2e-westus3_abe2e-kubenet-331fc_westus3"
    types.go:160: using "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2204gen2containerd/versions/1.1733181276.698" for VHD
    scenario_helpers_test.go:125: vmss 2024-12-04-ogn7-ubuntu2204gpunc creation succeeded
    pollers.go:29: waiting for node 2024-12-04-ogn7-ubuntu2204gpunc to be ready
    pollers.go:57: node 2024-12-04-ogn7-ubuntu2204gpunc000000 is ready
    scenario_helpers_test.go:128: node 2024-12-04-ogn7-ubuntu2204gpunc is ready
    pod.go:65: creating pod "2024-12-04-ogn7-ubuntu2204gpunc000000-test-pod"
    pod.go:78: pod "2024-12-04-ogn7-ubuntu2204gpunc000000-test-pod" is ready
    validation.go:21: node health validation: test pod "2024-12-04-ogn7-ubuntu2204gpunc000000-test-pod" is running on node "2024-12-04-ogn7-ubuntu2204gpunc000000"
    validators.go:271: validating pod using nvidia GPU
    pod.go:63: truncated pod name to "2024-12-04-ogn7-ubuntu2204gpunc000000-enable-nvidia-device-plug"
    pod.go:65: creating pod "2024-12-04-ogn7-ubuntu2204gpunc000000-enable-nvidia-device-plug"
    pod.go:78: pod "2024-12-04-ogn7-ubuntu2204gpunc000000-enable-nvidia-device-plug" is ready
    validators.go:296: resource "nvidia.com/gpu" is available
    pod.go:65: creating pod "2024-12-04-ogn7-ubuntu2204gpunc000000-gpu-validation-pod"
    pollers.go:120: -- pod metadata --
    pollers.go:121:    Name: 2024-12-04-ogn7-ubuntu2204gpunc000000-gpu-validation-pod
                       Namespace: default
                       Node: 2024-12-04-6e08-ubuntu2204gpua10000000
                       Status: Pending
                       Start Time: <nil>
    pollers.go:123:    Reason: 
    pollers.go:124:    Message: 
    pollers.go:126: -- container(s) info --
    pollers.go:128:    Container: gpu-validation-container
                       Image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu
                       Ports: []
    pollers.go:130: -- pod events --
    pollers.go:133:    Reason: Scheduled, Message: Successfully assigned default/2024-12-04-ogn7-ubuntu2204gpunc000000-gpu-validation-pod to 2024-12-04-6e08-ubuntu2204gpua10000000, Count: 0, Last Timestamp: 0001-01-01 00:00:00 +0000 UTC
    pollers.go:77: time before timeout: 5m24.078736926s
        
    pollers.go:120: -- pod metadata --
    pollers.go:121:    Name: 
                       Namespace: 
                       Node: 
                       Status: 
                       Start Time: <nil>
    pollers.go:126: -- container(s) info --
    pollers.go:130: -- pod events --
    pollers.go:77: time before timeout: 24.079296106s
        
    pollers.go:88: pod 2024-12-04-ogn7-ubuntu2204gpunc000000-gpu-validation-pod not found yet. Err pods "2024-12-04-ogn7-ubuntu2204gpunc000000-gpu-validation-pod" not found
    pod.go:78: pod "2024-12-04-ogn7-ubuntu2204gpunc000000-gpu-validation-pod" is ready
    pod.go:79: 
        	Error Trace:	/mnt/vss/_work/1/s/e2e/pod.go:79
        	            				/mnt/vss/_work/1/s/e2e/validators.go:277
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:803
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:177
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:96
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:782
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:769
        	Erro

Check failure on line 1 in Test_Ubuntu2204_GPUA100


@azure-pipelines azure-pipelines / Agentbaker E2E

Test_Ubuntu2204_GPUA100

Failed
Raw output
    scenario_helpers_test.go:169: running scenario vhd: "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2204gen2containerd/versions/1.1733181276.698", tags {Name:Test_Ubuntu2204_GPUA100 ImageName:2204gen2containerd OS:ubuntu Arch:amd64 Airgap:false GPU:true WASM:false ServerTLSBootstrapping:false Scriptless:false KubeletCustomConfig:false}
    vmss.go:39: creating VMSS "2024-12-04-gwfi-ubuntu2204gpua100" in resource group "MC_abe2e-westus3_abe2e-kubenet-331fc_westus3"
    types.go:160: using "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2204gen2containerd/versions/1.1733181276.698" for VHD
    vmss.go:75: 
        	Error Trace:	/mnt/vss/_work/1/s/e2e/vmss.go:75
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:122
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:95
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:782
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:773
        	Error:      	Received unexpected error:
        	            	PUT https://management.azure.com/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-331fc_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/2024-12-04-gwfi-ubuntu2204gpua100
        	            	--------------------------------------------------------------------------------
        	            	RESPONSE 409: 409 Conflict
        	            	ERROR CODE: OperationNotAllowed
        	            	--------------------------------------------------------------------------------
        	            	{
        	            	  "error": {
        	            	    "code": "OperationNotAllowed",
        	            	    "message": "Operation could not be completed as it results in exceeding approved StandardNCADSA100v4Family Cores quota. Additional details - Deployment Model: Resource Manager, Location: WestUS3, Current Limit: 50, Current Usage: 48, Additional Required: 24, (Minimum) New Limit Required: 72. Setup Alerts when Quota reaches threshold. Learn more at https://aka.ms/quotamonitoringalerting . Submit a request for Quota increase at https://aka.ms/ProdportalCRP/#blade/Microsoft_Azure_Capacity/UsageAndQuota.ReactView/Parameters/%7B%22subscriptionId%22:%228ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8%22,%22command%22:%22openQuotaApprovalBlade%22,%22quotas%22:[%7B%22location%22:%22WestUS3%22,%22providerId%22:%22Microsoft.Compute%22,%22resourceName%22:%22StandardNCADSA100v4Family%22,%22quotaRequest%22:%7B%22properties%22:%7B%22limit%22:72,%22unit%22:%22Count%22,%22name%22:%7B%22value%22:%22StandardNCADSA100v4Family%22%7D%7D%7D%7D]%7D by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/per-vm-quota-requests"
        	            	  }
        	            	}
        	            	--------------------------------------------------------------------------------
        	Test:       	Test_Ubuntu2204_GPUA100
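This failure is not a test bug but a capacity rejection: the 409 reports current usage 48 of a 50-core limit, and the VMSS needs 24 more cores, so 48 + 24 = 72 exceeds the limit. A hedged sketch of that check (the `quotaFits` helper is hypothetical, mirroring the numbers in the error message):

```go
package main

import "fmt"

// quotaFits reports whether adding required cores stays within the limit.
func quotaFits(limit, usage, required int) bool {
	return usage+required <= limit
}

func main() {
	// Values from the 409 response: limit 50, usage 48, required 24.
	fmt.Println(quotaFits(50, 48, 24)) // false: 48+24 = 72 > 50
}
```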

Check failure on line 1 in Test_Ubuntu2204_GPUGridDriver


@azure-pipelines azure-pipelines / Agentbaker E2E

Test_Ubuntu2204_GPUGridDriver

Failed
Raw output
    scenario_helpers_test.go:169: running scenario vhd: "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2204gen2containerd/versions/1.1733181276.698", tags {Name:Test_Ubuntu2204_GPUGridDriver ImageName:2204gen2containerd OS:ubuntu Arch:amd64 Airgap:false GPU:true WASM:false ServerTLSBootstrapping:false Scriptless:false KubeletCustomConfig:false}
    vmss.go:39: creating VMSS "2024-12-04-6l41-ubuntu2204gpugriddriver" in resource group "MC_abe2e-westus3_abe2e-kubenet-331fc_westus3"
    types.go:160: using "/subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2204gen2containerd/versions/1.1733181276.698" for VHD
    scenario_helpers_test.go:125: vmss 2024-12-04-6l41-ubuntu2204gpugriddriver creation succeeded
    pollers.go:29: waiting for node 2024-12-04-6l41-ubuntu2204gpugriddriver to be ready
    pollers.go:57: node 2024-12-04-6l41-ubuntu2204gpugriddriver000000 is ready
    scenario_helpers_test.go:128: node 2024-12-04-6l41-ubuntu2204gpugriddriver is ready
    pod.go:65: creating pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-test-pod"
    pod.go:78: pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-test-pod" is ready
    validation.go:21: node health validation: test pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-test-pod" is running on node "2024-12-04-6l41-ubuntu2204gpugriddriver000000"
    validators.go:271: validating pod using nvidia GPU
    pod.go:63: truncated pod name to "2024-12-04-6l41-ubuntu2204gpugriddriver000000-enable-nvidia-dev"
    pod.go:65: creating pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-enable-nvidia-dev"
    pod.go:78: pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-enable-nvidia-dev" is ready
    validators.go:296: resource "nvidia.com/gpu" is available
    pod.go:63: truncated pod name to "2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po"
    pod.go:65: creating pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po"
    pollers.go:120: -- pod metadata --
    pollers.go:121:    Name: 2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po
                       Namespace: default
                       Node: 2024-12-04-846p-azurelinuxv2gpu000000
                       Status: Pending
                       Start Time: <nil>
    pollers.go:123:    Reason: 
    pollers.go:124:    Message: 
    pollers.go:126: -- container(s) info --
    pollers.go:128:    Container: gpu-validation-container
                       Image: mcr.microsoft.com/azuredocs/samples-tf-mnist-demo:gpu
                       Ports: []
    pollers.go:130: -- pod events --
    pollers.go:133:    Reason: Scheduled, Message: Successfully assigned default/2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po to 2024-12-04-846p-azurelinuxv2gpu000000, Count: 0, Last Timestamp: 0001-01-01 00:00:00 +0000 UTC
    pollers.go:77: time before timeout: 6m23.457743142s
        
    pollers.go:120: -- pod metadata --
    pollers.go:121:    Name: 
                       Namespace: 
                       Node: 
                       Status: 
                       Start Time: <nil>
    pollers.go:126: -- container(s) info --
    pollers.go:130: -- pod events --
    pollers.go:77: time before timeout: 1m22.459936479s
        
    pollers.go:88: pod 2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po not found yet. Err pods "2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po" not found
    pod.go:78: pod "2024-12-04-6l41-ubuntu2204gpugriddriver000000-gpu-validation-po" is ready
    pod.go:79: 
        	Error Trace:	/mnt/vss/_work/1/s/e2e/pod.go:79
        	            				/mnt/vss/_work/1/s/e2e/validators.go:277
        	            				/mnt/vss/_work/1/s/e2e/scenario_test.go:831
        	            				/mnt/vss/_work/1/s/e2e/scenario_helpers_test.go:177