feat(linux): add AMD MI300X ROCm bootstrap #8824
Build #20260702.42 had test failures
Details
- Failed: 39 (17.33%)
- Passed: 186 (82.67%)
- Other: 0 (0.00%)
- Total: 225
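The percentages in the summary above follow directly from the counts. A minimal sketch that recomputes them is given here, assuming the pipeline rounds each share of the total to two decimal places; the helper name `percent` is hypothetical and not part of the Agentbaker test code.

```go
package main

import "fmt"

// percent formats count as a percentage of total with two decimals.
// It returns "0.00" when total is zero to avoid dividing by zero.
// Hypothetical helper for illustration only.
func percent(count, total int) string {
	if total == 0 {
		return "0.00"
	}
	return fmt.Sprintf("%.2f", float64(count)/float64(total)*100)
}

func main() {
	// Counts copied from the build summary.
	failed, passed, other := 39, 186, 0
	total := failed + passed + other

	fmt.Printf("Failed: %d (%s%%)\n", failed, percent(failed, total))
	fmt.Printf("Passed: %d (%s%%)\n", passed, percent(passed, total))
	fmt.Printf("Other: %d (%s%%)\n", other, percent(other, total))
	fmt.Printf("Total: %d\n", total)
}
```

Running it reproduces the reported figures: 39 of 225 is 17.33% and 186 of 225 is 82.67%.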
Annotations
Check failure on line 4084 in Build log
azure-pipelines / Agentbaker GPU E2E
Build log #L4084
Script failed with exit code: 1
Check failure on line 1 in Test_Ubuntu2204_GPUGridDriver/scriptless_nbc
azure-pipelines / Agentbaker GPU E2E
Test_Ubuntu2204_GPUGridDriver/scriptless_nbc
Failed
Raw output
=== RUN Test_Ubuntu2204_GPUGridDriver/scriptless_nbc
=== PAUSE Test_Ubuntu2204_GPUGridDriver/scriptless_nbc
=== CONT Test_Ubuntu2204_GPUGridDriver/scriptless_nbc
test_helpers.go:418: [10.347s] TAGS {Name:Test_Ubuntu2204_GPUGridDriver/scriptless_nbc ImageName:2204gen2containerd OS:ubuntu Arch:amd64 NetworkIsolated:false NonAnonymousACR:false GPU:true WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false MockAzureChinaCloud:false VMSeriesCoverageTest:false}
test_helpers.go:229: [10.352s] → running scenario...
test_helpers.go:246: [10.352s] using cluster abe2e-kubenet-v5-150ee in rg=abe2e-westus3 sub=8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8
test_helpers.go:247: [10.352s] portal: https://portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/abe2e-westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v5-150ee/overview
test_helpers.go:279: [10.375s] → preparing AKS node...
vmss.go:531: [10.375s] → creating VMSS lmfk-2026-07-03-ubuntu2204gpugriddriverscriptlessnbc...
vmss.go:435: [11.260s] VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/lmfk-2026-07-03-ubuntu2204gpugriddriverscriptlessnbc/overview
vmss.go:441: [11.260s] Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v5-150ee/overview
vmss.go:564: [32.439s] VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
vmss.go:568: [32.440s] SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
========================
az network bastion ssh --target-resource-id "/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/lmfk-2026-07-03-ubuntu2204gpugriddriverscriptlessnbc/virtualMachines/0" --name "abe2e-shared-bastion" --resource-group abe2e-westus3 --auth-type ssh-key --username azureuser --ssh-key /tmp/private-key-2766443897
bastionssh.go:304: [344.472s] Attempt 1/5 establishing SSH over bastion to 10.220.112.51
vmss.go:618: [345.472s] VM reached running state
vmss.go:588: [345.473s] ✓ creating VMSS lmfk-2026-07-03-ubuntu2204gpugriddriverscriptlessnbc done (335.1s)
kube.go:160: [345.473s] → waiting for node lmfk-2026-07-03-ubuntu2204gpugriddriverscriptlessnbc to be ready...
kube.go:182: [345.595s] node lmfk-2026-07-03-ubuntu2204gpugriddriverscriptlessnbc000000 is ready. Taints: [{"key":"node.kubernetes.io/network-unavailable","effect":"NoSchedule","timeAdded":"2026-07-03T00:06:04Z"}] Conditions: [{"type":"NetworkUnavailable","status":"True","lastHeartbeatTime":"2026-07-03T00:06:04Z","lastTransitionTime":"2026-07-03T00:06:04Z","reason":"NodeInitialization","message":"Waiting for cloud routes"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:06:28Z","lastTransitionTime":"2026-07-03T00:05:57Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:06:28Z","lastTransitionTime":"2026-07-03T00:05:57Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","s
... [The stack trace has been truncated as it exceeded the maximum allowed size. Please refer to the complete log available in the Test Run attachments for full details.]
Check failure on line 1 in Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset/default
azure-pipelines / Agentbaker GPU E2E
Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset/default
Failed
Raw output
=== RUN Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset/default
=== PAUSE Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset/default
=== CONT Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset/default
test_helpers.go:418: [10.344s] TAGS {Name:Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset/default ImageName:2204gen2containerd OS:ubuntu Arch:amd64 NetworkIsolated:false NonAnonymousACR:false GPU:true WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false MockAzureChinaCloud:false VMSeriesCoverageTest:false}
test_helpers.go:229: [10.347s] → running scenario...
test_helpers.go:246: [10.347s] using cluster abe2e-kubenet-v5-150ee in rg=abe2e-westus3 sub=8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8
test_helpers.go:247: [10.347s] portal: https://portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/abe2e-westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v5-150ee/overview
test_helpers.go:279: [10.382s] → preparing AKS node...
vmss.go:531: [10.382s] → creating VMSS d7ff-2026-07-03-ubuntu2204nvidiadeviceplugindaemonsetdefa...
vmss.go:435: [11.483s] VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/d7ff-2026-07-03-ubuntu2204nvidiadeviceplugindaemonsetdefa/overview
vmss.go:441: [11.483s] Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v5-150ee/overview
vmss.go:564: [30.367s] VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
vmss.go:568: [30.367s] SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
========================
az network bastion ssh --target-resource-id "/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/d7ff-2026-07-03-ubuntu2204nvidiadeviceplugindaemonsetdefa/virtualMachines/0" --name "abe2e-shared-bastion" --resource-group abe2e-westus3 --auth-type ssh-key --username azureuser --ssh-key /tmp/private-key-2766443897
bastionssh.go:304: [402.878s] Attempt 1/5 establishing SSH over bastion to 10.220.112.44
vmss.go:618: [404.757s] VM reached running state
vmss.go:588: [404.757s] ✓ creating VMSS d7ff-2026-07-03-ubuntu2204nvidiadeviceplugindaemonsetdefa done (394.4s)
kube.go:160: [404.757s] → waiting for node d7ff-2026-07-03-ubuntu2204nvidiadeviceplugindaemonsetdefa to be ready...
kube.go:182: [404.886s] node d7ff-2026-07-03-ubuntu2204nvidiadeviceplugindaemonsetdefa000000 is ready. Taints: [{"key":"node.kubernetes.io/network-unavailable","effect":"NoSchedule","timeAdded":"2026-07-03T00:07:12Z"}] Conditions: [{"type":"NetworkUnavailable","status":"True","lastHeartbeatTime":"2026-07-03T00:07:11Z","lastTransitionTime":"2026-07-03T00:07:11Z","reason":"NodeInitialization","message":"Waiting for cloud routes"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:07:37Z","lastTransitionTime":"2026-07-03T00:07:06Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:07:37Z","lastTransitionTime":"2026-07-03T00:07:06Z","reason":"KubeletHasNoDiskPressure","mes
... [The stack trace has been truncated as it exceeded the maximum allowed size. Please refer to the complete log available in the Test Run attachments for full details.]
Check failure on line 1 in Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset
azure-pipelines / Agentbaker GPU E2E
Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset
Failed
Raw output
=== RUN Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset
=== PAUSE Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset
=== CONT Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset
--- FAIL: Test_Ubuntu2204_NvidiaDevicePlugin_Daemonset (0.00s)
Check failure on line 1 in Test_Ubuntu2204_GPUA10/default
azure-pipelines / Agentbaker GPU E2E
Test_Ubuntu2204_GPUA10/default
Failed
Raw output
=== RUN Test_Ubuntu2204_GPUA10/default
=== PAUSE Test_Ubuntu2204_GPUA10/default
=== CONT Test_Ubuntu2204_GPUA10/default
test_helpers.go:418: [10.346s] TAGS {Name:Test_Ubuntu2204_GPUA10/default ImageName:2204gen2containerd OS:ubuntu Arch:amd64 NetworkIsolated:false NonAnonymousACR:false GPU:true WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false MockAzureChinaCloud:false VMSeriesCoverageTest:false}
test_helpers.go:229: [10.346s] → running scenario...
test_helpers.go:246: [10.346s] using cluster abe2e-kubenet-v5-150ee in rg=abe2e-westus3 sub=8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8
test_helpers.go:247: [10.346s] portal: https://portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/abe2e-westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v5-150ee/overview
test_helpers.go:279: [10.381s] → preparing AKS node...
vmss.go:531: [10.382s] → creating VMSS ojep-2026-07-03-ubuntu2204gpua10default...
vmss.go:435: [11.377s] VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/ojep-2026-07-03-ubuntu2204gpua10default/overview
vmss.go:441: [11.389s] Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v5-150ee/overview
vmss.go:564: [30.460s] VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
vmss.go:568: [30.460s] SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
========================
az network bastion ssh --target-resource-id "/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v5-150ee_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/ojep-2026-07-03-ubuntu2204gpua10default/virtualMachines/0" --name "abe2e-shared-bastion" --resource-group abe2e-westus3 --auth-type ssh-key --username azureuser --ssh-key /tmp/private-key-2766443897
bastionssh.go:304: [403.286s] Attempt 1/5 establishing SSH over bastion to 10.220.112.50
vmss.go:618: [405.282s] VM reached running state
vmss.go:588: [405.282s] ✓ creating VMSS ojep-2026-07-03-ubuntu2204gpua10default done (394.9s)
kube.go:160: [405.283s] → waiting for node ojep-2026-07-03-ubuntu2204gpua10default to be ready...
kube.go:182: [405.403s] node ojep-2026-07-03-ubuntu2204gpua10default000000 is ready. Taints: [{"key":"node.kubernetes.io/network-unavailable","effect":"NoSchedule","timeAdded":"2026-07-03T00:07:26Z"}] Conditions: [{"type":"NetworkUnavailable","status":"True","lastHeartbeatTime":"2026-07-03T00:07:26Z","lastTransitionTime":"2026-07-03T00:07:26Z","reason":"NodeInitialization","message":"Waiting for cloud routes"},{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:07:19Z","lastTransitionTime":"2026-07-03T00:07:19Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:07:19Z","lastTransitionTime":"2026-07-03T00:07:19Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2026-07-03T00:07:19Z","lastTransitionTime":"2026-07-03T00:07:19Z","reason":"KubeletHasSufficientPI
... [The stack trace has been truncated as it exceeded the maximum allowed size. Please refer to the complete log available in the Test Run attachments for full details.]