fix: do not review - fix Ubuntu2204 HTTPSProxy PrivateDNS CSE exit50 kubelet #8809
Draft
SriHarsha001 wants to merge 2 commits into
Pull request overview
This PR updates the e2e VMSS provisioning harness to mitigate a known transient Linux CSE failure where the outbound connectivity preflight exits with ERR_OUTBOUND_CONN_FAIL (exit code 50). The approach is to detect that specific failure mode in the Azure VMExtensionProvisioningError payload and recreate the VMSS a bounded number of times to reduce PR-gate flakes.
Changes:
- Add a bounded recreate loop in `ConfigureAndCreateVMSS` that retries VMSS creation when the failure is classified as a transient exit-50 failure.
- Introduce `cseExitCodeOutboundConnFail = "50"` and a helper classifier `isTransientOutboundCSEFailure(err)`.
- Add unit tests to pin the exit code constant and validate the classifier behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| e2e/vmss.go | Adds bounded recreate-on-exit-50 logic, plus helper functions for classification and synchronous VMSS deletion. |
| e2e/vmss_test.go | Adds tests for the exit code constant and the transient-failure classifier. |
| e2e/const.go | Introduces a named constant for the CSE outbound connectivity failure exit code (50). |
Comment on lines 83 to +101
```diff
 func ConfigureAndCreateVMSS(ctx context.Context, s *Scenario) (*ScenarioVM, error) {
-	vm, err := CreateVMSSWithRetry(ctx, s)
+	var vm *ScenarioVM
+	var err error
+	for attempt := 0; ; attempt++ {
+		vm, err = CreateVMSSWithRetry(ctx, s)
+		if err == nil {
+			break
+		}
+		// Known transient e2e-infra flake: the CSE outbound connectivity preflight check
+		// (curl mcr.microsoft.com, optionally via the e2e proxy) intermittently fails all
+		// retries and exits ERR_OUTBOUND_CONN_FAIL (50) before kubelet starts. Recreate the
+		// node a bounded number of times to reduce PR-gate noise without masking real
+		// regressions, which fail consistently and survive the retry budget.
+		if attempt >= maxOutboundCSERetries || s.IsWindows() || config.Config.KeepVMSS || !isTransientOutboundCSEFailure(err) {
+			break
+		}
+		toolkit.Logf(ctx, "CSE failed with ERR_OUTBOUND_CONN_FAIL (exit %s) on VMSS %q: known transient e2e outbound flake, recreating node (attempt %d/%d)", cseExitCodeOutboundConnFail, s.Runtime.VMSSName, attempt+1, maxOutboundCSERetries)
+		deleteVMSSAndWait(ctx, s)
+	}
```
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #