
feat(linux): reduce prefetch optimization runtime within VHD builds and add retry logic to handle AIB rate limits #8802

Merged
cameronmeissner merged 7 commits into main from cameissner/optimize-prefetch-build-runtime on Jul 2, 2026

Conversation

@cameronmeissner

@cameronmeissner cameronmeissner commented Jun 30, 2026

Contributor

What this PR does / why we need it:

Reduces prefetch optimization runtime within VHD builds by 40-60% in initial testing. Instead of relying on AIB's built-in mechanism to distribute the output VHD blob, which appears to be much slower, we copy AIB's output VHD to our destination storage container manually via azcopy.
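As a rough illustration of the manual copy step, the snippet below builds an `azcopy copy` command for a single VHD. It is a sketch, not the command used in optimize.sh: the helper name `build_azcopy_copy` and both URLs are hypothetical placeholders (real URLs would carry SAS tokens), and the only flag it assumes is `--blob-type PageBlob`, since Azure requires VHDs to be stored as page blobs.

```python
import shlex


def build_azcopy_copy(source_url: str, dest_url: str) -> list[str]:
    """Build an `azcopy copy` argument list for one VHD (hypothetical helper).

    VHDs used by Azure VMs must be page blobs, so the destination blob type
    is set explicitly rather than left to azcopy's default inference.
    """
    return ["azcopy", "copy", source_url, dest_url, "--blob-type", "PageBlob"]


# Placeholder URLs; "<sas>" stands in for a real SAS token.
cmd = build_azcopy_copy(
    "https://srcaccount.blob.core.windows.net/aib-output/image.vhd?<sas>",
    "https://dstaccount.blob.core.windows.net/vhds/image.vhd?<sas>",
)

# Print a shell-safe rendering of the command for logging or a dry run.
print(shlex.join(cmd))
```

Building the argument list separately from executing it makes it easy to log or dry-run the exact command before handing it to `subprocess.run(cmd, check=True)`.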

  1. "Run Prefetch Optimization and Convert to VHD blob" task step

Across every per-SKU execution (main n=257, feature n=59):

┌────────────┬──────────┬──────────┬──────────────┐
│ Percentile │ main │ feature │ Reduction │
├────────────┼──────────┼──────────┼──────────────┤
│ Median │ 66.6 min │ 26.7 min │ −60% (2.49×) │
├────────────┼──────────┼──────────┼──────────────┤
│ P90 │ 82.0 min │ 33.4 min │ −59% │
├────────────┼──────────┼──────────┼──────────────┤
│ P95 │ 86.4 min │ 35.9 min │ −58% │
├────────────┼──────────┼──────────┼──────────────┤
│ P99 │ 99.9 min │ 42.2 min │ −58% │
├────────────┼──────────┼──────────┼──────────────┤
│ mean │ 67.6 min │ 27.8 min │ −59% │
└────────────┴──────────┴──────────┴──────────────┘

Per-SKU speedup is consistent across all 26 SKUs (1.74×–3.03×, ~2.4× on average). Crucially, the two full feature builds (all 30 SKUs, same parallel contention as main) show the same ~2.5× gain, so this is a real optimization rather than an artifact of smaller builds.
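The reduction and speedup columns follow from reduction = 1 − feature/main and speedup = main/feature. This short check uses only the minute values from the table above; the helper name is illustrative.

```python
# Minutes taken from the "Run Prefetch Optimization and Convert to VHD blob" table.
step_minutes = {
    "Median": (66.6, 26.7),
    "P90": (82.0, 33.4),
    "P95": (86.4, 35.9),
    "P99": (99.9, 42.2),
    "mean": (67.6, 27.8),
}


def reduction_and_speedup(main: float, feature: float) -> tuple[float, float]:
    """Return (fractional reduction, speedup factor) of feature relative to main."""
    return 1 - feature / main, main / feature


for name, (main, feature) in step_minutes.items():
    reduction, speedup = reduction_and_speedup(main, feature)
    # The ".0%" format multiplies by 100 and rounds to a whole percent.
    print(f"{name:>6}: -{reduction:.0%} ({speedup:.2f}x)")
```

Rounded to whole percent, this reproduces the table's −60/−59/−58/−58/−59% column and the 2.49× median speedup.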

  2. Total VHD build runtime

┌────────────┬─────────────┬───────────────┐
│ Percentile │ main (n=10) │ feature (n=5) │
├────────────┼─────────────┼───────────────┤
│ Median │ 130.3 min │ 82.6 min │
├────────────┼─────────────┼───────────────┤
│ P95 │ 250.1 min │ 94.1 min │
├────────────┼─────────────┼───────────────┤
│ max │ 290.7 min │ 95.2 min │
├────────────┼─────────────┼───────────────┤
│ mean │ 153.3 min │ 81.2 min │
└────────────┴─────────────┴───────────────┘

Caveat: 3 of the 5 feature runs were partial (5, 5, and 1 SKUs). Comparing full builds only: the feature runs took 89.5 and 95.2 min (median ~92 min) vs. a main median of ~130 min, i.e. ~30–40% lower, and main's long tail (200 and 291 min) disappears.
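The caveat's comparison can be checked the same way, using only the numbers from the total-runtime table and the caveat (helper name illustrative). Comparing medians gives about 29.1% lower for the two full feature builds and about 36.6% lower across all five feature runs including the partial ones, which is one way to read the ~30–40% range.

```python
from statistics import median

main_median = 130.3               # main (n=10) median, minutes
all_feature_median = 82.6         # feature (n=5) median, includes partial runs
full_feature_runs = [89.5, 95.2]  # the two full (all-SKU) feature builds, minutes

# Median of two values is their average: (89.5 + 95.2) / 2.
full_feature_median = median(full_feature_runs)


def pct_lower(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return 100 * (1 - value / baseline)


print(f"full builds only: {pct_lower(main_median, full_feature_median):.1f}% lower")
print(f"all feature runs: {pct_lower(main_median, all_feature_median):.1f}% lower")
```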

Which issue(s) this PR fixes:

Fixes #

Copilot AI review requested due to automatic review settings June 30, 2026 22:19

Copilot AI left a comment

Contributor


Pull request overview

This PR changes the Linux VHD prefetch optimization flow in vhdbuilder/prefetch/ to speed up VHD builds by having Azure Image Builder distribute the optimized output as a managed image, then converting that managed image into a VHD blob in the target storage account via a single azcopy operation.

Changes:

  • Update the Image Builder template to distribute to a ManagedImage instead of a VHD blob.
  • Add a managed-image → managed-disk → VHD conversion step in optimize.sh after the Image Builder run.
  • Adjust idempotency logic to treat the target VHD as “complete” only after a metadata marker is written.
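The completion-marker idea in the last bullet can be modeled without any Azure calls. In this sketch, a destination blob that exists but lacks the marker is treated as an interrupted copy and redone, and the marker is written only after the copy succeeds. `FakeContainer`, `COMPLETE_MARKER`, and the `copy_fn` callback are hypothetical stand-ins; the real script works against blob metadata in a storage account.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical metadata key; the PR conversation does not show the real marker name.
COMPLETE_MARKER = "prefetch_optimized"


@dataclass
class Blob:
    metadata: dict[str, str] = field(default_factory=dict)


class FakeContainer:
    """In-memory stand-in for a storage container (illustration only)."""

    def __init__(self) -> None:
        self.blobs: dict[str, Blob] = {}


def is_complete(container: FakeContainer, name: str) -> bool:
    # Existence alone is not enough: a blob without the marker may be a partial copy.
    blob = container.blobs.get(name)
    return blob is not None and blob.metadata.get(COMPLETE_MARKER) == "true"


def convert_and_publish(
    container: FakeContainer,
    name: str,
    copy_fn: Callable[[FakeContainer, str], None],
) -> bool:
    """Run the conversion unless already complete; return True if work was done."""
    if is_complete(container, name):
        return False
    # e.g. managed image -> managed disk -> azcopy to the destination blob.
    copy_fn(container, name)
    # Marker is written last, so a failed copy leaves the blob "incomplete".
    container.blobs[name].metadata[COMPLETE_MARKER] = "true"
    return True
```

Because the marker is set last, a run that fails mid-copy leaves no marker, so the next attempt redoes the conversion instead of skipping a truncated VHD.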

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

  • vhdbuilder/prefetch/templates/optimize.json: switch Image Builder distribution output from a VHD blob to a managed image.
  • vhdbuilder/prefetch/scripts/optimize.sh: convert the distributed managed image into a VHD blob and mark completion via blob metadata.

Comment thread vhdbuilder/prefetch/scripts/optimize.sh Outdated
Comment thread vhdbuilder/prefetch/scripts/optimize.sh Outdated
Comment thread vhdbuilder/prefetch/scripts/optimize.sh Outdated
Comment thread vhdbuilder/prefetch/scripts/optimize.sh Outdated
Copilot AI review requested due to automatic review settings July 1, 2026 15:39

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread vhdbuilder/prefetch/scripts/optimize.sh
Comment thread vhdbuilder/prefetch/scripts/optimize.sh
Comment thread vhdbuilder/prefetch/scripts/optimize.sh
Comment thread vhdbuilder/prefetch/scripts/optimize.sh
Copilot AI review requested due to automatic review settings July 1, 2026 21:23

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread vhdbuilder/prefetch/scripts/optimize.sh
Copilot AI review requested due to automatic review settings July 1, 2026 21:26
@cameronmeissner cameronmeissner changed the title from "feat(linux): reduce prefetch optimization runtime within VHD builds" to "feat(linux): reduce prefetch optimization runtime within VHD builds and add retry logic to handle AIB rate limits" on Jul 1, 2026
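The updated title mentions retry logic for AIB rate limits, but the conversation does not show how it is implemented. Below is a generic exponential-backoff sketch, assuming a hypothetical `RateLimitedError` raised when AIB throttles a request (for example, with an HTTP 429 response); the attempt count and delays are placeholder values, not the PR's settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error signalling an AIB throttling response (e.g. HTTP 429)."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on RateLimitedError, retry with capped exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the throttling error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from parallel per-SKU builds.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry policy can be exercised in tests without waiting; the jitter helps keep parallel per-SKU builds from retrying in lockstep.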

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread vhdbuilder/prefetch/scripts/optimize.sh
@cameronmeissner cameronmeissner marked this pull request as ready for review July 2, 2026 18:14
Copilot AI review requested due to automatic review settings July 2, 2026 18:14

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread vhdbuilder/prefetch/scripts/optimize.sh
@aks-node-assistant

Contributor

Failed gate

Run: https://msazure.visualstudio.com/CloudNativeCompute/_build/results?buildId=170737959

Failed job/stage/task: $(System.Collections.Hashtable.job) (logId 481).

Detective summary

Known CIS-CAT scanner failure in vhd-scanning: CIS-CAT Pro Assessor v4.57.1 reports Assessment 1 Exit Value: 122, assessor exits 1, wrapper retries twice, and the task exits 2.

Likely cause / signature

Likely a known VHD scan/CIS-CAT gate tooling failure, not caused by this PR. Signature: AB-GATE-LINUX-VHD-SCAN-CISCAT-EXIT122. Confidence: High.

Strongest alternative: the prefetch/AIB optimization changes affected image generation. This is less likely because two unrelated component-update PRs in the same cycle failed with the identical scanner signature on the same VHD legs.

Recommended action

No PR-specific remediation recommended unless future evidence shows a distinct CIS rule failure; continue repair item #38671557.

Evidence

  • Timeline/build status: build stage failed in VHD Test, Scan, and Cleanup tasks with exit code 2
  • Log: vhd-scanning.sh failed twice; CIS-CAT Pro v4.57.1 reported Assessment 1 Exit Value: 122
  • Wiki signature: AB-GATE-LINUX-VHD-SCAN-CISCAT-EXIT122
  • Repair item: #38671557

@cameronmeissner cameronmeissner merged commit dc132b0 into main Jul 2, 2026
31 of 35 checks passed
@cameronmeissner cameronmeissner deleted the cameissner/optimize-prefetch-build-runtime branch July 2, 2026 20:01

Labels

None yet

Projects

None yet


3 participants