Sitemap script + alert #100

Merged: nicoalba merged 7 commits into main from sitemap-flow on May 22, 2026

@nicoalba nicoalba commented May 21, 2026

  • Compares the newly built sitemap.xml against a cached baseline from the previous successful build; on first run (no cache), seeds the baseline from the live prod sitemap
  • For each path that disappears from the new build, checks whether plugin-client-redirects wrote a redirect stub at that path — if no stub exists, the build fails
  • Scans all redirect stubs in the build output and validates the redirect graph:
    • Stale (fails): stub points to a page that no longer exists in the new build
    • Loop (fails): two or more stubs redirect to each other infinitely
    • Chained (warning): redirect takes more than one hop to reach the final page
    • Shadowed (warning): a stub exists for a path that is also a real live page, so the stub never fires
  • When unresolved paths are found, prints copy-paste-ready redirect snippets grouped by product, with the target repo and function named (openzitiRedirects(), etc.) and a fuzzy guess at the destination when a matching page segment is found in the new sitemap
  • Ignore list for expected churn (blog, versioned docs, tags, categories) lives in sitemap-ignore.json alongside the script instead of being hardcoded
  • yarn check-drift / yarn unified:check-drift aliases let you run the gate locally after a build without memorizing the argument triple
  • On failure, sends a Mattermost alert to doc-alerts listing the unresolved paths and a link to the build log; does not double-alert on nightly runs where the generic failure notifier would also fire
  • Caches the baseline sitemap between runs via actions/cache; also archives input and output baselines as 90-day artifacts so any run's sitemap state can be retrieved with gh run download
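The four redirect-graph categories in the list above can be sketched as a small classifier. This is an illustration only, not the code in check-sitemap-drift.mjs: the function name `classifyRedirects`, the `Map` of stub path to target path, and the `Set` of real page paths are all assumptions made for the example.

```javascript
// Hypothetical helper (not taken from the PR): classify redirect stubs.
// `redirects` is a Map of stub path -> target path.
// `pages` is a Set of paths that are real pages in the new build.
function classifyRedirects(redirects, pages) {
  const result = { stale: [], loop: [], chained: [], shadowed: [] };
  for (const [from, to] of redirects) {
    // A stub at a path that is also a real page never fires.
    if (pages.has(from)) result.shadowed.push(from);

    const seen = new Set([from]);
    let current = to;
    let hops = 1;
    let looped = false;
    // Follow the chain until it reaches a real page, leaves the
    // redirect map, or revisits a path it has already seen.
    while (redirects.has(current) && !pages.has(current)) {
      if (seen.has(current)) {
        looped = true;
        break;
      }
      seen.add(current);
      current = redirects.get(current);
      hops += 1;
    }

    if (looped) result.loop.push(from);                    // would fail the build
    else if (!pages.has(current)) result.stale.push(from); // would fail the build
    else if (hops > 1) result.chained.push(from);          // warning only
  }
  return result;
}
```

In this sketch a stub can be reported as shadowed and also fall into one of the other categories, since shadowing depends on the stub's own path while the rest depend on where the chain ends.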

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
nf-theme-sandbox Ignored Ignored Preview May 22, 2026 10:29pm
unified-doc-preview Ignored Ignored Preview May 22, 2026 10:29pm

@dovholuknf dovholuknf left a comment

wrong pr.. hold on... :)

nicoalba and others added 2 commits May 21, 2026 20:06
- Rewrite check-sitemap-drift.mjs: compare new build against cached
  baseline (not live prod), check redirect stubs for each removed path,
  exit 1 on unresolved removals so the publish is aborted
- Update publish-unified-doc.sh: pass baseline path and build dir,
  remove || true so drift failures actually block the script
- Update publish.yml: add actions/cache restore/save for baseline,
  drop the wrong notify-sitemap-drift job, add inline drift alert
  steps (if: failure()) that send to Mattermost doc-alerts when
  unresolved removals are found

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Expose has_drift as a job output and skip notify-mattermost when
drift already sent its own alert.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
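
The core of the gate described in the first commit (compare against the baseline, check for a stub at each removed path, exit 1 on unresolved removals) might look roughly like the following. This is a minimal sketch under assumptions: the function name, its parameters, and the use of plain path arrays are hypothetical, and the real script reads sitemap.xml and the build directory rather than taking arrays.

```javascript
// Hypothetical sketch of the drift gate; names are illustrative only.
// baselinePaths:  paths from the cached baseline sitemap
// newPaths:       paths from the newly built sitemap
// stubPaths:      paths where a redirect stub exists in the build output
// ignorePrefixes: path prefixes treated as expected churn
function findUnresolvedRemovals(baselinePaths, newPaths, stubPaths, ignorePrefixes = []) {
  const current = new Set(newPaths);
  const stubs = new Set(stubPaths);
  return baselinePaths
    // Keep only paths that disappeared from the new build.
    .filter((p) => !current.has(p))
    // Drop paths covered by the ignore list.
    .filter((p) => !ignorePrefixes.some((prefix) => p.startsWith(prefix)))
    // Drop paths that already have a redirect stub.
    .filter((p) => !stubs.has(p));
}

// Map the result to a process exit code: non-zero aborts the publish.
function driftExitCode(unresolved) {
  return unresolved.length > 0 ? 1 : 0;
}
```

Returning the exit code from a separate function keeps the comparison testable without terminating the process; a caller would pass the result to `process.exit`.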
@nicoalba nicoalba changed the title from "feat: add nightly sitemap drift detection with Mattermost alert" to "feat: pre-publish sitemap drift gate with redirect validation" May 22, 2026
nicoalba and others added 2 commits May 22, 2026 18:04
- Extract IGNORE_PREFIXES into sitemap-ignore.json (colocated with
  script, passed as optional 4th arg) so it's easy to find and edit
- Add yarn check-drift / yarn unified:check-drift aliases so the gate
  can be run locally without memorizing the arg triple; publish script
  uses the alias instead of invoking node directly
- Emit copy-paste redirect snippets for unresolved paths: grouped by
  product with the target repo/function named, fuzzy-guess the `to`
  by matching the last path segment against the new sitemap
- Scan all redirect stubs (plugin-client-redirects index.html files),
  build a redirect graph, and detect: stale targets (final target not
  in sitemap → exit 1), loops (→ exit 1), chained >1 hop (warning),
  shadowed stubs where the path is also a real page (warning)
- Archive input and output sitemap baselines as artifacts (90-day
  retention) for inspection and history via gh run download

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
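
The fuzzy guess for `to` mentioned in this commit (match the last path segment against the new sitemap) can be illustrated as below. The function names and the `TODO` placeholder are assumptions for the example; the `{ from, to }` object shape follows the plugin-client-redirects `redirects` option, but the exact snippet text the script prints is not shown in this PR.

```javascript
// Hypothetical sketch: guess a redirect target by last path segment.
function lastSegment(path) {
  // '/docs/old/install/' -> 'install'; '/' -> undefined
  return path.split('/').filter(Boolean).pop();
}

function guessTarget(removedPath, newPaths) {
  const wanted = lastSegment(removedPath);
  if (!wanted) return null;
  // The first page in the new sitemap with the same final segment wins.
  return newPaths.find((p) => lastSegment(p) === wanted) ?? null;
}

// Format one copy-paste entry; 'TODO' marks paths with no guess.
function redirectSnippet(removedPath, newPaths) {
  const to = guessTarget(removedPath, newPaths) ?? 'TODO';
  return `{ from: '${removedPath}', to: '${to}' },`;
}
```

A last-segment match is only a hint. Two unrelated pages can share a segment such as `overview`, so a guessed `to` still needs a human check before it is pasted into a redirects function.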
@nicoalba nicoalba changed the title from "feat: pre-publish sitemap drift gate with redirect validation" to "Sitemap script + alert" May 22, 2026

@dovholuknf dovholuknf left a comment

seems right. assuming it works locally should be gtg

nicoalba and others added 2 commits May 22, 2026 22:05
Full pages rendered by Docusaurus contain __docusaurus in their HTML —
only bare plugin-client-redirects stubs should be scanned for the
redirect graph. Also add /docs/llms.txt to sitemap-ignore.json since
it is injected into prod by publish-unified-doc.sh post-build and
never appears in a local build's sitemap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
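
The stub filter described in this commit could be sketched as follows. The `__docusaurus` check comes from the commit message; the meta-refresh check is an assumption about what a plugin-client-redirects stub contains, and the function name is hypothetical.

```javascript
// Hypothetical sketch: decide whether an index.html is a bare redirect stub.
function isRedirectStub(html) {
  // Full pages rendered by Docusaurus contain the __docusaurus marker.
  if (html.includes('__docusaurus')) return false;
  // Assumption: a stub carries a <meta http-equiv="refresh"> tag.
  return /<meta[^>]+http-equiv=["']?refresh/i.test(html);
}
```

Checking for the `__docusaurus` marker first means a real page that happens to mention a meta refresh in its content is still excluded from the redirect graph.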
@nicoalba nicoalba merged commit d649431 into main May 22, 2026
6 checks passed
@nicoalba nicoalba deleted the sitemap-flow branch May 22, 2026 22:30