ci(e2e): gate staging e2e on critical staging-instance config drift by jacekradko · Pull Request #8757 · clerk/javascript

jacekradko · 2026-06-05T03:00:28Z

Follow-up to #8756. The validate-staging-instances script already compares prod vs staging /v1/environment and prints a diff, but it always exited 0. A drifted staging mirror (like the missing WhatsApp channel that makes whatsapp-phone-code time out) therefore blocked nothing and stayed invisible until tests failed 200 tests into the run.
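For readers new to the script, the comparison step can be pictured roughly as below. This is a minimal sketch, not the script's actual code: `flatten`, `diffEnvironments`, and the sample payloads are hypothetical names and data, and the assumption is that both /v1/environment responses are flattened into dotted paths before comparison.

```javascript
// Flatten a nested config object into a map of dotted paths to leaf values.
// Arrays are treated as leaves and compared by their JSON serialization.
function flatten(value, prefix = '', out = {}) {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Return one mismatch record per dotted path whose value differs.
function diffEnvironments(prod, staging) {
  const a = flatten(prod);
  const b = flatten(staging);
  const paths = new Set([...Object.keys(a), ...Object.keys(b)]);
  const mismatches = [];
  for (const path of [...paths].sort()) {
    if (JSON.stringify(a[path]) !== JSON.stringify(b[path])) {
      mismatches.push({ path, prod: a[path], staging: b[path] });
    }
  }
  return mismatches;
}

// Hypothetical payloads, loosely modeled on the missing WhatsApp channel.
const prodEnv = {
  user_settings: { attributes: { phone_number: { enabled: true, channels: ['sms', 'whatsapp'] } } },
};
const stagingEnv = {
  user_settings: { attributes: { phone_number: { enabled: true, channels: ['sms'] } } },
};

console.log(diffEnvironments(prodEnv, stagingEnv));
```

With these inputs the only mismatch is the `channels` leaf, which is the kind of drift the gate is meant to surface before the test legs start.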

This gives the script teeth without flipping any behavior yet. It gains a tight CRITICAL_PATHS allowlist (attribute enabled toggles, phone_number.channels, auth factors, social enable/disable, password policy) plus an ACCEPTED_DRIFT escape hatch, so a known and tracked gap doesn't block while new drift does. In strict mode it exits non-zero on a blocking mismatch; fetch failures and cosmetic drift never fail the build.
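A rough sketch of how the allowlist and the escape hatch could interact. The helper names match the ones exported by the script, but the signatures, the regex patterns, and the `with-whatsapp` instance name are assumptions made for illustration only.

```javascript
// Hypothetical patterns; the real CRITICAL_PATHS list lives in the script.
const CRITICAL_PATHS = [
  /\.attributes\.[^.]+\.enabled$/,
  /\.phone_number\.channels$/,
  /\.password_settings\./,
];

// Hypothetical accepted-drift entries, each scoped to one instance name.
const ACCEPTED_DRIFT = [
  { instance: 'with-whatsapp', path: /\.phone_number\.channels$/ },
];

function isCriticalPath(path) {
  return CRITICAL_PATHS.some(re => re.test(path));
}

function isAcceptedDrift(instance, path) {
  return ACCEPTED_DRIFT.some(entry => entry.instance === instance && entry.path.test(path));
}

// Split mismatches into blocking (critical and not accepted) and informational.
function classifyMismatches(instance, mismatches) {
  const blocking = [];
  const informational = [];
  for (const mismatch of mismatches) {
    if (isCriticalPath(mismatch.path) && !isAcceptedDrift(instance, mismatch.path)) {
      blocking.push(mismatch);
    } else {
      informational.push(mismatch);
    }
  }
  return { blocking, informational };
}

const sample = [
  { path: 'user_settings.attributes.phone_number.channels' },
  { path: 'display_config.logo_url' },
];
console.log(classifyMismatches('some-other-instance', sample));
console.log(classifyMismatches('with-whatsapp', sample));
```

Scoping accepted drift to an instance keeps a tolerated gap on one instance from hiding the same drift when it appears on another.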

Strictness is driven by the STAGING_VALIDATE_STRICT repo variable and defaults to report-only, and integration-tests now depends on validate-instances. So nothing changes until someone sets the variable: today it just logs the blocking drift and the gate it would apply. The piece worth a look is the CRITICAL_PATHS set, since that is the policy for what is worth blocking a run over.
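The report-only default can be sketched as follows. `resolveStrict` and `exitCodeFor` are hypothetical helpers, and which values of the variable count as "on" is an assumption here (only `true` and `1`); the actual script may parse it differently.

```javascript
// Assumption: only the strings "true" and "1" enable strict mode via the env var.
function resolveStrict(env, argv) {
  const fromFlag = argv.includes('--strict');
  const raw = String(env.STAGING_VALIDATE_STRICT ?? '').toLowerCase();
  const fromEnv = raw === 'true' || raw === '1';
  return fromFlag || fromEnv;
}

// Blocking drift fails the run only in strict mode; otherwise it is just logged.
function exitCodeFor(blockingCount, strict) {
  if (blockingCount === 0) return 0;
  return strict ? 1 : 0;
}

const strict = resolveStrict(process.env, process.argv.slice(2));
console.log(`strict=${strict}, exit code for 2 blocking mismatches: ${exitCodeFor(2, strict)}`);
```

Because an unset variable resolves to report-only, merging the workflow change alone does not alter which runs pass.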

Before enabling strict, run the validator against current staging to confirm the only blocking drift is expected, and add ACCEPTED_DRIFT entries for anything intentionally tolerated. Stacked on #8756.

Update: the branch is rebased onto main (dropping the stale pre-squash copy of #8756 and keeping main's TURBO_FORCE and report-path fixes), and captcha_enabled is now in CRITICAL_PATHS. An enabled captcha blocks every in-browser sign-up in headless CI, which is what kept the staging generic leg red for a week (legal-consent vs Turnstile). The captcha ignore-list removal is included here too so the gate works standalone; it overlaps with #8832 by design and merges cleanly in either order, with a pipeline test pinning that critical paths cannot be swallowed by the ignore filter. Also, the report job now notifies Slack when the strict gate itself fails, since a gate failure skips the test legs rather than failing them and would otherwise be silent. Still report-only until the repo var is set; bring instances to parity first using #8832's report.
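The invariant that critical paths cannot be swallowed by the ignore filter can be illustrated with a small check. The ignore patterns below are the ones quoted in the review thread on this PR; `swallowedBy` and the sample critical paths are hypothetical and stand in for whatever the pipeline test actually asserts.

```javascript
// Ignore patterns as they read once the two captcha entries are dropped.
const IGNORED_PATHS = [
  /\.id$/,
  /^auth_config\.id$/,
  /\.logo_url$/,
  /\.enforce_hibp_on_sign_in$/,
  /\.disable_hibp$/,
];

// The earlier list, kept here only to show what the check would have caught.
const IGNORED_PATHS_BEFORE = [...IGNORED_PATHS, /\.captcha_enabled$/, /\.captcha_widget_type$/];

// Hypothetical sample of paths that the gate treats as critical.
const CRITICAL_SAMPLES = [
  'user_settings.sign_up.captcha_enabled',
  'user_settings.attributes.phone_number.channels',
];

// Return the critical paths that an ignore list would silently drop.
function swallowedBy(patterns, paths) {
  return paths.filter(path => patterns.some(re => re.test(path)));
}

console.log('before:', swallowedBy(IGNORED_PATHS_BEFORE, CRITICAL_SAMPLES));
console.log('after:', swallowedBy(IGNORED_PATHS, CRITICAL_SAMPLES));
```

A test asserting that the second result is empty would fail as soon as someone re-adds an ignore pattern that overlaps a critical path.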

Summary by CodeRabbit

Release Notes

Chores
- Enhanced E2E staging validation with optional strict mode for improved deployment gating
- Updated staging workflow with improved job sequencing and longer test artifact retention
- Added critical configuration drift detection for staging environments

Tests
- Expanded test coverage for staging validation scenarios

changeset-bot · 2026-06-05T03:00:33Z

🦋 Changeset detected

Latest commit: 9353469

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 0 packages


coderabbitai · 2026-06-05T03:00:34Z

📝 Walkthrough

Walkthrough

This PR adds critical-path validation with strict gating to staging instance configuration checks, wires this validation into the E2E workflow to skip tests on configuration drift, and provides comprehensive test coverage for the new validation logic.

Changes

Staging E2E Validation Strict Gating

Layer / File(s)	Summary
Critical path policy and mismatch classification `scripts/validate-staging-instances.mjs`	Defines which configuration paths are critical (factor changes, provider toggles, password settings, captcha gating) and provides `isCriticalPath()`, `isAcceptedDrift()`, and `classifyMismatches()` helpers to distinguish blocking mismatches from informational ones.
Strict gating enforcement in validation script `scripts/validate-staging-instances.mjs`	`main()` now accepts a `strict` parameter (from `STAGING_VALIDATE_STRICT` env var or `--strict` CLI flag), accumulates blocking critical mismatches per instance, and exits with code 1 in strict mode when blocking mismatches exist; exports the new classification helpers.
Validation script test coverage `scripts/validate-staging-instances.test.mjs`	Comprehensive tests assert critical-path detection for authentication factors, social providers, password settings, and captcha toggles; verify blocking vs informational classification with instance-scoped accepted drift and regex allowlists; and confirm strict-mode gating behavior (exit code 1 on critical drift, report-only in non-strict mode).
Workflow orchestration and validation gating `.github/workflows/e2e-staging.yml`	Adds `STAGING_VALIDATE_STRICT` env var sourced from repo variables, makes `integration-tests` conditional on `validate-instances` succeeding or skipped, adds `validate-instances` to the final `report` job dependencies, and extends Slack failure notifications to trigger on validation failures.
Changeset documentation `.changeset/staging-e2e-validate-gate.md`	Changeset file declaring the staging E2E validation gating feature.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

clerk/javascript#8766: Both PRs update the staging E2E workflow's Playwright JSON report artifact handling, including integration/playwright-report/results.json artifact uploads and integration-test execution/reporting steps.

Suggested Reviewers

tmilewski

🐰 A rabbit hops through staging gates so fine,
Gating critical paths with each config line,
When drift appears, the strict mode takes its stand,
With blocking checks across the e2e land,
Tests prove the logic works just as planned! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding a gate for staging e2e tests based on critical configuration drift detection between production and staging instances.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


vercel · 2026-06-05T03:00:36Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
clerk-js-sandbox	Ready	Preview, Comment	Jun 11, 2026 5:21pm
swingset	Ready	Preview, Comment	Jun 11, 2026 5:21pm

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/validate-staging-instances.mjs (1)

24-32: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Critical captcha drift is filtered out before strict gating.

user_settings.sign_up.captcha_enabled is marked critical (Line 61), but isIgnored still drops *.captcha_enabled (Line 28) before classification (Line 452). That makes this critical path non-blocking in practice.

Suggested fix

 const IGNORED_PATHS = [
   /\.id$/,
   /^auth_config\.id$/,
   /\.logo_url$/,
-  /\.captcha_enabled$/,
-  /\.captcha_widget_type$/,
   /\.enforce_hibp_on_sign_in$/,
   /\.disable_hibp$/,
 ];

Also applies to: 47-62, 452-457

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/validate-staging-instances.mjs` around lines 24 - 32, The
IGNORED_PATHS array contains a /\.captcha_enabled$/ pattern which causes
isIgnored to drop captcha flags before they can be classified as critical
(specifically user_settings.sign_up.captcha_enabled); remove or narrow that
pattern in IGNORED_PATHS (or change the order so classification runs before
isIgnored) so that user_settings.sign_up.captcha_enabled is evaluated by the
existing critical-path logic; locate IGNORED_PATHS and the isIgnored call in
scripts/validate-staging-instances.mjs and ensure *.captcha_enabled is not
globally filtered out prior to the criticality check.

🧹 Nitpick comments (1)

scripts/validate-staging-instances.test.mjs (1)

647-717: ⚡ Quick win

Add a strict-mode main() regression test for captcha_enabled drift.

Current tests validate captcha at classifier level, but not through the full main() pipeline. Add one case where user_settings.sign_up.captcha_enabled differs and main({ strict: true }) must exit with 1.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/validate-staging-instances.test.mjs` around lines 647 - 717, Add a
new regression test in scripts/validate-staging-instances.test.mjs that mirrors
the existing strict-mode cases but uses a drift in
user_settings.sign_up.captcha_enabled: call setPair(), mockEnvPair() with one
env having user_settings: {...emptyUserSettings(), sign_up: { captcha_enabled:
true }} and the other having sign_up: { captcha_enabled: false }, then await
expect(main({ strict: true })).rejects.toThrow('process.exit(1)') and assert
exitCode === 1 and consoleLogs contains the blocking mismatch message; follow
the pattern used in the other tests (e.g., the "exits non-zero in strict mode
when a critical config path drifts" test) to place and name the new it(...)
block.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 91ce88f1-9bd4-4352-b093-5a8ffded5427

📥 Commits

Reviewing files that changed from the base of the PR and between 5bf43a2 and bf10aa7.

📒 Files selected for processing (8)

.changeset/staging-e2e-resilience-p0.md
.changeset/staging-e2e-validate-gate.md
.github/workflows/e2e-staging.yml
integration/playwright.config.ts
integration/tests/custom-pages.test.ts
integration/tests/whatsapp-phone-code.test.ts
scripts/validate-staging-instances.mjs
scripts/validate-staging-instances.test.mjs

validate-staging-instances.mjs already diffs prod vs staging /v1/environment but every exit path returned 0, so detected drift blocked nothing and the job was not a dependency of the test matrix. A drifted staging mirror (e.g. a missing phone_number WhatsApp channel) therefore surfaced only as opaque test timeouts 200 tests deep.

Add a tight CRITICAL_PATHS allowlist (attribute enabled toggles, phone_number.channels, auth factors/strategies, social enable/disable, password settings) and an ACCEPTED_DRIFT escape hatch so known gaps don't block while new drift does. In strict mode the script exits non-zero on a blocking mismatch; fetch failures and cosmetic drift never fail the build.

Wire integration-tests to need validate-instances, and drive strictness from the STAGING_VALIDATE_STRICT repo variable (default report-only). So this is a no-op until the team opts in: it logs blocking drift and the proposed gate without failing anything. Flip the variable to make it enforce.


pkg-pr-new · 2026-06-11T17:22:31Z

Open in StackBlitz

@clerk/astro

npm i https://pkg.pr.new/@clerk/astro@8757

@clerk/backend

npm i https://pkg.pr.new/@clerk/backend@8757

@clerk/chrome-extension

npm i https://pkg.pr.new/@clerk/chrome-extension@8757

@clerk/clerk-js

npm i https://pkg.pr.new/@clerk/clerk-js@8757

@clerk/expo

npm i https://pkg.pr.new/@clerk/expo@8757

@clerk/expo-passkeys

npm i https://pkg.pr.new/@clerk/expo-passkeys@8757

@clerk/express

npm i https://pkg.pr.new/@clerk/express@8757

@clerk/fastify

npm i https://pkg.pr.new/@clerk/fastify@8757

@clerk/hono

npm i https://pkg.pr.new/@clerk/hono@8757

@clerk/localizations

npm i https://pkg.pr.new/@clerk/localizations@8757

@clerk/nextjs

npm i https://pkg.pr.new/@clerk/nextjs@8757

@clerk/nuxt

npm i https://pkg.pr.new/@clerk/nuxt@8757

@clerk/react

npm i https://pkg.pr.new/@clerk/react@8757

@clerk/react-router

npm i https://pkg.pr.new/@clerk/react-router@8757

@clerk/shared

npm i https://pkg.pr.new/@clerk/shared@8757

@clerk/tanstack-react-start

npm i https://pkg.pr.new/@clerk/tanstack-react-start@8757

@clerk/testing

npm i https://pkg.pr.new/@clerk/testing@8757

@clerk/ui

npm i https://pkg.pr.new/@clerk/ui@8757

@clerk/upgrade

npm i https://pkg.pr.new/@clerk/upgrade@8757

@clerk/vue

npm i https://pkg.pr.new/@clerk/vue@8757

commit: 9353469

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/e2e-staging.yml (1)

378-379: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Slack failure text is now misleading for validation-gate failures.

At Line 378, the message always says “Staging E2E tests failed”, but this notification can now also fire when validate-instances fails and test legs are skipped. Please update wording to cover both failure sources.

Suggested update

-                    "text": "*:red_circle: Staging E2E tests failed*\n*Repo:* `${{ github.repository }}`\n*Ref:* `${{ steps.inputs.outputs.ref }}`\n*SDK:* `${{ steps.inputs.outputs.sdk-source }}`\n*clerk_go commit:* `${{ steps.inputs.outputs.clerk-go-commit-sha || 'N/A' }}`\n*Run:* <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>"
+                    "text": "*:red_circle: Staging E2E workflow failed*\n*Failure source:* `${{ needs.validate-instances.result == 'failure' && 'staging instance validation gate' || 'integration tests' }}`\n*Repo:* `${{ github.repository }}`\n*Ref:* `${{ steps.inputs.outputs.ref }}`\n*SDK:* `${{ steps.inputs.outputs.sdk-source }}`\n*clerk_go commit:* `${{ steps.inputs.outputs.clerk-go-commit-sha || 'N/A' }}`\n*Run:* <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-staging.yml around lines 378 - 379, Update the Slack
notification "text" field so it no longer always reads "Staging E2E tests
failed" and instead covers both failure cases (E2E test failures and
validation-gate failures that skip test legs). Locate the Slack step that sets
the "text" property (the multiline string starting with "*:red_circle: Staging
E2E tests failed*") and change the message to a neutral combined message such as
"*:red_circle: Staging E2E tests or instance validation failed*" (or similar
wording) while preserving the existing repo/ref/SDK/commit/run placeholders and
link formatting; ensure the modified "text" string still interpolates the same
GitHub action output variables (`${{ steps.inputs.outputs.* }}`).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: bc91a59f-9b2d-4a4f-af14-9d8b212691ef

📥 Commits

Reviewing files that changed from the base of the PR and between bf10aa7 and 9353469.

📒 Files selected for processing (2)

.changeset/staging-e2e-validate-gate.md
.github/workflows/e2e-staging.yml

✅ Files skipped from review due to trivial changes (1)

.changeset/staging-e2e-validate-gate.md

github-actions Bot added the actions label Jun 5, 2026

vercel Bot deployed to Preview June 5, 2026 03:01 View deployment

jacekradko mentioned this pull request Jun 5, 2026

ci(e2e): add a curated @smoke gating leg, make the full generic leg informational #8759

Open

jacekradko force-pushed the jacek/staging-e2e-resilience-p0 branch from fc18bdf to 7b59e11 Compare June 5, 2026 11:40

jacekradko force-pushed the jacek/staging-e2e-validate-gate branch from 07c335c to 0eb5396 Compare June 5, 2026 11:42

vercel Bot deployed to Preview June 5, 2026 11:42 View deployment

Base automatically changed from jacek/staging-e2e-resilience-p0 to main June 5, 2026 16:15

wobsoriano approved these changes Jun 9, 2026


vercel Bot had a problem deploying to Preview – swingset June 11, 2026 16:34 Failure

github-actions Bot added the integration label Jun 11, 2026

vercel Bot deployed to Preview – clerk-js-sandbox June 11, 2026 16:35 View deployment

coderabbitai Bot reviewed Jun 11, 2026


jacekradko added 3 commits June 11, 2026 12:16

ci(e2e): treat captcha_enabled drift as critical in the staging gate

1769678

ci(e2e): make the captcha gate self-contained and notify Slack on gate failures

9353469

jacekradko force-pushed the jacek/staging-e2e-validate-gate branch from bf10aa7 to 9353469 Compare June 11, 2026 17:19

vercel Bot deployed to Preview – clerk-js-sandbox June 11, 2026 17:20 View deployment

vercel Bot deployed to Preview – swingset June 11, 2026 17:21 View deployment

coderabbitai Bot reviewed Jun 11, 2026

jacekradko wants to merge 3 commits into main from jacek/staging-e2e-validate-gate
