ci(e2e): gate staging e2e on critical staging-instance config drift#8757
ci(e2e): gate staging e2e on critical staging-instance config drift#8757jacekradko wants to merge 3 commits into
Conversation
🦋 Changeset detectedLatest commit: 9353469 The changes in this PR will be included in the next version bump. This PR includes changesets to release 0 packagesWhen changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
📝 WalkthroughWalkthroughThis PR adds critical-path validation with strict gating to staging instance configuration checks, wires this validation into the E2E workflow to skip tests on configuration drift, and provides comprehensive test coverage for the new validation logic. ChangesStaging E2E Validation Strict Gating
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
fc18bdf to
7b59e11
Compare
07c335c to
0eb5396
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/validate-staging-instances.mjs (1)
24-32:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCritical captcha drift is filtered out before strict gating.
user_settings.sign_up.captcha_enabledis marked critical (Line 61), butisIgnoredstill drops*.captcha_enabled(Line 28) before classification (Line 452). That makes this critical path non-blocking in practice.Suggested fix
const IGNORED_PATHS = [ /\.id$/, /^auth_config\.id$/, /\.logo_url$/, - /\.captcha_enabled$/, - /\.captcha_widget_type$/, /\.enforce_hibp_on_sign_in$/, /\.disable_hibp$/, ];Also applies to: 47-62, 452-457
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/validate-staging-instances.mjs` around lines 24 - 32, The IGNORED_PATHS array contains a /\.captcha_enabled$/ pattern which causes isIgnored to drop captcha flags before they can be classified as critical (specifically user_settings.sign_up.captcha_enabled); remove or narrow that pattern in IGNORED_PATHS (or change the order so classification runs before isIgnored) so that user_settings.sign_up.captcha_enabled is evaluated by the existing critical-path logic; locate IGNORED_PATHS and the isIgnored call in scripts/validate-staging-instances.mjs and ensure *.captcha_enabled is not globally filtered out prior to the criticality check.
🧹 Nitpick comments (1)
scripts/validate-staging-instances.test.mjs (1)
647-717: ⚡ Quick winAdd a strict-mode
main()regression test forcaptcha_enableddrift.Current tests validate captcha at classifier level, but not through the full
main()pipeline. Add one case whereuser_settings.sign_up.captcha_enableddiffers andmain({ strict: true })must exit with1.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/validate-staging-instances.test.mjs` around lines 647 - 717, Add a new regression test in scripts/validate-staging-instances.test.mjs that mirrors the existing strict-mode cases but uses a drift in user_settings.sign_up.captcha_enabled: call setPair(), mockEnvPair() with one env having user_settings: {...emptyUserSettings(), sign_up: { captcha_enabled: true }} and the other having sign_up: { captcha_enabled: false }, then await expect(main({ strict: true })).rejects.toThrow('process.exit(1)') and assert exitCode === 1 and consoleLogs contains the blocking mismatch message; follow the pattern used in the other tests (e.g., the "exits non-zero in strict mode when a critical config path drifts" test) to place and name the new it(...) block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@scripts/validate-staging-instances.mjs`:
- Around line 24-32: The IGNORED_PATHS array contains a /\.captcha_enabled$/
pattern which causes isIgnored to drop captcha flags before they can be
classified as critical (specifically user_settings.sign_up.captcha_enabled);
remove or narrow that pattern in IGNORED_PATHS (or change the order so
classification runs before isIgnored) so that
user_settings.sign_up.captcha_enabled is evaluated by the existing critical-path
logic; locate IGNORED_PATHS and the isIgnored call in
scripts/validate-staging-instances.mjs and ensure *.captcha_enabled is not
globally filtered out prior to the criticality check.
---
Nitpick comments:
In `@scripts/validate-staging-instances.test.mjs`:
- Around line 647-717: Add a new regression test in
scripts/validate-staging-instances.test.mjs that mirrors the existing
strict-mode cases but uses a drift in user_settings.sign_up.captcha_enabled:
call setPair(), mockEnvPair() with one env having user_settings:
{...emptyUserSettings(), sign_up: { captcha_enabled: true }} and the other
having sign_up: { captcha_enabled: false }, then await expect(main({ strict:
true })).rejects.toThrow('process.exit(1)') and assert exitCode === 1 and
consoleLogs contains the blocking mismatch message; follow the pattern used in
the other tests (e.g., the "exits non-zero in strict mode when a critical config
path drifts" test) to place and name the new it(...) block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 91ce88f1-9bd4-4352-b093-5a8ffded5427
📒 Files selected for processing (8)
.changeset/staging-e2e-resilience-p0.md.changeset/staging-e2e-validate-gate.md.github/workflows/e2e-staging.ymlintegration/playwright.config.tsintegration/tests/custom-pages.test.tsintegration/tests/whatsapp-phone-code.test.tsscripts/validate-staging-instances.mjsscripts/validate-staging-instances.test.mjs
validate-staging-instances.mjs already diffs prod vs staging /v1/environment but every exit path returned 0, so detected drift blocked nothing and the job was not a dependency of the test matrix. A drifted staging mirror (e.g. a missing phone_number WhatsApp channel) therefore surfaced only as opaque test timeouts 200 tests deep. Add a tight CRITICAL_PATHS allowlist (attribute enabled toggles, phone_number.channels, auth factors/strategies, social enable/disable, password settings) and an ACCEPTED_DRIFT escape hatch so known gaps don't block while new drift does. In strict mode the script exits non-zero on a blocking mismatch; fetch failures and cosmetic drift never fail the build. Wire integration-tests to need validate-instances, and drive strictness from the STAGING_VALIDATE_STRICT repo variable (default report-only). So this is a no-op until the team opts in: it logs blocking drift and the proposed gate without failing anything. Flip the variable to make it enforce.
bf10aa7 to
9353469
Compare
@clerk/astro
@clerk/backend
@clerk/chrome-extension
@clerk/clerk-js
@clerk/expo
@clerk/expo-passkeys
@clerk/express
@clerk/fastify
@clerk/hono
@clerk/localizations
@clerk/nextjs
@clerk/nuxt
@clerk/react
@clerk/react-router
@clerk/shared
@clerk/tanstack-react-start
@clerk/testing
@clerk/ui
@clerk/upgrade
@clerk/vue
commit: |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/e2e-staging.yml (1)
378-379:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSlack failure text is now misleading for validation-gate failures.
At Line 378, the message always says “Staging E2E tests failed”, but this notification can now also fire when
validate-instancesfails and test legs are skipped. Please update wording to cover both failure sources.Suggested update
- "text": "*:red_circle: Staging E2E tests failed*\n*Repo:* `${{ github.repository }}`\n*Ref:* `${{ steps.inputs.outputs.ref }}`\n*SDK:* `${{ steps.inputs.outputs.sdk-source }}`\n*clerk_go commit:* `${{ steps.inputs.outputs.clerk-go-commit-sha || 'N/A' }}`\n*Run:* <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>" + "text": "*:red_circle: Staging E2E workflow failed*\n*Failure source:* `${{ needs.validate-instances.result == 'failure' && 'staging instance validation gate' || 'integration tests' }}`\n*Repo:* `${{ github.repository }}`\n*Ref:* `${{ steps.inputs.outputs.ref }}`\n*SDK:* `${{ steps.inputs.outputs.sdk-source }}`\n*clerk_go commit:* `${{ steps.inputs.outputs.clerk-go-commit-sha || 'N/A' }}`\n*Run:* <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-staging.yml around lines 378 - 379, Update the Slack notification "text" field so it no longer always reads "Staging E2E tests failed" and instead covers both failure cases (E2E test failures and validation-gate failures that skip test legs). Locate the Slack step that sets the "text" property (the multiline string starting with "*:red_circle: Staging E2E tests failed*") and change the message to a neutral combined message such as "*:red_circle: Staging E2E tests or instance validation failed*" (or similar wording) while preserving the existing repo/ref/SDK/commit/run placeholders and link formatting; ensure the modified "text" string still interpolates the same GitHub action output variables (`${{ steps.inputs.outputs.* }}`).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/e2e-staging.yml:
- Around line 378-379: Update the Slack notification "text" field so it no
longer always reads "Staging E2E tests failed" and instead covers both failure
cases (E2E test failures and validation-gate failures that skip test legs).
Locate the Slack step that sets the "text" property (the multiline string
starting with "*:red_circle: Staging E2E tests failed*") and change the message
to a neutral combined message such as "*:red_circle: Staging E2E tests or
instance validation failed*" (or similar wording) while preserving the existing
repo/ref/SDK/commit/run placeholders and link formatting; ensure the modified
"text" string still interpolates the same GitHub action output variables (`${{
steps.inputs.outputs.* }}`).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: bc91a59f-9b2d-4a4f-af14-9d8b212691ef
📒 Files selected for processing (2)
.changeset/staging-e2e-validate-gate.md.github/workflows/e2e-staging.yml
✅ Files skipped from review due to trivial changes (1)
- .changeset/staging-e2e-validate-gate.md
Follow-up to #8756. The
validate-staging-instancesscript already compares prod vs staging/v1/environmentand prints a diff, but it always exited 0, so a drifted staging mirror (like the missing WhatsApp channel that makeswhatsapp-phone-codetime out) blocked nothing and stayed invisible until tests failed 200-deep.This gives the script teeth without flipping any behavior yet. It gains a tight
CRITICAL_PATHSallowlist (attributeenabledtoggles,phone_number.channels, auth factors, social enable/disable, password policy) plus anACCEPTED_DRIFTescape hatch, so a known and tracked gap doesn't block while new drift does. In strict mode it exits non-zero on a blocking mismatch; fetch failures and cosmetic drift never fail the build.Strictness is driven by the
STAGING_VALIDATE_STRICTrepo variable and defaults to report-only, andintegration-testsnow depends onvalidate-instances. So nothing changes until someone sets the variable: today it just logs the blocking drift and the gate it would apply. The piece worth a look is theCRITICAL_PATHSset, that is the policy of what is worth blocking a run over.Before enabling strict, run the validator against current staging to confirm the only blocking drift is expected, and add
ACCEPTED_DRIFTentries for anything intentionally tolerated. Stacked on #8756.Update: the branch is rebased onto main (dropping the stale pre-squash copy of #8756 and keeping main's TURBO_FORCE and report-path fixes), and
captcha_enabledis now inCRITICAL_PATHS. An enabled captcha blocks every in-browser sign-up in headless CI, which is what kept the staging generic leg red for a week (legal-consent vs Turnstile). The captcha ignore-list removal is included here too so the gate works standalone; it overlaps with #8832 by design and merges cleanly in either order, with a pipeline test pinning that critical paths cannot be swallowed by the ignore filter. Also, the report job now notifies Slack when the strict gate itself fails, since a gate failure skips the test legs rather than failing them and would otherwise be silent. Still report-only until the repo var is set; bring instances to parity first using #8832's report.Summary by CodeRabbit
Release Notes
Chores
Tests