fix(server): explicit manual-resume hint when web UI rejects/approves a non-web-parent run#1523
Conversation
… a non-web-parent run Background When a workflow run was started from the CLI (or any non-web platform) and is then approved/rejected via the Web UI, `tryAutoResumeAfterGate` correctly skips dispatch — `dispatchToOrchestrator` is wired to the web adapter and would misroute Slack/Telegram/CLI parents. The skip is intentional. The bug is in what the user sees afterwards: the API response said only "Send a message to continue" or "On-reject prompt will run on resume", which is meaningless to a web-UI user whose run was started from a terminal. The Web UI dropped success messages entirely (only `setActionError` rendered), so even the vague hint never reached the user. The run sits in `failed` status with `metadata.rejection_reason` populated and no clear next step. Closes coleam00#1522. Change `tryAutoResumeAfterGate` now returns a structured discriminated union instead of a plain boolean: { resumed: true } | { resumed: false; reason: 'no_parent' | 'no_platform_conv' | 'non_web_parent' | 'dispatch_failed' } The four `reason` values mirror the existing log-event guard branches one-to-one. A new `manualResumeMessage()` helper constructs the user-facing hint per reason — CLI-only command for `no_parent`, both options for chat parents (`non_web_parent`), CLI-with-context for `no_platform_conv`, and "dispatch failed" for the catch-branch. Every non-resumed branch now names the exact `archon workflow resume <id>` command so the user has an actionable next step. The `/api/workflows/runs/:runId/resume` endpoint also surfaces the same explicit command instead of "Re-run the workflow to auto-resume". Tests Existing approve/reject auto-resume tests updated to assert the new explicit hint format. Added regression coverage for all four non-resumed branches on both approve and reject (eight new tests), each verifying the run-id-substituted CLI command appears in the response. The existing happy paths (web-parent dispatch, full-cancel, max-attempts) are untouched. Behavior change is response-text only — the dispatch decision logic, log events, and DB writes are byte-identical.
📝 WalkthroughWalkthroughAuto-resume failures during workflow reject/approve now return structured error reasons ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/server/src/routes/api.ts (1)
1151-1167: ⚡ Quick winAdd exhaustive
defaultguard to theswitchto prevent silentundefinedon future union extension.All four current members of
reasonare handled, but without adefault: neverbranch the function will returnundefinedat runtime if the union is extended — TypeScript only catches this reliably whennoImplicitReturns: trueis active in tsconfig.✅ Proposed fix
case 'dispatch_failed': return `Auto-resume dispatch failed. Run ${cliCommand} from a terminal to ${verb}.`; + default: { + // Exhaustive check — forces a compile error if the union grows without a matching case. + const _exhaustive: never = reason; + return `Run ${cliCommand} from a terminal to ${verb}.`; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/server/src/routes/api.ts` around lines 1151 - 1167, The switch over the variable `reason` currently handles four cases but lacks an exhaustive `default`, which can lead to returning undefined if the `reason` union is extended; modify the switch (the block that references `reason`, `cliCommand`, and `verb`) to include a `default` branch that enforces exhaustiveness—e.g., assign `reason` to a `never`-typed variable or call a shared `assertUnreachable(reason)` helper and then throw a clear Error (or return a safe fallback message) so the compiler/runtime will catch any future unknown `reason` values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/server/src/routes/api.ts`:
- Around line 1151-1167: The switch over the variable `reason` currently handles
four cases but lacks an exhaustive `default`, which can lead to returning
undefined if the `reason` union is extended; modify the switch (the block that
references `reason`, `cliCommand`, and `verb`) to include a `default` branch
that enforces exhaustiveness—e.g., assign `reason` to a `never`-typed variable
or call a shared `assertUnreachable(reason)` helper and then throw a clear Error
(or return a safe fallback message) so the compiler/runtime will catch any
future unknown `reason` values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f57de0a2-6e16-4a15-b160-e59c374f7ec8
📒 Files selected for processing (2)
packages/server/src/routes/api.tspackages/server/src/routes/api.workflow-runs.test.ts
Summary
tryAutoResumeAfterGatecorrectly skips dispatch — but the API response only said "Send a message to continue" / "On-reject prompt will run on resume", and the Web UI rendered no success message at all. The run lands infailedstatus withmetadata.rejection_reasonpopulated and the user has no idea how to proceed.df-implement-with-preview-fastafter a real reject — the workflow died silently mid-iteration with a one-paragraph rejection comment that the on_reject prompt was supposed to consume. Recovery required reading the DB schema to discoverarchon workflow resume <id>exists.tryAutoResumeAfterGatenow returns a structured{ resumed } | { resumed: false; reason }discriminated union. Each non-resumed branch produces a tailored next-step hint that names the exactarchon workflow resume <run-id>command. The Resume endpoint message gets the same upgrade.parent_conversation_id-based cross-adapter guard, log events, DB writes — all byte-identical. No Web UI changes (toast/banner work is called out as a separate follow-up in bug(web): Reject + Resume buttons dead-end silently for non-web-parent runs #1522).UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
rejectWorkflowRunRoutetryAutoResumeAfterGateAutoResumeResultdiscriminatorapproveWorkflowRunRoutetryAutoResumeAfterGaterejectWorkflowRunRoutemanualResumeMessageresumed === falseapproveWorkflowRunRoutemanualResumeMessageresumed === falseresumeWorkflowRunRoutetryAutoResumeAfterGate→ log event namesLabel Snapshot
risk: lowsize: Sserverserver:api-routesChange Metadata
bugserverLinked Issue
Validation Evidence
api.workflow-runs.test.tsadds 8 new tests covering all four non-resumed branches × {approve, reject} and asserts each response message contains the substituted run id and the literalarchon workflow resume. Existing happy-path tests for web-parent dispatch unchanged. The pre-existingloader.test.ts/dag-executor.test.tsfailures ondevreproduce on a clean checkout in this WSL2 environment without my changes — unrelated.bun run lintwithout raised heap OOMs in WSL2 ondev(also unrelated). Ran withNODE_OPTIONS='--max-old-space-size=8192'and got clean.Security Impact
Compatibility / Migration
successboolean and HTTP status are unchanged. No callers parse the message content beyond display.Human Verification
archon workflow resume <id>, no orchestrator dispatch, run set tostatus='failed'withrejection_countincremented.archon workflow resume <id>, no orchestrator dispatch.archon workflow resume <id>.archon workflow resume <id>.dispatch_failedbranch and containsarchon workflow resume <id>.Side Effects / Blast Radius
packages/server/src/routes/api.ts(3 endpoints: approve / reject / resume).archon workflow resume,originating thread,no longer available,Auto-resume dispatch failed. The Web UI does not do that today.api.workflow_*_auto_resume_*) are unchanged → existing dashboards keep working.Rollback Plan
dev. No state migration, no UI dependency.Risks and Mitigations
archon workflow resume <id>string. That's the intended behavior here — but if the binary name ever changes, this string needs to follow. Mitigation: the literal lives in one helper (manualResumeMessage); a singleFind Usagescovers it.archon workflow resume <id>,originating thread,no longer available) rather than the full sentence.🤖 Generated with Claude Code
Summary by CodeRabbit
archon workflow resume <runId>) to all resume-related messages for manual recovery.