
fix(server): explicit manual-resume hint when web UI rejects/approves a non-web-parent run#1523

Open
blankse wants to merge 1 commit into coleam00:dev from blankse:fix/1522-non-web-parent-reject-resume-messaging

Conversation


@blankse blankse commented May 1, 2026

Summary

  • Problem: When a workflow run is approved/rejected via the Web UI but was started from the CLI (or Slack/Telegram/etc.), tryAutoResumeAfterGate correctly skips dispatch — but the API response only said "Send a message to continue" / "On-reject prompt will run on resume", and the Web UI rendered no success message at all. The run lands in failed status with metadata.rejection_reason populated and the user has no idea how to proceed.
  • Why it matters: Reproduced locally with df-implement-with-preview-fast after a real reject — the workflow died silently mid-iteration with a one-paragraph rejection comment that the on_reject prompt was supposed to consume. Recovery required reading the DB schema to discover archon workflow resume <id> exists.
  • What changed: tryAutoResumeAfterGate now returns a structured { resumed } | { resumed: false; reason } discriminated union. Each non-resumed branch produces a tailored next-step hint that names the exact archon workflow resume <run-id> command. The Resume endpoint message gets the same upgrade.
  • What did NOT change (scope boundary): Dispatch decision logic, the parent_conversation_id-based cross-adapter guard, log events, DB writes — all byte-identical. No Web UI changes (toast/banner work is called out as a separate follow-up in bug(web): Reject + Resume buttons dead-end silently for non-web-parent runs #1522).
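
The return-type change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `AutoResumeResult` name and the four reason values come from this description, while `rejectResponseMessage` and the exact wording of the fallback hint are hypothetical.

```typescript
// Result shape described in the PR: a discriminated union keyed on `resumed`.
type AutoResumeReason =
  | 'no_parent'
  | 'no_platform_conv'
  | 'non_web_parent'
  | 'dispatch_failed';

type AutoResumeResult =
  | { resumed: true }
  | { resumed: false; reason: AutoResumeReason };

// Hypothetical caller (assumption): chooses the reject response message
// from the structured result instead of from a bare boolean.
function rejectResponseMessage(runId: string, result: AutoResumeResult): string {
  if (result.resumed) {
    // Happy-path wording quoted in the PR's edge-case notes.
    return 'Running on-reject prompt.';
  }
  // After narrowing, `result.reason` is available only on the non-resumed variant.
  // The sentence below is illustrative wording, not the PR's exact text.
  return (
    `Auto-resume skipped (${result.reason}). ` +
    `Run \`archon workflow resume ${runId}\` to apply the on-reject prompt.`
  );
}
```

With a plain boolean the caller could not tell the four skip cases apart. With the union, the compiler rejects any access to `reason` before `resumed` has been checked.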

UX Journey

Before

User                     Web UI                   API server                       DB
────                     ──────                   ──────────                       ──
clicks Reject  ─────────▶                                                             
+ types reason           POST /reject ──────────▶ rejectWorkflow()  ──────────────▶ status='failed'
                                                                                      rejection_reason set
                                                  tryAutoResumeAfterGate() →  
                                                    parent_conv null? → return false
                                                  return { message: "...will
                                                          run on resume." }
                          response received
                          ↑
                          handleReject only renders setActionError on throw,
                          drops success messages → UI silently refreshes,
                          run shows as 'failed' with no next-step indicator
                                                                                       
sees nothing actionable; manually checks DB to discover archon workflow resume <id>

After

User                     Web UI                   API server                       DB
────                     ──────                   ──────────                       ──
clicks Reject  ─────────▶                                                             
+ types reason           POST /reject ──────────▶ rejectWorkflow()  ──────────────▶ status='failed'
                                                                                      rejection_reason set
                                                  tryAutoResumeAfterGate() →
                                                    [returns { resumed: false,
                                                               reason: 'no_parent' }]
                                                  return {
                                                    message: "Workflow rejected: X.
                                                      *Run `archon workflow resume <id>`*
                                                      from the working directory to
                                                      apply the on-reject prompt." }
                          response received
                          ↑
                          API consumers (curl, future toast surface) see the
                          exact CLI command they need. UI surfacing follow-up
                          tracked in #1522.

Architecture Diagram

Before

              ┌───────────────┐
              │  Web UI       │
              │  Reject btn   │
              └───────┬───────┘
                      │ POST /reject
                      ▼
              ┌───────────────┐        ┌─────────────────────────────┐
              │ rejectWFRoute │ ────▶  │  tryAutoResumeAfterGate     │
              │               │        │   returns boolean           │
              │               │ ◀────  │                             │
              └───────┬───────┘        └─────────────────────────────┘
                      │ message: "...will run on resume." (vague)
                      ▼
              ┌───────────────┐
              │  HTTP response│
              └───────────────┘

After

              ┌───────────────┐
              │  Web UI       │
              │  Reject btn   │
              └───────┬───────┘
                      │ POST /reject
                      ▼
              ┌───────────────┐        ┌─────────────────────────────┐
              │ rejectWFRoute │ ════▶  │ [~] tryAutoResumeAfterGate  │
              │               │        │   returns AutoResumeResult  │
              │               │ ◀════  │   { resumed } | { reason }  │
              │               │        └─────────────────────────────┘
              │      ║                 ┌─────────────────────────────┐
              │      ╚════════════════▶│ [+] manualResumeMessage()   │
              │                        │   reason → CLI/thread hint  │
              │      ╔════════════════ │                             │
              │      ║                 └─────────────────────────────┘
              ▼      ║
              ┌───────────────┐
              │  HTTP response│ "...Run `archon workflow resume <id>`
              │               │  from the working directory to apply
              └───────────────┘  the on-reject prompt." (actionable)

Connection inventory:

| From | To | Status | Notes |
| --- | --- | --- | --- |
| rejectWorkflowRunRoute | tryAutoResumeAfterGate | modified | now consumes AutoResumeResult discriminator |
| approveWorkflowRunRoute | tryAutoResumeAfterGate | modified | same |
| rejectWorkflowRunRoute | manualResumeMessage | new | called when resumed === false |
| approveWorkflowRunRoute | manualResumeMessage | new | called when resumed === false |
| resumeWorkflowRunRoute | (response message) | modified | inline string updated to name the CLI command |
| tryAutoResumeAfterGate | log event names | unchanged | the four event names stay literal & greppable |

Label Snapshot

  • Risk: risk: low
  • Size: size: S
  • Scope: server
  • Module: server:api-routes

Change Metadata

  • Change type: bug
  • Primary scope: server

Linked Issue

Validation Evidence

$ bun run check:bundled
bundled-defaults.generated.ts is up to date (36 commands, 20 workflows).

$ bun run type-check
all 10 packages: Exited with code 0

$ NODE_OPTIONS='--max-old-space-size=8192' bun run lint --max-warnings 0
EXIT=0   # 0 errors, 0 warnings

$ bun run format:check
All matched files use Prettier code style!

$ bun --filter @archon/server test
73 pass, 0 fail (api.workflow-runs.test.ts)
+ all other server suites green
  • Evidence provided: api.workflow-runs.test.ts adds 8 new tests covering all four non-resumed branches × {approve, reject} and asserts each response message contains the substituted run id and the literal archon workflow resume. Existing happy-path tests for web-parent dispatch unchanged. The pre-existing loader.test.ts / dag-executor.test.ts failures on dev reproduce on a clean checkout in this WSL2 environment without my changes — unrelated.
  • Skipped commands: plain bun run lint, which runs out of memory (OOM) in WSL2 on dev without a raised heap limit (also unrelated). Ran it with NODE_OPTIONS='--max-old-space-size=8192' instead and it came back clean.

Security Impact

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Compatibility / Migration

  • Backward compatible? Yes — only HTTP response message strings change. The success boolean and HTTP status are unchanged. No callers parse the message content beyond display.
  • Config/env changes? No
  • Database migration needed? No

Human Verification

  • Verified scenarios:
    • reject + on_reject + parent null (CLI-started run): response contains archon workflow resume <id>, no orchestrator dispatch, run set to status='failed' with rejection_count incremented.
    • approve + parent null: response contains archon workflow resume <id>, no orchestrator dispatch.
    • reject + on_reject + non-web parent (Telegram fixture): response contains both "originating thread" and archon workflow resume <id>.
    • reject + on_reject + parent conversation deleted: response contains "no longer available" + archon workflow resume <id>.
    • reject + on_reject + DB throw on parent lookup: response hits dispatch_failed branch and contains archon workflow resume <id>.
  • Edge cases checked:
    • happy path (web parent + dispatch ok) still says "Resuming workflow." / "Running on-reject prompt." (unchanged).
    • reject without on_reject still cancels and doesn't auto-resume (unchanged).
    • max-attempts cancellation message unchanged.
  • What was not verified: Web UI side — this PR is API-only by design (issue bug(web): Reject + Resume buttons dead-end silently for non-web-parent runs #1522 lists Web UI surfacing as a separate follow-up).
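
The verified scenarios above suggest a guard order along these lines. This is a sketch under stated assumptions, not the PR's implementation: the `GateDeps` interface, `ParentConversation`, and `tryAutoResumeSketch` are hypothetical names, and mapping the "parent conversation deleted" case to `no_platform_conv` is an inference from the reason names rather than something the description states.

```typescript
type AutoResumeResult =
  | { resumed: true }
  | { resumed: false; reason: 'no_parent' | 'no_platform_conv' | 'non_web_parent' | 'dispatch_failed' };

// Hypothetical shapes, introduced only for this illustration.
interface ParentConversation {
  platform: string; // e.g. 'web', 'telegram', 'slack'
}

interface GateDeps {
  findConversation(id: string): Promise<ParentConversation | null>;
  dispatch(conversationId: string): Promise<void>;
}

async function tryAutoResumeSketch(
  parentConversationId: string | null,
  deps: GateDeps
): Promise<AutoResumeResult> {
  // CLI-started run: there is no parent conversation to dispatch to.
  if (parentConversationId === null) {
    return { resumed: false, reason: 'no_parent' };
  }
  try {
    const parent = await deps.findConversation(parentConversationId);
    // Assumed mapping: a deleted parent conversation yields 'no_platform_conv'.
    if (parent === null) {
      return { resumed: false, reason: 'no_platform_conv' };
    }
    // Cross-adapter guard: only web parents are dispatched.
    if (parent.platform !== 'web') {
      return { resumed: false, reason: 'non_web_parent' };
    }
    await deps.dispatch(parentConversationId);
    return { resumed: true };
  } catch {
    // A throw during lookup or dispatch lands in the dispatch_failed branch.
    return { resumed: false, reason: 'dispatch_failed' };
  }
}
```

Keeping the lookup inside the `try` matches the scenario where a DB throw on the parent lookup is reported as `dispatch_failed`.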

Side Effects / Blast Radius

  • Affected subsystems: only packages/server/src/routes/api.ts (3 endpoints: approve / reject / resume).
  • Potential unintended effects: any consumer that matches on substrings of the response message would need updating for the new wording, which introduces archon workflow resume, originating thread, no longer available, and Auto-resume dispatch failed. The Web UI does not do that today.
  • Guardrails/monitoring: log event names (api.workflow_*_auto_resume_*) are unchanged → existing dashboards keep working.

Rollback Plan

  • Fast rollback: revert the single commit on dev. No state migration, no UI dependency.
  • Feature flags: none — change is unconditional.
  • Observable failure symptoms: a unit test would catch any regression to the previous "Send a message to continue" wording. Manual reject from the dashboard against a CLI-started run should now show the explicit CLI command in the API response (visible via DevTools network tab even today, before the UI surfacing follow-up).

Risks and Mitigations

  • Risk: A future Web UI change might want to render the message verbatim and would inherit the literal archon workflow resume <id> string. That's the intended behavior here — but if the binary name ever changes, this string needs to follow. Mitigation: the literal lives in one helper (manualResumeMessage); a single Find Usages covers it.
  • Risk: Tests that assert on substring of message text could become brittle. Mitigation: kept the asserted substrings to the durable concepts (archon workflow resume <id>, originating thread, no longer available) rather than the full sentence.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Workflow resume endpoints now provide specific reasons when auto-resume fails (e.g., missing parent, unsupported platform).
    • Added CLI command fallback (archon workflow resume <runId>) to all resume-related messages for manual recovery.
    • Messages now differentiate between approval and rejection scenarios with tailored guidance.

… a non-web-parent run

Background

When a workflow run was started from the CLI (or any non-web platform)
and is then approved/rejected via the Web UI, `tryAutoResumeAfterGate`
correctly skips dispatch — `dispatchToOrchestrator` is wired to the web
adapter and would misroute Slack/Telegram/CLI parents. The skip is
intentional. The bug is in what the user sees afterwards: the API
response said only "Send a message to continue" or "On-reject prompt
will run on resume", which is meaningless to a web-UI user whose run
was started from a terminal. The Web UI dropped success messages
entirely (only `setActionError` rendered), so even the vague hint never
reached the user. The run sits in `failed` status with
`metadata.rejection_reason` populated and no clear next step.

Closes coleam00#1522.

Change

`tryAutoResumeAfterGate` now returns a structured discriminated union
instead of a plain boolean:

  { resumed: true } | { resumed: false; reason: 'no_parent' | 'no_platform_conv' | 'non_web_parent' | 'dispatch_failed' }

The four `reason` values mirror the existing log-event guard branches
one-to-one. A new `manualResumeMessage()` helper constructs the
user-facing hint per reason — CLI-only command for `no_parent`, both
options for chat parents (`non_web_parent`), CLI-with-context for
`no_platform_conv`, and "dispatch failed" for the catch-branch. Every
non-resumed branch now names the exact `archon workflow resume <id>`
command so the user has an actionable next step.
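
A per-reason helper of the kind described above might look roughly like this. Treat it as a sketch: the parameter list is hypothetical, and only the `dispatch_failed` sentence is taken from the review diff on this page. The other three sentences are illustrative wording built around the substrings the tests are said to assert ("originating thread", "no longer available").

```typescript
type ManualResumeReason =
  | 'no_parent'
  | 'no_platform_conv'
  | 'non_web_parent'
  | 'dispatch_failed';

// Hypothetical signature (assumption): the real helper's parameters are not shown here.
function manualResumeMessage(
  reason: ManualResumeReason,
  runId: string,
  verb: string
): string {
  // Single place where the CLI command literal is built.
  const cliCommand = `\`archon workflow resume ${runId}\``;
  switch (reason) {
    case 'no_parent':
      // CLI-only hint for runs without a parent conversation.
      return `Run ${cliCommand} from the working directory to ${verb}.`;
    case 'non_web_parent':
      // Chat parents get both options.
      return `Reply in the originating thread, or run ${cliCommand} to ${verb}.`;
    case 'no_platform_conv':
      return `The parent conversation is no longer available. Run ${cliCommand} to ${verb}.`;
    case 'dispatch_failed':
      // Wording quoted from the proposed-fix diff in the review.
      return `Auto-resume dispatch failed. Run ${cliCommand} from a terminal to ${verb}.`;
  }
}
```

For example, `manualResumeMessage('no_parent', 'abc123', 'apply the on-reject prompt')` yields a sentence naming `archon workflow resume abc123`.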

The `/api/workflows/runs/:runId/resume` endpoint also surfaces the
same explicit command instead of "Re-run the workflow to auto-resume".

Tests

Existing approve/reject auto-resume tests updated to assert the new
explicit hint format. Added regression coverage for all four
non-resumed branches on both approve and reject (eight new tests),
each verifying the run-id-substituted CLI command appears in the
response. The existing happy paths (web-parent dispatch, full-cancel,
max-attempts) are untouched.
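
The four-reasons-by-two-actions coverage described above can be pictured as a small matrix check. This sketch does not use the project's test runner or route handlers. `MessageSource` and `findMissingHints` are hypothetical names, and `MessageSource` stands in for "call the endpoint and read the response message".

```typescript
const REASONS = ['no_parent', 'no_platform_conv', 'non_web_parent', 'dispatch_failed'] as const;
const ACTIONS = ['approve', 'reject'] as const;

type Reason = (typeof REASONS)[number];
type Action = (typeof ACTIONS)[number];

// Hypothetical stand-in for invoking the route and reading `message` from the response.
type MessageSource = (action: Action, reason: Reason, runId: string) => string;

// Returns the action/reason pairs whose message lacks the run-id-substituted CLI command.
function findMissingHints(getMessage: MessageSource, runId: string): string[] {
  const missing: string[] = [];
  for (const action of ACTIONS) {
    for (const reason of REASONS) {
      const message = getMessage(action, reason, runId);
      // Assert only on the durable substring, not on the full sentence.
      if (!message.includes(`archon workflow resume ${runId}`)) {
        missing.push(`${action}/${reason}`);
      }
    }
  }
  return missing;
}
```

A source that still returns the old "Send a message to continue" wording would be reported for all eight combinations.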

Behavior change is response-text only — the dispatch decision logic,
log events, and DB writes are byte-identical.

coderabbitai Bot commented May 1, 2026

Walkthrough

Auto-resume failures during workflow reject/approve now return structured error reasons (no_parent, non_web_parent, dispatch_failed) and generate actionable UI messages that include explicit archon workflow resume <runId> CLI fallback commands, replacing generic "will run on resume" text.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Auto-Resume Result Handling & Messaging (packages/server/src/routes/api.ts) | Replaces the tryAutoResumeAfterGate boolean return with a structured AutoResumeResult discriminated union that reports the failure reason. Adds a manualResumeMessage() function to generate reason-specific UI text with the archon workflow resume CLI command. Updates the /approve, /reject, and /resume endpoints to use the structured result when choosing between auto-resume success messaging and manual remediation instructions. |
| Endpoint & Message Tests (packages/server/src/routes/api.workflow-runs.test.ts) | Extends tests to verify that all endpoints include the explicit archon workflow resume <runId> command in response messages. Adds four new reject-flow test variants covering null parent, non-web platform parents, deleted parents, and dispatch failure scenarios. Updates approve auto-resume skip tests to assert CLI hints instead of generic prompts. All guard-failure branches now assert that the message includes the run ID and the archon workflow resume literal. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A hop through the dark, a run that won't start,
But now says "archon resume" — a clear, helpful heart!
No silent dead-ends when you click to reject,
Just CLI commands that get proper respect. 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding explicit manual-resume hints for approve/reject when runs cannot auto-resume (non-web-parent case). |
| Description check | ✅ Passed | Description comprehensively covers all required sections: problem/why/what changed, before/after UX journey, architecture diagram with module inventory, metadata, validation evidence with test results, security impact, compatibility, human verification, side effects, rollback plan, and risks/mitigations. |
| Linked Issues check | ✅ Passed | PR fully addresses #1522 requirements: returns structured AutoResumeResult from tryAutoResumeAfterGate, generates actionable CLI commands via manualResumeMessage(), updates approve/reject/resume routes, adds comprehensive tests for all non-resume branches, and keeps cross-adapter guard logic unchanged. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the API server layer (api.ts and api.workflow-runs.test.ts). No Web UI changes, dispatch logic changes, or DB schema modifications, precisely as specified. The PR explicitly defers Web UI surfacing to issue #1522 follow-up. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/server/src/routes/api.ts (1)

1151-1167: ⚡ Quick win

Add exhaustive default guard to the switch to prevent silent undefined on future union extension.

All four current members of reason are handled, but without a default: never branch the function will return undefined at runtime if the union is extended — TypeScript only catches this reliably when noImplicitReturns: true is active in tsconfig.

✅ Proposed fix
     case 'dispatch_failed':
       return `Auto-resume dispatch failed. Run ${cliCommand} from a terminal to ${verb}.`;
+    default: {
+      // Exhaustive check — forces a compile error if the union grows without a matching case.
+      const _exhaustive: never = reason;
+      return `Run ${cliCommand} from a terminal to ${verb}.`;
+    }
   }
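
As an alternative to the inline `never` assignment in the proposed fix, the exhaustiveness check can live in a small shared helper. The sketch below is illustrative only: `assertUnreachable` and `describeReason` are hypothetical, and the per-reason descriptions are placeholder text rather than the PR's messages.

```typescript
type Reason = 'no_parent' | 'no_platform_conv' | 'non_web_parent' | 'dispatch_failed';

// Hypothetical shared helper: the `never` parameter makes the call site a
// compile error unless every union member was handled by an earlier case.
// If an unexpected value still arrives at runtime, it throws instead of
// letting the function fall through and return undefined.
function assertUnreachable(value: never): never {
  throw new Error(`Unhandled reason: ${String(value)}`);
}

function describeReason(reason: Reason): string {
  switch (reason) {
    case 'no_parent':
      return 'run has no parent conversation';
    case 'no_platform_conv':
      return 'parent conversation is no longer available';
    case 'non_web_parent':
      return 'parent conversation lives on a non-web platform';
    case 'dispatch_failed':
      return 'auto-resume dispatch failed';
    default:
      // Adding a fifth member to `Reason` without a case breaks compilation here.
      return assertUnreachable(reason);
  }
}
```

The choice between throwing and returning a safe fallback message is a design decision. Throwing surfaces the gap loudly, while a fallback keeps the endpoint responding.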
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server/src/routes/api.ts` around lines 1151 - 1167, The switch over
the variable `reason` currently handles four cases but lacks an exhaustive
`default`, which can lead to returning undefined if the `reason` union is
extended; modify the switch (the block that references `reason`, `cliCommand`,
and `verb`) to include a `default` branch that enforces exhaustiveness—e.g.,
assign `reason` to a `never`-typed variable or call a shared
`assertUnreachable(reason)` helper and then throw a clear Error (or return a
safe fallback message) so the compiler/runtime will catch any future unknown
`reason` values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f57de0a2-6e16-4a15-b160-e59c374f7ec8

📥 Commits

Reviewing files that changed from the base of the PR and between 69b2c89 and 3374fc0.

📒 Files selected for processing (2)
  • packages/server/src/routes/api.ts
  • packages/server/src/routes/api.workflow-runs.test.ts


Development

Successfully merging this pull request may close these issues.

bug(web): Reject + Resume buttons dead-end silently for non-web-parent runs
