Skip to content

fix(workflows): enable interactive workflow resume after approval gate#1259

Draft
coleam00 wants to merge 4 commits intodevfrom
archon/task-fix-1131-approval-resume
Draft

fix(workflows): enable interactive workflow resume after approval gate#1259
coleam00 wants to merge 4 commits intodevfrom
archon/task-fix-1131-approval-resume

Conversation

@coleam00
Copy link
Copy Markdown
Owner

Summary

  • Bug 1: Interactive workflows never passed parentConversationId to executeWorkflow, so parent_conversation_id was stored as NULL in the DB. findResumableRunByParentConversation queries this column and never found a match, breaking the entire resume chain.
  • Bug 2: The API approve endpoint incorrectly set status: 'failed' for interactive_loop approval types, causing getPausedWorkflowRun to return null and killing the natural-language resume path.
  • Bug 3: The approve endpoint never dispatched the resume automatically — users had to manually send a chat message after clicking Approve for the workflow to continue.

Changes

File Change
packages/core/src/orchestrator/orchestrator-agent.ts Pass conversation.id as parentConversationId (11th param) to the executeWorkflow call in the interactive workflow branch
packages/server/src/routes/api.ts Restructure approve endpoint: for interactive_loop, keep status paused and auto-dispatch to orchestrator via dispatchToOrchestrator; for standard approvals, logic is unchanged
packages/core/src/orchestrator/orchestrator-agent.test.ts Add assertion that executeWorkflow receives conversation.id as parentConversationId for interactive web workflows
.archon/workflows/e2e-*.yaml Update and extend E2E workflow test fixtures (haiku model, deterministic, skills/MCP)

Validation

All checks pass (bun run validate):

  • Type check: ✅ No errors (all 10 packages)
  • Lint: ✅ 0 errors, 0 warnings
  • Format: ✅
  • Tests: ✅ All passed (0 failures) — orchestrator-agent.test.ts: 97 pass

How to Test

  1. Create an interactive workflow (set interactive: true) with an approval or loop node
  2. Run it from the web UI — workflow pauses at the gate
  3. Click Approve (or enter a comment)
  4. Expected: workflow resumes automatically without sending another chat message
  5. Verify CLI workflow approve still works (unchanged path)

Fixes #1131

coleam00 and others added 2 commits April 16, 2026 10:40
#1131)

Interactive DAG workflows with approval/loop gates paused correctly but
never resumed on the web UI due to three cascading bugs: missing
parent_conversation_id in the DB record, wrong status mutation for
interactive loops, and no auto-dispatch from the approve endpoint.

Changes:
- Pass conversation.id as parentConversationId to executeWorkflow for
  interactive workflows so findResumableRunByParentConversation matches
- Keep interactive_loop runs in 'paused' status (not 'failed') so
  getPausedWorkflowRun finds them via the natural-language approval path
- Auto-dispatch to orchestrator after interactive_loop approval instead
  of requiring a manual follow-up message
- Add test assertion verifying parentConversationId is passed

Fixes #1131

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91f7898d-1c17-4bd1-a5c9-a376f93a70e9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch archon/task-fix-1131-approval-resume

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coleam00
Copy link
Copy Markdown
Owner Author

🔍 Comprehensive PR Review

PR: #1259 — fix(workflows): enable interactive workflow resume after approval gate
Reviewed by: 4 specialized agents (code-review, error-handling, test-coverage, docs-impact)
Date: 2026-04-16


Summary

The core fix is correct and minimal — adding conversation.id as parentConversationId to the interactive executeWorkflow call is exactly the right change. The approve endpoint restructure cleanly separates interactive_loop from standard approval paths. Three issues need attention before merge: a silent-success bug when the parent conversation cannot be resolved at dispatch time, and missing unit tests for the entirely-new interactive_loop approve branch.

Verdict: REQUEST_CHANGES

Severity Count
🔴 HIGH 3
🟡 MEDIUM 2
🟢 LOW 3

🔴 High Issues (Must Fix Before Merge)

1. Silent Success When Parent Conversation Cannot Be Resolved

📍 packages/server/src/routes/api.ts:1903-1911

When getConversationById returns null or platform_conversation_id is falsy, the handler returns { success: true, message: "Workflow approved and resuming..." } — but the workflow never resumes. The run is stuck permanently in paused with no log entry and no recovery path. Flagged by both code-review and error-handling agents independently.

Current code:

const parentConv = await conversationDb.getConversationById(parentConvDbId);
if (parentConv?.platform_conversation_id) {
  void dispatchToOrchestrator(parentConv.platform_conversation_id, comment);
}
return c.json({ success: true, message: "Workflow approved and resuming: ..." });

Fix: Guard with early error return when parentConv is null, and add .catch() on the dispatch:

if (!parentConv?.platform_conversation_id) {
  getLog().error({ runId, parentConvDbId }, 'api.workflow_run_approve_interactive_loop_no_parent_conv');
  return apiError(c, 500, 'Workflow approved but could not auto-resume: parent conversation not found. Send a message to continue the workflow.');
}
void dispatchToOrchestrator(parentConv.platform_conversation_id, comment).catch(err => {
  getLog().error({ err, runId }, 'api.workflow_run_approve_interactive_loop_dispatch_failed');
});
return c.json({ success: true, message: `Workflow approved and resuming: ${run.workflow_name}.` });

2. interactive_loop Approve Branch Has Zero Unit Test Coverage

📍 packages/server/src/routes/api.ts:1894-1911

The entire new code path has no tests. All three core behaviors can regress silently:

  1. Status must stay paused (not failed)
  2. Metadata key is loop_user_input (not approval_response)
  3. Dispatch uses parent_conversation_id ?? conversation_id

All mock infrastructure already exists in api.workflow-runs.test.ts (mockGetConversationById line 24, mockUpdateWorkflowRun line 169, mockHandleMessage line 41).

Suggested tests (add to api.workflow-runs.test.ts):

describe('POST /api/workflows/runs/:runId/approve — interactive_loop branch', () => {
  const MOCK_LOOP_RUN = {
    ...MOCK_RUNNING_RUN, id: 'run-loop-1', status: 'paused',
    conversation_id: 'worker-conv-uuid', parent_conversation_id: 'parent-conv-uuid',
    metadata: { approval: { type: 'interactive_loop', nodeId: 'loop-gate', message: 'Feedback?' } },
  };

  test('keeps status paused and stores loop_user_input', async () => {
    mockGetWorkflowRun.mockResolvedValueOnce(MOCK_LOOP_RUN);
    mockGetConversationById.mockResolvedValueOnce({ id: 'parent-conv-uuid', platform_conversation_id: 'web-parent-abc' });
    const { app } = makeApp();
    const res = await app.request('/api/workflows/runs/run-loop-1/approve', {
      method: 'POST', body: JSON.stringify({ comment: 'continue' }),
      headers: { 'Content-Type': 'application/json' },
    });
    expect(res.status).toBe(200);
    expect(mockUpdateWorkflowRun).toHaveBeenCalledWith('run-loop-1', { metadata: { loop_user_input: 'continue' } });
    const arg = mockUpdateWorkflowRun.mock.calls[0][1] as Record<string, unknown>;
    expect(arg).not.toHaveProperty('status'); // must NOT set status: 'failed'
    const body = (await res.json()) as { message: string };
    expect(body.message).toContain('resuming');
    expect(body.message).not.toContain('Send a message');
  });

  test('dispatches to parent_conversation_id', async () => {
    mockGetWorkflowRun.mockResolvedValueOnce(MOCK_LOOP_RUN);
    mockGetConversationById.mockResolvedValueOnce({ id: 'parent-conv-uuid', platform_conversation_id: 'web-parent-abc' });
    const { app } = makeApp();
    await app.request('/api/workflows/runs/run-loop-1/approve', {
      method: 'POST', body: JSON.stringify({ comment: 'proceed' }),
      headers: { 'Content-Type': 'application/json' },
    });
    expect(mockHandleMessage).toHaveBeenCalledWith(expect.anything(), 'web-parent-abc', 'proceed', expect.anything());
  });

  test('falls back to conversation_id when parent_conversation_id is null', async () => {
    mockGetWorkflowRun.mockResolvedValueOnce({ ...MOCK_LOOP_RUN, parent_conversation_id: null });
    mockGetConversationById.mockResolvedValueOnce({ id: 'worker-conv-uuid', platform_conversation_id: 'web-worker-abc' });
    const { app } = makeApp();
    await app.request('/api/workflows/runs/run-loop-1/approve', {
      method: 'POST', body: JSON.stringify({ comment: 'go' }),
      headers: { 'Content-Type': 'application/json' },
    });
    expect(mockGetConversationById).toHaveBeenCalledWith('worker-conv-uuid');
  });
});

3. Standard Approval Status Transition Not Explicitly Asserted

📍 packages/server/src/routes/api.ts:1883-1892

The PR moved status: 'failed' + metadata reset inside the approval.type !== 'interactive_loop' guard. Existing tests check node_completed events but never assert updateWorkflowRun was called with status: 'failed' + correct metadata clear. A regression here would be invisible.

Add to existing standard-approval describe block:

test('transitions standard approval run to failed status with cleared rejection metadata', async () => {
  mockGetWorkflowRun.mockResolvedValueOnce(MOCK_PAUSED_RUN);
  const { app } = makeApp();
  await app.request('/api/workflows/runs/run-paused-1/approve', {
    method: 'POST', body: JSON.stringify({ comment: 'LGTM' }),
    headers: { 'Content-Type': 'application/json' },
  });
  expect(mockUpdateWorkflowRun).toHaveBeenCalledWith('run-paused-1', {
    status: 'failed',
    metadata: { approval_response: 'approved', rejection_reason: '', rejection_count: 0 },
  });
});

🟡 Medium Issues (Consider Fixing)

4. void dispatchToOrchestrator(...) Drops Rejections Silently

📍 packages/server/src/routes/api.ts:1906

Fire-and-forget is correct here, but without .catch(), any rejection from the outer dispatch call is silently swallowed. One-line fix (already included in Issue 1's recommended code above).


5. Stale Docs — "Send a Follow-up Message to Resume"

📍 packages/docs-web/src/content/docs/guides/approval-nodes.md lines 58-60
📍 packages/docs-web/src/content/docs/guides/authoring-workflows.md line 981

Both files describe /workflow approve as requiring a follow-up message to resume. After this PR, the Web UI auto-resumes for interactive_loop gates — no follow-up needed.

approval-nodes.md fix: Replace "The explicit /workflow approve command records the approval; send a follow-up message to resume." with: "The explicit /workflow approve slash command records the approval and also auto-resumes on the Web UI; on other platforms it requires a follow-up message to trigger resume."

authoring-workflows.md fix: Replace "(records approval; send a follow-up message to resume)" with "(auto-resumes on Web UI; on other platforms, send a follow-up message after approving)"


🟢 Low Issues

Issue Location Suggestion
Fragile positional-index test assertion callArgs[10] orchestrator-agent.test.ts:1103-1105 Add expect(callArgs).toHaveLength(11) before index check
c.req.json().catch(() => ({})) silently loses parse errors (pre-existing) api.ts:1860 Add getLog().warn(...) in catch; elevated risk now that comment is injected into $LOOP_USER_INPUT
approval-nodes.md Design Notes incomplete for interactive loop path Lines 225-229 Add: interactive loop keeps paused status and uses natural-language resume path

✅ What's Good

  • Core fix is correct and minimalconversation.id as parentConversationId is exactly right, clearly commented
  • Branching is cleanif (approval.type !== 'interactive_loop') early return keeps standard path untouched
  • archon-gsd.yaml — Well-structured 1216-line default workflow; clear phase comments, parallel nodes with explicit depends_on, sensible max_iterations caps
  • E2E fixtures — Minimal and purposeful scaffolding; exactly what's needed
  • Outer try/catch — Structured logging with runId + err, consistent with the rest of the file
  • Mock infrastructure for missing tests already exists — no new mocks needed to add the suggested tests

📋 Suggested Follow-up Issue

"Refactor executeWorkflow to use options object instead of positional params" (P3) — would make the parentConversationId positional test assertion non-fragile.


Reviewed by Archon comprehensive-pr-review workflow
Full artifacts: ~/.archon/workspaces/coleam00/Archon/artifacts/runs/478f22b2bad36a0fdb3e5e352db9e16e/review/

…_loop approve tests

- Return 500 (not silent success) when the parent conversation cannot
  be resolved during interactive_loop approval, per Fail Fast principle
- Add .catch() to void dispatchToOrchestrator call to log dispatch errors
- Add length guard to positional-index test assertion in orchestrator test
- Add tests for the interactive_loop approve branch (status stays paused,
  loop_user_input stored, dispatch to parent conv, null-parent fallback,
  unresolvable conv returns 500) and assert updateWorkflowRun for standard
  approval path
- Update docs: auto-resume on Web UI, stale "send a follow-up message"
  language removed; add Design Notes clarifying interactive loop path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coleam00
Copy link
Copy Markdown
Owner Author

Fix Report: PR #1259

Date: 2026-04-16T00:30:00Z
Status: COMPLETE
Commit: 2c0002f
Philosophy: Aggressive fix — lean towards fixing everything


Summary

All HIGH, MEDIUM, and LOW issues from the consolidated review were addressed. The silent-success bug on unresolvable parent conversation is now a 500 error with an actionable message. The void fire-and-forget dispatch call has a .catch() error logger. Four new interactive_loop approve tests and one standard-approval assertion test were added. Two stale docs files and the Design Notes section were updated to reflect the new auto-resume behavior.


Fixes Applied

Severity Finding Location What Was Done
HIGH Silent success when parent conversation cannot be resolved api.ts:1903-1911 Inverted guard: if parent conv is missing, log error and return apiError(c, 500, ...) with actionable message
MEDIUM void dispatchToOrchestrator drops rejection silently api.ts:1906 Added .catch(err => getLog().error(...)) to log dispatch-level failures
HIGH interactive_loop approve branch has zero unit test coverage api.workflow-runs.test.ts Added 4 tests: status stays paused + loop_user_input stored, dispatch to parent conv, null-parent fallback, unresolvable conv returns 500
HIGH Standard approval path updateWorkflowRun not asserted api.workflow-runs.test.ts Added test asserting status: 'failed' with cleared rejection metadata
LOW Fragile positional-index test assertion (callArgs[10]) orchestrator-agent.test.ts:1103 Added expect(callArgs).toHaveLength(11) guard before the index assertion
HIGH (docs) Stale "send a follow-up message" in approval-nodes.md approval-nodes.md:56-60 Updated step 4 to reflect Web UI auto-resume; clarified other platforms still need follow-up
MEDIUM (docs) Stale "send a follow-up message" in authoring-workflows.md authoring-workflows.md:981 Updated parenthetical to reflect auto-resume on Web UI
LOW (docs) Design Notes incomplete for interactive loop path approval-nodes.md:225-229 Added paragraph explaining interactive loop keeps paused status and uses natural-language resume path

Tests Added

File Test Cases
api.workflow-runs.test.ts keeps run status paused and stores loop_user_input in metadata
api.workflow-runs.test.ts dispatches to parent conversation when parent_conversation_id is set
api.workflow-runs.test.ts falls back to conversation_id when parent_conversation_id is null
api.workflow-runs.test.ts returns 500 when parent conversation cannot be resolved
api.workflow-runs.test.ts transitions standard approval run to failed status with cleared rejection metadata

Skipped

c.req.json().catch(() => ({})) silent parse failure (pre-existing pattern across many routes; out of scope for this PR).


Validation

Check Status
Type check ✅ all packages passed
Lint ✅ 0 warnings
Tests ✅ all pass

…ject literal

The surrounding block comment already explains why status stays 'paused' for
the interactive loop path; the duplicated inner comment adds no new information.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coleam00
Copy link
Copy Markdown
Owner Author

Archon PR Validation Report

Verdict: APPROVE

Summary

All three cascading bugs (missing parentConversationId, incorrect failed status for interactive loops, no auto-resume dispatch) were confirmed on main and verified fixed on the feature branch via code review and E2E testing. The fix is minimal (2 source files), correct, well-tested (6 new tests), and introduces no regressions.

Bug Confirmation

Claim Main Feature
parentConversationId not passed BUG REPRODUCED (null) FIXED (set to conv DB ID)
Approve sets status: 'failed' for interactive loops BUG REPRODUCED (Failed count +1) FIXED (keeps paused, standard approval correctly uses failed)
No auto-resume after approve BUG REPRODUCED (conversation frozen) FIXED (workflow resumes and completes)

Issues

No blocking issues found.

Non-blocking: Pre-existing bug — resumable-run path (orchestrator-agent.ts:276-284) also omits parentConversationId. Appropriate as a follow-up.


Validated by archon-validate-pr workflow

@Wirasm
Copy link
Copy Markdown
Collaborator

Wirasm commented Apr 20, 2026

@coleam00 this PR looks similar to #1129, which was closed on April 12, 2026 (closed without merging). You may want to read the discussion there before pushing further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Interactive workflows cannot resume after approval gate (parent_conversation_id is NULL)

2 participants