fix: detect completion signal in any XML tag, not just <promise> (#1126) by Wirasm · Pull Request #1184 · coleam00/Archon

Wirasm · 2026-04-13T12:43:19Z

Summary

Problem: Loop nodes with until: fail when the AI wraps the completion signal in any XML tag other than <promise> (e.g. <COMPLETE>ALL_CLEAN</COMPLETE>), causing max_iterations_reached even though the signal was present
Why it matters: Any workflow where the prompt instructs the AI to output a tagged signal with a custom tag (a natural pattern) would silently fail the loop and skip all downstream nodes
What changed: detectCompletionSignal now matches <anyTag>SIGNAL</anyTag> in addition to <promise>SIGNAL</promise> and plain-text patterns; stripCompletionTags accepts an optional until param to strip matched XML tags from user-visible output
What did not change: The recommended <promise> format, plain-text signal detection, max_iterations_reached error message, and all other executor logic

UX Journey

Before

User workflow prompt: "When done, output <COMPLETE>ALL_CLEAN</COMPLETE>"

  Workflow                dag-executor              detectCompletionSignal
  ────────                ────────────              ──────────────────────
  loop node runs ──────▶  AI outputs                checks <promise>... ✗
  (max_iterations: 3)     <COMPLETE>ALL_CLEAN        checks endPattern... ✗ (</COMPLETE> follows)
                          </COMPLETE>                checks ownLinePattern... ✗
                                                     returns false
                          completionDetected = false
                          (loop exhausts iterations)
  ❌ max_iterations_reached
  downstream nodes skipped

After

User workflow prompt: "When done, output <COMPLETE>ALL_CLEAN</COMPLETE>"

  Workflow                dag-executor              detectCompletionSignal
  ────────                ────────────              ──────────────────────
  loop node runs ──────▶  AI outputs                checks <promise>... ✗
  (max_iterations: 3)     <COMPLETE>ALL_CLEAN        [NEW] checks xmlWrappedPattern... ✓
                          </COMPLETE>                returns true
                          completionDetected = true
                          [strips <COMPLETE>ALL_CLEAN</COMPLETE> from output]
  ✅ loop exits completed
  downstream nodes run
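The difference between the two journeys can be reproduced with a small TypeScript sketch. It is a hypothetical reconstruction, not the code in `executor-shared.ts`. The helper names (`escapeRegExp`, `detectBefore`, `detectAfter`) and the exact regexes are assumptions modelled on the checks named in the diagrams (`endPattern`, `ownLinePattern`, `xmlWrappedPattern`). The XML branch uses the matching-tag backreference from this PR's follow-up commit.

```typescript
// Escape regex metacharacters so a signal taken from workflow YAML is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Approximation of the checks that existed before the fix (hypothetical patterns).
function detectBefore(output: string, signal: string): boolean {
  const s = escapeRegExp(signal);
  const promisePattern = new RegExp(`<promise>\\s*${s}\\s*</promise>`, "i");
  // Plain signal as the last non-whitespace text of the whole output.
  const endPattern = new RegExp(`(?:^|\\s)${s}\\s*$`);
  // Plain signal standing alone on its own line.
  const ownLinePattern = new RegExp(`^\\s*${s}\\s*$`, "m");
  return promisePattern.test(output) || endPattern.test(output) || ownLinePattern.test(output);
}

// After the fix: one extra branch for <tag>SIGNAL</tag> with matching tag names.
function detectAfter(output: string, signal: string): boolean {
  const s = escapeRegExp(signal);
  const xmlWrappedPattern = new RegExp(`<([a-z][\\w-]*)>\\s*${s}\\s*</\\1>`, "i");
  return detectBefore(output, signal) || xmlWrappedPattern.test(output);
}

const tagged = "Fixed the last lint error.\n<COMPLETE>ALL_CLEAN</COMPLETE>";
console.log(detectBefore(tagged, "ALL_CLEAN")); // false: no old pattern sees the tag-wrapped signal
console.log(detectAfter(tagged, "ALL_CLEAN")); // true: caught by the XML-wrapped branch
console.log(detectAfter("not ALL_CLEAN yet", "ALL_CLEAN")); // false: the prose guard still holds
```

The end-of-output check fails because `</COMPLETE>` follows the signal. The own-line check fails because the line starts with `<COMPLETE>`. This matches the Before trace.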

Architecture Diagram

Before

dag-executor.ts
  └── detectCompletionSignal(output, signal)   [executor-shared.ts]
        ├── checks <promise>SIGNAL</promise>
        └── checks plain signal (end / own-line)

  └── stripCompletionTags(content)             [executor-shared.ts]
        └── strips <promise>...</promise>

After

dag-executor.ts
  └── detectCompletionSignal(output, signal)   [executor-shared.ts]
        ├── checks <promise>SIGNAL</promise>
        ├── [NEW] checks <anyTag>SIGNAL</anyTag>   ← additive
        └── checks plain signal (end / own-line)

  └── stripCompletionTags(content, until?)     [executor-shared.ts]
        ├── strips <promise>...</promise>
        └── [NEW] strips <anyTag>SIGNAL</anyTag> when until is provided
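A minimal sketch of the `stripCompletionTags(content, until?)` shape shown above, under stated assumptions. The `Sketch` suffix, the `<promise>` removal regex and the final `trim()` are guesses. The tag-name backreference follows the tightened version of this PR. The sketch shows how an optional second parameter leaves old call sites unchanged while letting `dag-executor.ts` pass `loop.until`.

```typescript
// Escape regex metacharacters so the `until` value is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch of stripCompletionTags(content, until?).
function stripCompletionTagsSketch(content: string, until?: string): string {
  // Pre-existing behaviour: always remove <promise>...</promise> blocks.
  let result = content.replace(/<promise>[\s\S]*?<\/promise>/gi, "");
  // New behaviour, only when the caller passes the loop's until value:
  // remove <tag>SIGNAL</tag> pairs whose opening and closing names match.
  if (until !== undefined) {
    const s = escapeRegExp(until);
    result = result.replace(new RegExp(`<([a-z][\\w-]*)>\\s*${s}\\s*</\\1>`, "gi"), "");
  }
  return result.trim();
}

const chunk = "Done. <promise>ALL_CLEAN</promise> <COMPLETE>ALL_CLEAN</COMPLETE>";
// Old call sites (no second argument) still remove only the <promise> block.
console.log(stripCompletionTagsSketch(chunk));
// The loop call site passes loop.until, so the custom tag is removed too.
console.log(stripCompletionTagsSketch(chunk, "ALL_CLEAN")); // Done.
```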

Connection inventory:

From	To	Status	Notes
`dag-executor.ts`	`detectCompletionSignal`	unchanged	same call, same args
`dag-executor.ts`	`stripCompletionTags`	modified	now passes `loop.until` as second arg
`executor-shared.ts`	`detectCompletionSignal` impl	modified	added xmlWrappedPattern branch
`executor-shared.ts`	`stripCompletionTags` impl	modified	added optional `until` param

Label Snapshot

Risk: risk: low
Size: size: XS
Scope: workflows
Module: workflows:executor

Change Metadata

Change type: bug
Primary scope: workflows

Linked Issue

Closes #1126: Loop node reports max_iterations_reached despite completion signal being present in output

Validation Evidence (required)

bun run validate

Type check: ✅ No errors across all 10 packages
Lint: ✅ 0 errors, 0 warnings (--max-warnings 0)
Format: ✅ All files formatted
Tests: ✅ All tests passed — @archon/workflows: 397 passed (154 in dag-executor, 45 in executor-shared)
No commands skipped

Security Impact (required)

New permissions/capabilities? No
New external network calls? No
Secrets/tokens handling changed? No
File system access scope changed? No

The added regex pattern matches only within already-captured AI output strings; no new attack surface.
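The regex is built from the workflow's `until:` value, so that value must be escaped before it is interpolated. The sketch below shows what the escaping prevents. `xmlMatcher` and the signal `DONE.*` are made-up illustrations, and the pattern only approximates the real one.

```typescript
// Escape regex metacharacters so user-supplied text is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical: build the XML-wrapped matcher from either the raw or the escaped signal.
function xmlMatcher(signal: string, escape: boolean): RegExp {
  const body = escape ? escapeRegExp(signal) : signal;
  return new RegExp(`<([a-z][\\w-]*)>\\s*${body}\\s*</\\1>`, "i");
}

const output = "<COMPLETE>DONE with errors</COMPLETE>";
console.log(xmlMatcher("DONE.*", false).test(output)); // true: ".*" acts as a wildcard
console.log(xmlMatcher("DONE.*", true).test(output)); // false: ".*" must appear literally
```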

Compatibility / Migration

Backward compatible? Yes — the change is purely additive; all existing detection patterns remain unchanged
Config/env changes? No
Database migration needed? No

The stripCompletionTags signature change is backward compatible: until is optional with default undefined, so all existing call sites compile and behave identically.

Human Verification (required)

Verified scenarios:
- <COMPLETE>ALL_CLEAN</COMPLETE> detected and signal-stripped from output
- <promise>COMPLETE</promise> format still works
- Plain bare signal at end of output still works
- Signal embedded in prose ("not COMPLETE yet") still not detected (false-positive guard)
- Wrong value in XML tag (<COMPLETE>WRONG</COMPLETE> for signal ALL_CLEAN) not detected
Edge cases checked: self-closing tags (<COMPLETE/>) do not match (no content)
What was not verified: live end-to-end with a real Claude session (unit + integration coverage is sufficient for this targeted change)

Side Effects / Blast Radius (required)

Affected subsystems/workflows: Loop nodes with until: completion signals only
Potential unintended effects: Low-risk false positive if the AI outputs <code>SIGNAL_VALUE</code> in a code block and SIGNAL_VALUE happens to match the until: signal. This is a pre-existing risk with plain detection as well, and is mitigated by using distinctive signal values.
Guardrails: Existing test suite covers the false-positive prose case; no new monitoring needed

Rollback Plan (required)

Fast rollback: revert the two changed lines in executor-shared.ts and one changed line in dag-executor.ts
Feature flags: none
Observable failure symptoms: Loop nodes would again report max_iterations_reached when the AI uses non-<promise> XML tags; no data loss or state corruption

Risks and Mitigations

Risk: Regex false-positive — AI outputs <tag>SIGNAL</tag> in a code example inside its response
- Mitigation: Signal values should be distinctive (e.g. ALL_CLEAN, DONE). This is consistent with existing guidance and the pre-existing plain-text detection risk.

Summary by CodeRabbit

Bug Fixes
- Improved loop workflow execution to correctly handle XML-wrapped completion signals and context-aware completion tag stripping, ensuring proper workflow termination and clean output display.
Tests
- Added comprehensive test coverage for completion signal detection and tag stripping across multiple signal formats and edge cases.

Loop nodes with `until:` reported max_iterations_reached when the AI wrapped the completion signal in XML tags other than `<promise>` (e.g., `<COMPLETE>ALL_CLEAN</COMPLETE>`). The three existing regex patterns all missed this format, causing the loop to exhaust iterations and fail.

Changes:
- Add generic XML-wrapped signal pattern to `detectCompletionSignal`
- Extend `stripCompletionTags` to strip matched XML-wrapped signals from output
- Pass `loop.until` to `stripCompletionTags` call site in dag-executor
- Add unit tests for detection and stripping of XML-wrapped signals
- Add integration test for loop completing on final iteration with XML tags

Fixes #1126

coderabbitai · 2026-04-13T12:43:28Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b57a8aa3-b154-49b1-9ece-d6d23e1fd7ee

📥 Commits

Reviewing files that changed from the base of the PR and between bf20063 and 07327f3.

📒 Files selected for processing (4)

packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts
packages/workflows/src/executor-shared.test.ts
packages/workflows/src/executor-shared.ts

Walkthrough

This PR enhances completion signal detection in loop nodes to recognize XML-wrapped formats (e.g., <COMPLETE>ALL_CLEAN</COMPLETE>) alongside existing formats, and ensures user-visible output has completion tags stripped based on the loop's termination condition. Core changes span test coverage, executor logic, and shared utility functions to fix the issue where loops incorrectly report failure despite successful completion.

Changes

Cohort / File(s)	Summary
Loop Executor `packages/workflows/src/dag-executor.ts`	Modified `executeLoopNode` to pass `loop.until` context to `stripCompletionTags`, enabling loop-aware tag removal for user-facing output while preserving full output for signal detection.
Completion Signal Utilities `packages/workflows/src/executor-shared.ts`	Updated `detectCompletionSignal` to recognize XML-wrapped completion signals with matching tag names (backreference validation) before plain-text detection; expanded `stripCompletionTags` signature to accept optional `until` parameter for conditional removal of XML-wrapped completion tags.
Test Suites `packages/workflows/src/dag-executor.test.ts`, `packages/workflows/src/executor-shared.test.ts`	Added test case for XML-wrapped loop completion signal; added comprehensive test suites covering `detectCompletionSignal` and `stripCompletionTags` across multiple signal formats (`<promise>`, arbitrary XML wrappers, plain text) and edge cases (mismatched tags, embedded values, tag-less scenarios).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(web): loop node iteration visibility in workflow execution view #1026: Modifies executeLoopNode in packages/workflows/src/dag-executor.ts, directly impacting the same execution path and loop-completion logic.

Poem

🐰 A loop that circles round and round,
Now spotting signals safe and sound!
<COMPLETE> wrapped in tags so neat,
The rabbit's fix makes logic sweet! ✨

Wirasm · 2026-04-13T12:48:38Z

🔍 Comprehensive PR Review

PR: #1184 — fix: detect completion signal in any XML tag, not just <promise>
Reviewed by: 4 specialized agents (code-review, error-handling, test-coverage, comment-quality)
Date: 2026-04-13

Summary

Minimal, well-scoped fix (3 source lines changed). The implementation is correct: escapeRegExp prevents regex injection, the optional until parameter is fully backward-compatible, and the main code paths have solid test coverage. All agents voted APPROVE.

Verdict: ✅ APPROVE

Severity	Count
🔴 CRITICAL	0
🟠 HIGH	0
🟡 MEDIUM	2
🟢 LOW	7

🟡 Medium Issues (Quick Fixes)

1. JSDoc for `detectCompletionSignal` still says "two formats"

📍 packages/workflows/src/executor-shared.ts:370-379

The JSDoc documents only 2 detection formats; the PR adds a 3rd. A future developer won't know about the XML-wrapping fallback from the doc comment alone.

/**
 * Detect whether the AI output contains a completion signal.
 *
 * Supports three formats, checked in order:
 * 1. <promise>SIGNAL</promise> - Recommended; prevents false positives in prose
 * 2. <anytag>SIGNAL</anytag> - Any XML-wrapped tag; case-insensitive on tag names
 * 3. Plain SIGNAL - Backwards compatibility; only at end of output or on own line
 *
 * Plain signal detection is restrictive to prevent false positives like "not SIGNAL yet".
 */

2. `stripCompletionTags` not tested when both tag types appear in same chunk

📍 packages/workflows/src/executor-shared.test.ts

The function now has two strip paths (<promise> and XML-tagged). If a future refactor reorders them, tags would silently leak into user output — and no test would catch it.

it('strips both <promise> and XML-tagged signal when until is provided', () => {
  const input = 'Done. <promise>ALL_CLEAN</promise> <COMPLETE>ALL_CLEAN</COMPLETE>';
  expect(stripCompletionTags(input, 'ALL_CLEAN')).toBe('Done.');
});

🟢 Low Issues

Issue	Location	Suggestion
Superfluous `m` flag on `xmlWrappedPattern` — no anchors, no effect	`executor-shared.ts:390`	Change `'im'` → `'i'` to match strip path's `'gi'`
Tag mismatch: `<foo>SIGNAL</bar>` triggers detection (intentional but undocumented)	`executor-shared.ts:390`	Add comment: `// Note: opening and closing tag names are not required to match`
Per-chunk strip vs full-output detect: split XML tag could produce tag fragments in stream	`dag-executor.ts:1597`	Pre-existing `<promise>` behavior; accept and document
Tag-mismatch permissive behavior not pinned in tests	`executor-shared.test.ts`	Add `it('detects signal in mismatched XML tags (permissive)', ...)`
DAG integration test doesn't assert user-visible output is clean	`dag-executor.test.ts:2930`	Assert `platform.sendMessage` calls contain no `<COMPLETE>`
`xmlWrappedPattern` comment doesn't note tag-name independence	`executor-shared.ts:390`	Append note to existing comment
`stripCompletionTags` JSDoc doesn't mention `until` param	`executor-shared.ts:403`	Append clause or add `@param until`
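The first row's claim can be checked directly. The `m` flag only changes what `^` and `$` match, so a pattern without anchors behaves the same with `'i'` and `'im'`. The patterns below are illustrative stand-ins for `xmlWrappedPattern`, not copies of it.

```typescript
const tagged = "line one\n<COMPLETE>ALL_CLEAN</COMPLETE>\nline three";

// Unanchored: the m flag cannot change the result.
const unanchoredI = /<([a-z][\w-]*)>\s*ALL_CLEAN\s*<\/\1>/i;
const unanchoredIM = /<([a-z][\w-]*)>\s*ALL_CLEAN\s*<\/\1>/im;

// Anchored: here m matters, because ^ and $ may now match at line breaks.
const anchoredI = /^<([a-z][\w-]*)>\s*ALL_CLEAN\s*<\/\1>$/i;
const anchoredIM = /^<([a-z][\w-]*)>\s*ALL_CLEAN\s*<\/\1>$/im;

console.log(unanchoredI.test(tagged), unanchoredIM.test(tagged)); // true true
console.log(anchoredI.test(tagged), anchoredIM.test(tagged)); // false true
```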

✅ What's Good

escapeRegExp applied to user-supplied signal in both detect and strip — correct defense against regex injection from workflow YAML
until? is fully backward-compatible; all existing call sites unaffected
Tests cover the most important negative scenarios: wrong value in tags, signal embedded in prose (guarding the primary false-positive risk that motivated the XML format)
DAG integration test verifies full execution path: completeWorkflowRun called once, failWorkflowRun never called
Detection and stripping use identical underlying regex — no drift between what is detected and what is removed
Fix is precisely scoped — no cleanup, no refactoring, no speculative features

Reviewed by Archon comprehensive-pr-review workflow

- Update detectCompletionSignal JSDoc to document all three detection formats
- Update stripCompletionTags JSDoc to mention the `until` parameter
- Remove superfluous `m` flag from xmlWrappedPattern (no anchors, no effect)
- Document that XML tag names are matched independently (intentional permissiveness)
- Add test: detects signal in mismatched XML tags (permissive behavior)
- Add test: strips both <promise> and XML-tagged signal in same chunk
- Add assertion in DAG integration test that raw XML tags don't appear in sent messages

Wirasm · 2026-04-13T12:51:55Z

⚡ Self-Fix Report (Aggressive)

Status: COMPLETE
Pushed: ✅ Changes pushed to archon/task-fix-issue-1126
Commit: 9bc5d8f
Philosophy: Fix everything unless clearly a new concern

Fixes Applied (7 total)

Severity	Count
🔴 CRITICAL	0
🟠 HIGH	0
🟡 MEDIUM	2
🟢 LOW	5

✅ JSDoc for detectCompletionSignal documents only two formats (executor-shared.ts:370-379) — Updated to list all three detection formats with descriptions
✅ stripCompletionTags not tested for combined tag types (executor-shared.test.ts) — Added test: strips both <promise> and XML-tagged signal in same chunk
✅ Superfluous m flag on xmlWrappedPattern (executor-shared.ts:390) — Changed 'im' → 'i' (no anchors, flag was a no-op)
✅ Tag mismatch not documented in regex comment (executor-shared.ts:388-390) — Added inline note that tag names are matched independently by design
✅ stripCompletionTags JSDoc doesn't mention until (executor-shared.ts:403) — Appended clause describing the until parameter
✅ Tag-mismatch permissive behavior not pinned in tests (executor-shared.test.ts) — Added test: detects signal in mismatched XML tags (permissive)
✅ DAG integration test doesn't assert output is clean (dag-executor.test.ts:2979) — Added assertions that sendMessage calls contain no raw <COMPLETE> tags

Tests Added

executor-shared.test.ts: detects signal in mismatched XML tags (permissive)
executor-shared.test.ts: strips both <promise> and XML-tagged signal when until is provided
dag-executor.test.ts: output-cleanliness assertions on existing XML-wrapped signal test

Skipped (1)

Finding	Reason
Per-chunk strip vs full-output detect: split XML tag could leak fragments (`dag-executor.ts:1597`)	Pre-existing architectural pattern for `<promise>` stripping; fixing requires non-trivial streaming refactor out of scope for this PR
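The skipped finding is easiest to see with a toy stream. Assume, as the finding describes, that each streamed chunk is stripped on its own while detection runs on the concatenated output. Then a tag pair split across two chunks is detected but never removed. `stripChunk` is a hypothetical stand-in for the real function.

```typescript
// Escape regex metacharacters so the signal is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-in: strip <tag>SIGNAL</tag> pairs with matching tag names.
function stripChunk(chunk: string, signal: string): string {
  const s = escapeRegExp(signal);
  return chunk.replace(new RegExp(`<([a-z][\\w-]*)>\\s*${s}\\s*</\\1>`, "gi"), "");
}

const chunks = ["All lint errors fixed. <COMPLETE>ALL_", "CLEAN</COMPLETE>"];

// Per-chunk stripping: neither chunk holds the whole tag pair, so nothing is removed.
const streamed = chunks.map((c) => stripChunk(c, "ALL_CLEAN")).join("");

// Stripping the accumulated output removes the pair cleanly.
const whole = stripChunk(chunks.join(""), "ALL_CLEAN");

console.log(JSON.stringify(streamed)); // "All lint errors fixed. <COMPLETE>ALL_CLEAN</COMPLETE>"
console.log(JSON.stringify(whole)); // "All lint errors fixed. "
```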

Validation

✅ Type check | ✅ Lint | ✅ Tests (47 passed executor-shared, 154 passed dag-executor)

Self-fix by Archon · aggressive mode · fixes pushed to archon/task-fix-issue-1126

Follow-up to the initial broadening in this PR. The first version of the regex accepted mismatched open/close tags (e.g. `<COMPLETE>X</done>`), which was a small false-positive surface when the AI interleaves tags in prose. This commit tightens both detectCompletionSignal and stripCompletionTags to capture the tag name and enforce it on the closing tag via a \1 backreference. Case-insensitivity on the tag name is preserved.

Test updates:
- Flip the "permissive mismatch" case to assert strict rejection, with a comment explaining the guard.
- Add a case-insensitive matching case to lock that behavior in.

No behavior change for workflows that use matching tags (the overwhelming common case) or for <promise>...</promise>. The behavior change is limited to the narrow case where the open tag and close tag disagree, which only happens when the AI is confused. In that case we'd rather report max_iterations_reached and let the author inspect than silently call the loop complete.
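The tightening comes down to two changes. The regex captures the opening tag name, and the closing tag must repeat it via `\1`. A sketch comparing the two regex shapes follows. The patterns are illustrative, not copied from `executor-shared.ts`, and `permissive`/`strict` are hypothetical helper names. With the `i` flag, JavaScript also compares the backreference case-insensitively, which is how case-insensitive tag names survive the change.

```typescript
// Escape regex metacharacters so the signal is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// First version: any opening tag, any closing tag.
function permissive(signal: string): RegExp {
  return new RegExp(`<[a-z][\\w-]*>\\s*${escapeRegExp(signal)}\\s*</[a-z][\\w-]*>`, "i");
}

// Tightened version: the closing tag must repeat the captured opening name.
function strict(signal: string): RegExp {
  return new RegExp(`<([a-z][\\w-]*)>\\s*${escapeRegExp(signal)}\\s*</\\1>`, "i");
}

const mismatched = "<COMPLETE>ALL_CLEAN</done>";
const mixedCase = "<complete>ALL_CLEAN</COMPLETE>";

console.log(permissive("ALL_CLEAN").test(mismatched)); // true
console.log(strict("ALL_CLEAN").test(mismatched)); // false
console.log(strict("ALL_CLEAN").test(mixedCase)); // true
```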

@stefans71

* fix(workflows): fail loudly on SDK isError results (coleam00#1208) (coleam00#1291) Previously, `dag-executor` only failed nodes/iterations when the SDK returned an `error_max_budget_usd` result. Every other `isError: true` subtype — including `error_during_execution` — was silently `break`ed out of the stream with whatever partial output had accumulated, letting failed runs masquerade as successful ones with empty output. This is the most likely explanation for the "5-second crash" symptom in coleam00#1208: iterations finish instantly with empty text, the loop keeps going, and only the `claude.result_is_error` log tips the user off. Changes: - Capture the SDK's `errors: string[]` detail on result messages (previously discarded) and surface it through `MessageChunk.errors`. - Log `errors`, `stopReason` alongside `errorSubtype` in `claude.result_is_error` so users can see what actually failed. - Throw from both the general node path and the loop iteration path on any `isError: true` result, including the subtype and SDK errors detail in the thrown message. Note: this does not implement auto-retry. See PR comments on coleam00#1121 and the analysis on coleam00#1208 — a retry-with-fresh-session approach for loop iterations is not obviously correct until we see what `error_during_execution` actually carries in the reporter's env. This change is the observability + fail-loud step that has to come first so that signal is no longer silent. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 4c6ddd9) * fix(db): throw on corrupt commands JSON instead of silent empty fallback (coleam00#1033) * fix(db): throw on corrupt commands JSON instead of silent empty fallback (coleam00#967) getCodebaseCommands() silently returned {} when the commands column contained corrupt JSON. Callers had no way to distinguish 'no commands' from 'unreadable data', violating fail-fast principles. Now throws a descriptive error with the codebase ID and a recovery hint. 
The error is still logged for observability before throwing. Adds two test cases: corrupt JSON throws, valid JSON string parses. * fix: include parse error in log for better diagnostics (cherry picked from commit 39a05b7) * fix(isolation): raise worktree git-operation timeout to 5m (coleam00#1306) All 15 worktree git-subprocess timeouts in WorktreeProvider were hardcoded at 30000ms. Repos with heavy post-checkout hooks (lint, dependency install, submodule init) routinely exceed that budget and fail worktree creation. Consolidate them onto a single GIT_OPERATION_TIMEOUT_MS constant at 5 min. Generous enough to cover reported cases while still catching genuine hangs (credential prompts in non-TTY, stalled fetches). Chosen over the config-key approach in coleam00#1029 to avoid adding permanent .archon/config.yaml surface for a problem a raised default solves cleanly. If 5 min turns out to also be too tight for real-world use, we'll revisit. Closes coleam00#1119 Supersedes coleam00#1029 Co-authored-by: Shay Elmualem <12733941+norbinsh@users.noreply.github.com> (cherry picked from commit cc78071) * fix(web,server): show real platform connection status in Settings (coleam00#1061) The Settings page's Platform Connections section hardcoded every platform except Web to 'Not configured', so users couldn't tell whether their Slack/ Telegram/Discord/GitHub/Gitea/GitLab adapters had actually started. - Server: /api/health now returns an activePlatforms array populated live as each adapter's start() resolves. Passed into registerApiRoutes so the reference stays mutable — Telegram starts after the HTTP listener is already accepting requests, so a snapshot would miss it. - Web: SettingsPage.PlatformConnectionsSection now reads activePlatforms from /api/health and looks each platform up in a Set. Also adds Gitea and GitLab to the list (they already ship as adapters). 
Closes coleam00#1031 Co-authored-by: Lior Franko <liorfr@dreamgroup.com> (cherry picked from commit 08de8ee) * fix: initialize options.hooks before merging YAML node hooks (coleam00#1177) When a workflow node defines hooks (PreToolUse/PostToolUse) in YAML but no hooks exist yet on the options object, applyNodeConfig crashes with "undefined is not an object" because it tries to assign properties on the undefined options.hooks. Initialize options.hooks to {} before the merge loop. Reproduces with: archon workflow run archon-architect (which uses per-node hooks extensively). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit 7ea3214) * fix: detect completion signal in any XML tag, not just <promise> (coleam00#1126) (coleam00#1184) * fix: detect completion signal in any XML tag, not just <promise> (coleam00#1126) Loop nodes with `until:` reported max_iterations_reached when the AI wrapped the completion signal in XML tags other than `<promise>` (e.g., `<COMPLETE>ALL_CLEAN</COMPLETE>`). The three existing regex patterns all missed this format, causing the loop to exhaust iterations and fail. 
Changes: - Add generic XML-wrapped signal pattern to `detectCompletionSignal` - Extend `stripCompletionTags` to strip matched XML-wrapped signals from output - Pass `loop.until` to `stripCompletionTags` call site in dag-executor - Add unit tests for detection and stripping of XML-wrapped signals - Add integration test for loop completing on final iteration with XML tags Fixes coleam00#1126 * fix: address review findings for completion signal detection - Update detectCompletionSignal JSDoc to document all three detection formats - Update stripCompletionTags JSDoc to mention the `until` parameter - Remove superfluous `m` flag from xmlWrappedPattern (no anchors, no effect) - Document that XML tag names are matched independently (intentional permissiveness) - Add test: detects signal in mismatched XML tags (permissive behavior) - Add test: strips both <promise> and XML-tagged signal in same chunk - Add assertion in DAG integration test that raw XML tags don't appear in sent messages * simplify: reduce complexity in changed files * fix: require matching XML tag names in completion-signal detection Follow-up to the initial broadening in this PR. The first version of the regex accepted mismatched open/close tags (e.g. `<COMPLETE>X</done>`) which was a small false-positive surface when the AI interleaves tags in prose. Tightens both detectCompletionSignal and stripCompletionTags to capture the tag name and enforce it on the close via \1 backreference. Case-insensitivity on the tag name is preserved. Test updates: - Flip the "permissive mismatch" case to assert strict rejection with a comment explaining the guard. - Add a case-insensitive matching case to lock that behavior in. No behavior change for workflows that use matching tags (the overwhelming common case) or for <promise>...</promise>. 
Behavior change is limited to the narrow "open tag and close tag disagree" case, which only happens when the AI is confused — in which case we'd rather report max_iterations_reached and let the author inspect than silently call the loop complete. (cherry picked from commit bc25dee) * fix(web): allow deleting nodes from Workflow Builder (coleam00#971) (coleam00#1113) * fix(web): allow deleting nodes from Workflow Builder (coleam00#971) Three independent gaps prevented users from deleting nodes added to the Workflow Builder canvas: dropped nodes were never auto-selected so keyboard shortcuts silently no-oped, no right-click context menu existed, and the Delete Node button was buried in the Advanced tab (hidden below the viewport for Prompt/Command, completely absent for Bash since bash nodes have no Advanced tab). Fixes coleam00#971. * fix(web): push undo snapshot before adding nodes on canvas Call onPushSnapshot() before setNodes() in both onDrop and quick-add handlers so that node additions are captured by undo/redo history. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(web): address PR coleam00#1113 review feedback - Hold nodes/edges in refs so handleNodeDeleteById and onPushSnapshot can't capture stale pre-drop state (fixes undo-stack correctness). - Clamp context-menu x/y to viewport so right-click near edges stays fully on-screen. - Drop non-conformant role=menu/menuitem from the single-item context menu; rely on the native button for accessibility. - Extend isInputTarget() to cover ARIA combobox/textbox/searchbox so Backspace in Radix/shadcn widgets never nukes a node. - Extract handleBuilderKeydown as a pure function and add tests covering the Delete/Backspace + isInputTarget invariant. - Remove issue-number references from code comments per CLAUDE.md. - Document the new delete affordances in the Workflow Builder docs. - Inline context-menu dismissal, rename pointer handler, drop unused deps in keyboardActions useMemo. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit d7f36b2) * fix(workflows): make archon-adversarial-dev sed replacement macOS-safe (coleam00#1155) * fix(workflows): make adversarial init sed portable on macOS * chore: regenerate bundled-defaults after adversarial-dev sed fix Sync generated bundle with the new temp-file sed pattern in archon-adversarial-dev.yaml so check:bundled passes and binary distributions ship the macOS-safe version. --------- Co-authored-by: laplace young <yangqk12@whu.edu.cn> Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com> (cherry picked from commit 817186d) * fix(deps): override transitive axios to ^1.15.0 for CVE-2025-62718 (coleam00#1330) axios <1.15.0 can be coerced to bypass NO_PROXY rules via hostname normalization, enabling SSRF in the right network shape. Archon pulls axios transitively through @slack/bolt (^1.12.0) and @slack/web-api (^1.13.5); before this change bun.lock resolved axios@1.13.6 — within the vulnerable range. Adding "axios": "^1.15.0" to the root package.json overrides bumps the transitive resolution to axios@1.15.1 (latest compatible 1.x). Both Slack range specs accept it without API surface changes — no downstream code touches axios directly. Supersedes coleam00#1153. Credits @stefans71 for identifying and reporting the vulnerability; their PR was stale on the lockfile (0.3.5 → 0.3.6 drift on dev), so this is a fresh one-line re-do on current dev. Closes coleam00#1053. 
Co-authored-by: Stefans71 <stefans71@users.noreply.github.com> (cherry picked from commit ae2d936) * fix(cli): surface stale-workspace registration error instead of fake "not a git repo" (coleam00#1332) * fix(cli): surface stale-workspace registration error instead of fake "not a git repo" When workflowRunCommand auto-registers an unregistered repo, a stale ~/.archon/workspaces/<owner>/<repo>/source symlink (pointing to an old checkout) causes createProjectSourceSymlink() in @archon/paths to throw: Source symlink at <linkPath> already points to <existing>, expected <target> The CLI caught that in a try/catch, logged it at warn level, continued with `codebase = null`, and then the isolation / resume branches hit their "codebase missing" fallback and threw the generic: Cannot create worktree: not in a git repository. That message is false — the repo is valid; the Archon workspace entry is stale. It sends users down the wrong diagnostic path (checking git config, permissions, etc.) instead of pointing at the workspace dir. Fix: preserve the registration error on a new `codebaseRegistrationError` local, and at both fallback sites (resume + worktree-creation) check it before the generic "not a git repo" branch. When set, throw a truthful: Cannot {create worktree,resume}: repository registration failed. Error: <original message> Hint: Remove the stale workspace entry at <dir> and retry, or use --no-worktree to skip isolation. The hint's exact path comes from a small parser that extracts the workspace directory from the known "Source symlink at …" format; when the message shape doesn't match (future error text changes), the parser returns null and we fall back to a generic "check registration under <archon-home>/workspaces" hint — safe degradation. Regression test in workflow.test.ts asserts the new error message and negatively asserts the old "not in a git repository" string is gone. 
Supersedes coleam00#1157 — that PR was draft + CONFLICTING against current dev, and also mentioned Windows test-compat changes that weren't in the diff (pruned scope). This is a fresh re-do focused strictly on coleam00#1146. Closes coleam00#1146. Co-authored-by: Bortlesboat <Bortlesboat@users.noreply.github.com> * review: add resume-path test, null-fallback test, update troubleshooting docs Addresses multi-agent review feedback on this PR: - Add regression test for the --resume fallback site (the worktree-create site was already covered; the resume site had identical wiring but zero test coverage). - Add test for the unrecognized-error-shape branch of buildRegistrationFailureError so the generic workspace hint is pinned (prevents accidental inversion of the stale-entry vs generic-hint ternary). - Update the troubleshooting page to key on the new "Cannot create worktree: repository registration failed." message. Users hitting the new error won't find the page under the old heading, and the "In the future..." note is obsolete now that the error itself contains the cleanup path. - Trim both new docblocks: keep the load-bearing cross-package error string contract in extractStaleWorkspaceEntry, drop narration of what the code already shows. Drop the "Before this helper existed..." paragraph from buildRegistrationFailureError — that's CHANGELOG material. Drop PR-reference suffix from the test section divider. * review: guard getArchonHome in hint + export parser for direct tests Two follow-up fixes to the multi-agent review commit (f32f002): CodeRabbit finding — unguarded getArchonHome() in the fallback hint. If getArchonHome() ever throws (misconfigured env vars, permission issues on the resolution path), the registration-failure Error would never get constructed: we'd throw a secondary home-resolution error that masks the root cause. Wrap the fallback branch in try/catch — prefer losing the exact path in the hint over replacing the actionable registration error. 
A safe generic hint ("Check your Archon workspace registration and retry") takes over when getArchonHome() throws. The original error.message is always embedded verbatim in the re-thrown Error.

S2 — export extractStaleWorkspaceEntry for direct table tests. The parser is where the cross-package string contract with @archon/paths actually lives; direct tests against it are cheaper than end-to-end CLI tests and pin the edge cases:

- POSIX path with forward slashes (typical unix user)
- Windows path with backslashes (verifies Math.max(lastIndexOf('/'), lastIndexOf('\\')))
- Unrelated error message (no prefix) → null
- Prefix matches but delimiter missing → null
- Source path without any separator → null (guards against returning empty string, which would produce a nonsense "Remove the stale workspace entry at " hint)
- Empty string → null

Six new cases in the test file. The claim of Windows support in the PR description is now actually verified.

* fix(test): make generic-hint assertion path-separator agnostic

Windows test runner (CI) hit:

Expected to contain: "Check your Archon workspace registration under /home/test/.archon/workspaces"
Received: "... under \home\test\.archon\workspaces and retry, ..."

path.join normalizes to `\` on Windows and `/` on POSIX. The test hardcoded forward slashes in the expected substring. Split into two separator-agnostic asserts: the prefix up to "under", then `/workspaces\b/` regex for the final path segment. Behavior doesn't change — the hint still gets the full path.join'd workspaces dir on either platform.

---------

Co-authored-by: Bortlesboat <Bortlesboat@users.noreply.github.com>
(cherry picked from commit 056707d)

* fix(server,web,workflows): web approval gates auto-resume + reject-with-reason dialog (coleam00#1329)

* fix(server,web,workflows): web approval gates auto-resume + reject-with-reason dialog

Fixes three tightly-coupled bugs that made web approval gates unusable:

1. orchestrator-agent did not pass parentConversationId to executeWorkflow for any web-dispatched foreground / interactive / resumable run. Without that field, findResumableRunByParentConversation (the machinery the CLI relies on for resume) couldn't find the paused run from the same conversation on a follow-up message, and the approve/reject API handlers had no conversation to dispatch back to.
2. POST /api/workflows/runs/:runId/{approve,reject} recorded the decision and returned "Send a message to continue the workflow." — the workflow never actually resumed. Added tryAutoResumeAfterGate() that mirrors what workflowApproveCommand / workflowRejectCommand already do on the CLI: look up the parent conversation, dispatch `/workflow run <name> <userMessage>` back through dispatchToOrchestrator. Failures are non-fatal — the user can still send a manual message as a fallback.
3. The during-streaming cancel-check in dag-executor aborted any streaming node whenever the run status left 'running', including the legitimate transition to 'paused' that an approval node performs. A concurrent AI node in the same DAG layer now tolerates 'paused' and finishes its own stream; only truly terminal / unknown states (null, cancelled, failed, completed) abort the in-flight stream.

Web UI: ConfirmRunActionDialog gains an optional reasonInput prop (label + placeholder) that renders a textarea and passes the trimmed value to onConfirm. WorkflowRunCard (dashboard) and WorkflowProgressCard (chat) both use it for Reject now — the chat card was still on window.confirm, which was both inconsistent with the dashboard and couldn't collect a reason. The trimmed reason threads through to $REJECTION_REASON in the workflow's on_reject prompt.

Supersedes coleam00#1147. @jonasvanderhaegen surfaced the root cause and shape of the fix; that PR was 87 commits stale and pre-dated the reject-UX upgrade (coleam00#1261 area), so this is a fresh re-do on current dev.
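The status rule in bug 3 boils down to a small pure policy: keep streaming while the run is `running` or `paused`, abort on everything else. The sketch below borrows the name `shouldContinueStreamingForStatus()` from the follow-up review commit further down, but its `string | null` signature is an assumption, not the actual dag-executor export.

```typescript
// Sketch only: assumes run status is a plain string, or null when the row is gone.
function shouldContinueStreamingForStatus(status: string | null): boolean {
  // Positive safe-list: 'paused' is tolerated because a sibling approval node
  // in the same DAG layer may pause the run while this node is still streaming.
  return status === 'running' || status === 'paused';
}

// The seven statuses listed for the unit tests in the later review commit.
const expectations: Array<[string | null, boolean]> = [
  ['running', true],
  ['paused', true],
  [null, false],
  ['cancelled', false],
  ['failed', false],
  ['completed', false],
  ['some-unknown-state', false],
];

for (const [status, expected] of expectations) {
  const actual = shouldContinueStreamingForStatus(status);
  console.log(`${String(status)} -> ${actual}${actual === expected ? '' : ' (unexpected)'}`);
}
```

Writing it as a positive safe-list means any status added later (or a missing row, surfaced here as `null`) aborts by default instead of silently streaming on.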
Tests:

- packages/server/src/routes/api.workflow-runs.test.ts — 5 new cases: approve with parent dispatches; approve without parent returns "Send a message"; approve with deleted parent conversation skips safely; reject dispatches on-reject flows; reject that cancels (no on_reject) does NOT dispatch.
- packages/core/src/orchestrator/orchestrator.test.ts — updated the two synthesizedPrompt-dispatch tests for the new executeWorkflow arity.

Closes coleam00#1131.

Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>

* fix: address multi-agent review findings for web approval auto-resume

C1 (critical) — cross-adapter misrouting guard
tryAutoResumeAfterGate now checks parentConv.platform_type === 'web' before dispatching. Non-web parents (Slack/Telegram/GitHub/Discord) being approved from the dashboard skip auto-resume rather than dispatching a Slack thread_ts or Telegram chat_id through the web adapter's lock manager.

C2 (critical) — fire-and-forget dispatch replaced with await
`void dispatchToOrchestrator()` meant the "Resuming workflow." response fired before async work completed, and the outer try/catch couldn't observe dispatch failures. Changed to await; response now accurately reflects dispatch outcome.

I1 — replaced logPrefix string-template (which produced 3-segment api.workflow_*.dispatched event names violating {domain}.{action}_{state}) with literal event names per action, branched inside the helper. Accepts action: 'approve' | 'reject' instead.

I2 — corrected misleading "foreground/interactive" qualifier in the approve-endpoint comment; background web dispatches also set parent_conversation_id via the pre-created run, so they auto-resume too.

I3 — extracted shouldContinueStreamingForStatus() as a small exported policy and added 7 unit tests covering running/paused/null/cancelled/failed/completed/unknown.
Full-integration coverage of the paused-tolerance invariant would require manipulating the 10s CANCEL_CHECK_INTERVAL_MS, which is flaky-prone; unit test of the policy function captures the same invariant deterministically.

I4 — updated approval-nodes.md and authoring-workflows.md to reflect that Web UI approve/reject now auto-resumes (no "send a follow-up message" copy), documented the reject-with-reason dialog and $REJECTION_REASON flow, and called out the cross-platform caveat.

S1 — rewrote streaming status check as positive shouldContinue safe-list via the extracted policy function, matching the inline comment.

S2 — inlined handleReject on the dashboard rather than squeezing rejectWorkflowRun through runAction with a closure; keeps runAction narrow for the single-arg lifecycle actions.

S5 — new regression test covering the non-web-parent skip path (slack-platform parent → dispatch skipped → response falls back to "Send a message to continue").

S6 — removed stale reference to runAction in ConfirmRunActionDialog's onConfirm JSDoc (no longer accurate now that WorkflowProgressCard calls the dialog without runAction).

S7 — fixed misleading "user can resume manually by sending any message" docstring (resume is triggered by re-running the workflow command, not by an arbitrary message).

Skipped as out-of-scope:

- S3 — cancelWorkflowRun rowCount check (pre-existing defect; separate PR)
- S4 — tightening expect.anything() to UUID regex (deferred)
- S8 — 12-positional-arg executeWorkflow → options-bag refactor (tracked follow-up)

bun run validate green locally; 68 tests in api.workflow-runs.test.ts (up from 67), 173 in dag-executor.test.ts (up from 166).

* review: close I1/I2/I3/I4/I6 — paused tolerance in loop + emitter, resume test, useId

I1 (loop inter-iteration check) — dag-executor.ts:1715
Used `!== 'running'` in the loop node's between-iteration status check.
A sibling approval node pausing the run in the same topological layer would abort the loop mid-iteration with "Loop node '<id>' stopped at iteration N (paused)". Switched to the shared shouldContinueStreamingForStatus helper so paused is tolerated — same semantics the streaming check got. Extended inline comment explains the sibling-layer concurrency reason.

I2 (skipIfStatusChanged emitter unregister) — dag-executor.ts:2886
At DAG-finalization writes the helper correctly skipped writing on any non-running state (paused included — don't mark a paused run complete), but it *also* called getWorkflowEventEmitter().unregisterRun() which broke SSE observability for a run that's still live (waiting for user approval). Split the two responsibilities: skip the write for all non-running states, but only unregister the emitter for terminal states (cancelled / deleted / completed / failed). `paused` keeps the emitter registered so resume stays visible on the dashboard.

I3 (foreground_resume_detected branch untested) — orchestrator-agent.test.ts
That branch was modified as part of the original fix (added parentConversationId as 11th positional arg) but no existing test configured mockFindResumableRunByParentConversation to return non-null. A positional mistake (e.g. accidentally swapping issueContext and parentConversationId) would silently break auto-resume with no failing test. New regression test configures the mock, asserts both the cwd comes from the resumable run's working_path AND parentConversationId is passed correctly at position 10.

I4 (null-parent log level) — api.ts tryAutoResumeAfterGate
`getConversationById` returning null is a data-integrity signal (the parent conversation was deleted while the run was paused) — worth surfacing at info level so operators notice, not hiding at debug. Missing platform_conversation_id on an existing row would be an unusual DB state and stays at debug.
Added `parentDeleted: boolean` to the log context so the two cases are distinguishable in observability.

I6 (hardcoded DOM id) — ConfirmRunActionDialog.tsx
`id="confirm-run-action-reason"` collided when multiple dialog instances share the same page (Radix portals mitigate in practice but the code was fragile). Switched to React.useId() so each instance gets a unique id — htmlFor/id wiring preserved.

S11 (arity-only assertion) — orchestrator-agent.test.ts:1092 area
The interactive-workflow-on-web test asserted mockExecuteWorkflow was called, but nothing about the args. Added a specific assertion that position 10 (parentConversationId) equals 'conv-1' (the caller conversation id) — pins the wiring that I1/I2 depend on being correct.

Deferred (from review S1-S10, I5, I7):

- S1 (ExecuteWorkflowOptions bag) — tracked as standalone follow-up; 12 positional args with 2 adjacent optionals is a real maintenance hazard but the refactor deserves its own PR.
- S7 (WHY comment on non-web else branch) — review text says the branch "correctly omits" parentConversationId but the code passes it; the combination with the web-parent guard in tryAutoResumeAfterGate is intentional. Not adding a justify-what-we-don't-do comment.
- S2/S3/S4/S5/S8/S9/S10 — pure polish (event-map ternary, platformConvId inlining, shared constant for REJECTION_REASON_INPUT, onChange arrow shorthand, discriminated union, docblock trim, suffix comment drop)
- I5 (soften "Resuming workflow." to "— check the dashboard for progress") — users clicking from the dashboard are already on the dashboard; the current text is accurate (enqueue completed) and concise.
- I7 (test dispatch-throws path) — covered implicitly by the try/catch branch of tryAutoResumeAfterGate returning false; a direct test would require mocking handleMessage to throw and would couple to dispatchToOrchestrator internals.
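The I2 split can be expressed as one decision with two independent outputs: skip the finalization write for every non-running state, but unregister the SSE emitter only for terminal states. `decideFinalization` and `FinalizationDecision` are hypothetical names for illustration; the real `skipIfStatusChanged` helper is not reproduced here, and how it treats statuses outside the listed set is an assumption.

```typescript
// Hypothetical names; the status values follow the commit message.
interface FinalizationDecision {
  skipWrite: boolean;         // don't overwrite a run that has left 'running'
  unregisterEmitter: boolean; // drop SSE observability for the run
}

const TERMINAL_STATUSES: ReadonlySet<string> = new Set([
  'cancelled',
  'deleted',
  'completed',
  'failed',
]);

function decideFinalization(currentStatus: string): FinalizationDecision {
  return {
    // Any non-running state, 'paused' included, must not be marked complete.
    skipWrite: currentStatus !== 'running',
    // Only terminal states unregister; a 'paused' run is still live and its
    // resume must stay visible on the dashboard.
    unregisterEmitter: TERMINAL_STATUSES.has(currentStatus),
  };
}

for (const status of ['running', 'paused', 'cancelled']) {
  console.log(status, decideFinalization(status));
}
```

Under this sketch an unrecognized status skips the write but keeps the emitter registered, which errs on the side of not losing dashboard visibility.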
bun run validate green; 189 dag-executor tests, 98 orchestrator-agent tests, 68 api.workflow-runs tests — all the new cases pass.

---------

Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>
(cherry picked from commit d5c1cd9)

* feat(providers): autodetect canonical binary install paths for Claude and Codex (coleam00#1361)

Both binary resolvers previously stopped at env-var + explicit config and threw a "not found" error when neither was set. Users who followed the upstream-recommended install flow (Anthropic's `curl install.sh` for Claude, `npm install -g @openai/codex`) still had to manually set either `CLAUDE_BIN_PATH` / `CODEX_BIN_PATH` or the corresponding config field before any workflow could run.

Add a tier-N autodetect step between the explicit config tier and the install-instructions throw. Purely additive: env and config still win when set (precedence covered by new tests). On autodetect miss, the same install-instructions error fires as before.
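The resolution order described here (env var, then explicit config, then autodetect, then the install-instructions error) can be sketched as a single function. `resolveBinary`, its options object, and the injectable `exists` check are hypothetical, not the actual Archon provider code. The Claude probe path follows the list below and is built with `homedir()` from `node:os`, which the later test fix in this message says the implementation uses.

```typescript
import { existsSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';

type BinarySource = 'env' | 'config' | 'autodetect';

interface ResolvedBinary {
  path: string;
  source: BinarySource;
}

interface ResolveOptions {
  envValue?: string;               // e.g. process.env.CLAUDE_BIN_PATH
  configValue?: string;            // the matching config.yaml field
  probePaths: string[];            // canonical install locations, checked in order
  installHint: string;             // message for the final "not found" error
  exists?: (p: string) => boolean; // injectable for tests
}

// Hypothetical sketch of the tiered lookup. Explicit values are returned
// as-is; this sketch does not check that they exist on disk.
function resolveBinary(opts: ResolveOptions): ResolvedBinary {
  const exists = opts.exists ?? existsSync;
  if (opts.envValue) return { path: opts.envValue, source: 'env' };
  if (opts.configValue) return { path: opts.configValue, source: 'config' };
  for (const candidate of opts.probePaths) {
    if (exists(candidate)) return { path: candidate, source: 'autodetect' };
  }
  throw new Error(opts.installHint);
}

// Claude native-installer location, built from homedir() rather than $HOME
// so Windows resolves the USERPROFILE-based home directory.
function claudeProbePaths(platform: string = process.platform): string[] {
  const bin = platform === 'win32' ? 'claude.exe' : 'claude';
  return [join(homedir(), '.local', 'bin', bin)];
}
```

With only `probePaths` and `installHint` set, the function returns `source: 'autodetect'` when a probe path exists and throws the hint otherwise; setting `envValue` or `configValue` short-circuits the probes, matching the precedence the commit says its tests cover.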
Claude probe list (verified against docs.claude.com "Uninstall Claude Code → Native installation" section):

- $HOME/.local/bin/claude (mac/linux native installer)
- $USERPROFILE\.local\bin\claude.exe (Windows native installer)

Codex probe list (verified against openai/codex README; npm global-install puts the binary at `{npm_prefix}/bin/<name>` on POSIX, `{npm_prefix}\<name>.cmd` on Windows):

- $HOME/.npm-global/bin/codex (user-set `npm config set prefix`)
- /opt/homebrew/bin/codex (mac arm64 with homebrew-node)
- /usr/local/bin/codex (mac intel / linux system node)
- %APPDATA%\npm\codex.cmd (Windows npm global default)
- $HOME\.npm-global\codex.cmd (Windows user-set prefix)

Not probed (explicit override still required):

- Custom npm prefixes — `npm root -g` would need a subprocess per resolve, too much surface for a probe helper
- `brew install --cask codex` — cask layout isn't a PATH binary
- Manual GitHub Releases extracts — placement is user-determined
- `~/.bun/bin/codex` — not documented in openai/codex README

Pi provider intentionally has no equivalent change: the Pi SDK is bundled into the archon binary (no subprocess), so there's no "binary" to resolve. Pi auth lives at `~/.pi/agent/auth.json` which the SDK already finds by default, and the PR A shim (`PI_PACKAGE_DIR`) handles the package-dir case via Pi's own documented escape hatch.

E2E verified: removed both config entries from ~/.archon/config.yaml, rebuilt compiled binary, ran `archon workflow run archon-assist` and a Codex workflow. Logs showed `source: 'autodetect'` for both, responses returned cleanly.

(cherry picked from commit b99cee4)

* fix(providers/test): use os.homedir() instead of $HOME in claude binary autodetect test

The native-installer autodetect test computed its expected path from process.env.HOME, but the implementation uses node:os homedir().
On Windows, HOME is typically unset (Windows uses USERPROFILE), so the test fell back to '/Users/test' while the resolver returned the real home dir — making the spy's path-equality check fail and breaking CI on windows-latest. Mirror the implementation by importing homedir() from node:os and joining with node:path so the expected path matches the actual platform-resolved home and separator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(cherry picked from commit f9f8775)

* fix(server): contain Discord login failure so it doesn't kill the server (coleam00#1365)

Reported in coleam00#1365: a user running `archon serve` with DISCORD_BOT_TOKEN set but the "Message Content Intent" toggle disabled in the Discord Developer Portal saw the entire server crash with `Used disallowed intents`. Discord rejects the gateway connection (close code 4014) when a privileged intent is requested without being enabled, and the unguarded `await discord.start()` propagated the error all the way up, taking the web UI down with it.

Wrap discord.start() in try/catch — log the failure with an actionable hint (special-cased for the disallowed-intent error) and continue running. Other adapters and the web UI come up regardless. The shutdown handler already uses optional chaining (`discord?.stop()`) so nulling discord after a failed start is safe.

Other adapters (Telegram, Slack, GitHub, Gitea, GitLab) have the same unguarded-start pattern but are out of scope for this fix — addressing them is tracked separately.

Also expanded the Discord setup docs with a caution callout that names the exact error string and the new log event so users can grep for both.
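The containment pattern in this fix generalizes to any optional adapter: await `start()` inside try/catch, log an actionable hint, and hand back `null` so a `discord?.stop()` style shutdown stays safe. `Adapter` and `startOptionalAdapter` below are hypothetical stand-ins, not Archon's adapter interface, and the hint wording is illustrative.

```typescript
// Hypothetical minimal adapter shape for illustration.
interface Adapter {
  start(): Promise<void>;
  stop(): Promise<void>;
}

// Starts an adapter without letting a login failure crash the process.
// Returns the adapter on success, or null so callers can use `adapter?.stop()`.
async function startOptionalAdapter<T extends Adapter>(
  name: string,
  adapter: T,
  log: (message: string) => void = console.error,
): Promise<T | null> {
  try {
    await adapter.start();
    return adapter;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Discord closes the gateway (code 4014) with "Used disallowed intents"
    // when a privileged intent is requested but not enabled for the bot.
    const hint = message.includes('disallowed intents')
      ? ' Enable "Message Content Intent" for the bot in the Discord Developer Portal.'
      : '';
    log(`${name} adapter failed to start; continuing without it: ${message}.${hint}`);
    return null;
  }
}

// Usage: a failing adapter yields null, and shutdown becomes a no-op.
async function demo(): Promise<void> {
  const failing: Adapter = {
    start: async () => {
      throw new Error('Used disallowed intents');
    },
    stop: async () => {},
  };
  const discord = await startOptionalAdapter('discord', failing);
  await discord?.stop();
  console.log('server keeps running; discord =', discord);
}

void demo();
```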
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(cherry picked from commit 5957c6e)

---------

Co-authored-by: Cole Medin <cole@dynamous.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Kagura <kagura.chen28@gmail.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Co-authored-by: Shay Elmualem <12733941+norbinsh@users.noreply.github.com>
Co-authored-by: Lior Franko <lior.franko@ironsrc.com>
Co-authored-by: Lior Franko <liorfr@dreamgroup.com>
Co-authored-by: Alex Siri <alexsiri7@gmail.com>
Co-authored-by: Ahmed <44034059+medevs@users.noreply.github.com>
Co-authored-by: CauchYoung <2024302072042@whu.edu.cn>
Co-authored-by: laplace young <yangqk12@whu.edu.cn>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
Co-authored-by: Stefans71 <stefans71@users.noreply.github.com>
Co-authored-by: Bortlesboat <Bortlesboat@users.noreply.github.com>
Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>

simplify: reduce complexity in changed files

21f1c2b

Wirasm mentioned this pull request Apr 13, 2026

Loop node reports max_iterations_reached despite completion signal being present in output #1126

Closed

Wirasm mentioned this pull request Apr 21, 2026

Rethink loop until: primitive — introduce completion_tool: as preferred path #1333

Open

Wirasm marked this pull request as ready for review April 22, 2026 05:47

Wirasm merged commit bc25dee into dev Apr 22, 2026
4 checks passed

prospapledge88 mentioned this pull request Apr 24, 2026

chore: cherry-pick Tier 1-2 upstream fixes (14 commits) prospapledge88/Archon#6

Merged

Wirasm deleted the archon/task-fix-issue-1126 branch April 27, 2026 13:07

This was referenced Apr 27, 2026

feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286) #1367

Merged

fix(workflows): distinguish until_bash system errors from condition-false (#1241) #1247

Open

fix: detect completion signal in any XML tag, not just <promise> (#1126) #1184

Wirasm merged 4 commits into dev from archon/task-fix-issue-1126

Wirasm commented Apr 13, 2026 (edited by coderabbitai Bot)

Summary

UX Journey

Before

After

Architecture Diagram

Before

After

Label Snapshot

Change Metadata

Linked Issue

Validation Evidence (required)

Security Impact (required)

Compatibility / Migration

Human Verification (required)

Side Effects / Blast Radius (required)

Rollback Plan (required)

Risks and Mitigations

Summary by CodeRabbit

coderabbitai Bot commented Apr 13, 2026 (edited)

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Wirasm commented Apr 13, 2026

🔍 Comprehensive PR Review

Summary

🟡 Medium Issues (Quick Fixes)

1. JSDoc for `detectCompletionSignal` still says "two formats"

2. `stripCompletionTags` not tested when both tag types appear in same chunk
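For the second item, a minimal sketch of both helpers with a mixed-chunk case may help. It is a reconstruction from the PR summary, not the merged code: the pattern names (`xmlWrappedPattern`, end-of-output, own-line) follow the UX diagram, while the exact regexes, the signatures, and the choice to always strip `<promise>` blocks are assumptions.

```typescript
// Escapes a string for literal use inside a RegExp.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Matches <anyTag>SIGNAL</anyTag>; \1 forces the closing tag to match the opening one.
function xmlWrappedPattern(signal: string, flags = ''): RegExp {
  return new RegExp(`<([A-Za-z][\\w-]*)>\\s*${escapeRegExp(signal)}\\s*</\\1>`, flags);
}

// Hypothetical detection order: <promise>, any other tag, end of output, own line.
function detectCompletionSignal(output: string, signal: string): boolean {
  const sig = escapeRegExp(signal);
  const patterns = [
    new RegExp(`<promise>\\s*${sig}\\s*</promise>`), // recommended format
    xmlWrappedPattern(signal),                        // any other tag name
    new RegExp(`${sig}\\s*$`),                        // signal ends the output
    new RegExp(`^\\s*${sig}\\s*$`, 'm'),              // signal on its own line
  ];
  return patterns.some((re) => re.test(output));
}

// Removes <promise> blocks and, when `until` is given, any tag wrapping it.
function stripCompletionTags(text: string, until?: string): string {
  let result = text.replace(/<promise>[\s\S]*?<\/promise>/g, '');
  if (until !== undefined) {
    result = result.replace(xmlWrappedPattern(until, 'g'), '');
  }
  return result;
}

// Review item 2: both tag styles in one chunk.
const chunk = 'Done <promise>ALL_CLEAN</promise> and <COMPLETE>ALL_CLEAN</COMPLETE>.';
console.log(detectCompletionSignal('<COMPLETE>ALL_CLEAN</COMPLETE>', 'ALL_CLEAN'));
console.log(JSON.stringify(stripCompletionTags(chunk, 'ALL_CLEAN')));
```

The `\1` backreference is what keeps mismatched pairs such as `<COMPLETE>ALL_CLEAN</DONE>` from counting as a tagged signal.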

🟢 Low Issues

✅ What's Good

Wirasm commented Apr 13, 2026

⚡ Self-Fix Report (Aggressive)

Fixes Applied (7 total)

Tests Added

Skipped (1)

Validation
