feat(workflows): add repo-triage — periodic maintenance via inline Haiku sub-agents #1293

Merged: Wirasm merged 3 commits into dev from feat/repo-triage-workflow on Apr 20, 2026
@Wirasm (Collaborator) commented Apr 19, 2026

Summary

  • Problem: issue/PR triage is a recurring maintenance tax — labeling, dedup detection, stale follow-ups, cross-refs between PRs and issues. Each of those is simple on its own but tedious to do manually across an active repo (80+ open issues, 50+ open PRs). They also benefit from persistent memory so we don't re-flag the same thing twice.
  • Why it matters: a periodic self-contained workflow folds all of that into one invocation. It uses Haiku fan-out for the expensive map steps (briefing per item) and a stronger model for the reduce (clustering, verifying "fully-addresses" claims, synthesising a digest). This is the first real exercise of the inline `agents:` field from #1276 (feat(workflows): inline sub-agent definitions on DAG nodes), and it proves the map-reduce pattern works end-to-end.
  • What changed: adds .archon/workflows/repo-triage.yaml — a 6-node DAG workflow. Adds .archon/state/ to .gitignore for the cross-run memory files the workflow writes.
  • What did not change (scope boundary): zero code changes; workflow YAML only. No new engine features, no schema changes, no dependencies. Runs entirely on the `agents:` support merged in #1276 (feat(workflows): inline sub-agent definitions on DAG nodes).

UX Journey

Before

```
maintainer                 manual triage pass
──────────                 ────────────────────
open GitHub ────────────▶  eyeball 80 issues ──▶ apply labels one-by-one
                                                 spot obvious duplicates
                                                 (miss subtle ones)
                                                 nudge stale items (rarely)
                                                 notice a PR "Closes #X"
                                                 was never written
```

After

```
scheduler (future)          repo-triage workflow                          maintainer
──────────────              ─────────────────────                         ──────────
cron / cmdtrigger ───────▶  archon workflow run ──────────────▶           reads digest.md,
                            repo-triage                                   clicks links to
                              │                                           specific bot
                              ├─ Haiku fan-out briefs all                 comments
                              │    80 issues + 56 PRs in parallel
                              ├─ Sonnet reduces:
                              │    labels (delegated to triage-agent)
                              │    dedup clusters (open↔open)
                              │    closed-issue matches
                              │    closed-PR duplicates
                              │    stale-nudges (60d inactivity)
                              │    PR template-fill nudges
                              ├─ idempotent GitHub comments
                              │    @-tagged to the right human
                              └─ digest.md with clickable
                                   links to every comment
```

Architecture Diagram

Before

```
(no workflow existed — maintenance was manual)
```

After

```
DAG LAYERS
─────────────────────────────────────────────────────────────────

Layer 1 (parallel, no deps):
┌────────────────────┐ ┌────────────────────┐
│ triage-issues │ │ link-prs │
│ │ │ │
│ inline brief-gen │ │ inline pr-issue- │
│ (Haiku) │ │ matcher (Haiku) │
│ │ │ │
│ + delegates to │ │ + Sonnet verifies │
│ on-disk │ │ "fully-addr." │
│ triage-agent │ │ + first-run │
│ │ │ grandfather │
│ │ │ guard for │
│ │ │ template nudges │
│ writes state ═══▶ │ │ │
└────────────────────┘ └────────────────────┘

┌────────────────────┐ ┌────────────────────┐
│ closed-pr-dedup- │ │ stale-nudge │
│ check │ │ │
│ │ │ no Task fan-out │
│ inline pr-brief- │ │ (direct Sonnet │
│ gen (Haiku) │ │ work) │
│ │ │ │
│ comment only, │ │ 60-day inactivity │
│ never closes PRs │ │ pings; never │
│ │ │ closes │
└────────────────────┘ └────────────────────┘

Layer 2 (depends_on: triage-issues):
┌──────────────────────────────────────────┐
│ closed-dedup-check │
│ │
│ reads triage-state.json open briefs ◀══╗│
│ inline closed-brief-gen (Haiku) │
│ 3-day auto-close clock on matches │
└──────────────────────────────────────────┘

Layer 3 (depends_on: all five above):
┌──────────────────────────────────────────┐
│ digest │
│ │
│ reads $.output + all 5 state │
│ files; resolves repo slug via gh; │
│ writes $ARTIFACTS_DIR/digest.md with │
│ headline numbers, per-node summaries, │
│ comment-URL index (clickable), carry- │
│ forward items still on 3-day clock │
└──────────────────────────────────────────┘
```
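
The three layers in the diagram follow directly from the `depends_on` edges. A minimal sketch of that layering, assuming a plain mapping of node name to dependencies (the `dag_layers` helper and the dict shape are illustrative, not Archon's executor API):

```python
def dag_layers(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group nodes into layers; a node is ready once all its dependencies are done."""
    remaining = {node: set(needs) for node, needs in deps.items()}
    done: set[str] = set()
    layers: list[list[str]] = []
    while remaining:
        ready = sorted(n for n, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in DAG")
        layers.append(ready)
        done.update(ready)
        for node in ready:
            del remaining[node]
    return layers


# Edges as described in this PR.
REPO_TRIAGE = {
    "triage-issues": [],
    "link-prs": [],
    "closed-pr-dedup-check": [],
    "stale-nudge": [],
    "closed-dedup-check": ["triage-issues"],
    "digest": [
        "triage-issues",
        "link-prs",
        "closed-pr-dedup-check",
        "stale-nudge",
        "closed-dedup-check",
    ],
}

layers = dag_layers(REPO_TRIAGE)
```

Under these edges, `layers` comes out as the four parallel nodes, then `closed-dedup-check`, then `digest`.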

Connection inventory:

| From | To | Status | Notes |
|---|---|---|---|
| workflow file | .archon/workflows/ | new | Discovered at runtime by workflow-discovery |
| inline agents | Claude SDK options.agents | uses #1276 | brief-gen / pr-issue-matcher / pr-brief-gen / closed-brief-gen |
| on-disk .claude/agents/triage-agent.md | via Task from triage-issues | unchanged | Reused as label-classification sub-agent |
| state files | .archon/state/*.json | new | Cross-run memory; gitignored |
| digest | $ARTIFACTS_DIR/digest.md | new | Per-run artifact with comment URLs |

Label Snapshot

  • Risk: risk: low (workflow YAML only; no engine changes; DRY_RUN + SKIP_* flags bound blast radius)
  • Size: size: M
  • Scope: workflows
  • Module: workflows:defaults (user-level workflow)

Change Metadata

  • Change type: feature
  • Primary scope: workflows

Linked Issue

Validation Evidence (required)

```bash
bun run cli validate workflows repo-triage # passes — 6 nodes, schema-valid
bun run validate # full suite green (no code changes)
```

Live run evidence (on dev with #1276 merged):

Security Impact (required)

  • New permissions/capabilities? No code permissions changed. The workflow invokes gh from the operator's shell — inherits the operator's existing GitHub auth. Required scopes: issue/PR read + comment + label edit + close.
  • New external network calls? Only via gh CLI (github.com). No new endpoints beyond what manual triage would hit.
  • Secrets/tokens handling changed? No — no secrets in the workflow file. gh auth lives in the operator's keychain already.
  • File system access scope changed? Reads/writes .archon/state/*.json (gitignored). Reads .github/pull_request_template.md and .github/ISSUE_TEMPLATE/*.md if present. Writes to $ARTIFACTS_DIR/*.

Compatibility / Migration

  • Backward compatible? Yes — strictly additive. Workflow is opt-in (only runs when invoked).
  • Config/env changes? No required envs. Optional knobs: DRY_RUN, SKIP_PR_LINK, SKIP_CLOSED_DEDUP, SKIP_CLOSED_PR_DEDUP, SKIP_STALE_NUDGE, STALE_DAYS.
  • Database migration needed? No.
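
For reference, the semantics of the optional knobs can be sketched as below. This is an illustration only: the workflow itself is prompt text in YAML, the `Knobs` type and `read_knobs` helper are hypothetical, and the flag-to-node mapping is inferred from the node names. The `STALE_DAYS` default of 60 comes from the commit message.

```python
from dataclasses import dataclass
from typing import Mapping

# Assumed mapping from SKIP_* flag to the node it disables (inferred from names).
SKIP_FLAGS = {
    "SKIP_PR_LINK": "link-prs",
    "SKIP_CLOSED_DEDUP": "closed-dedup-check",
    "SKIP_CLOSED_PR_DEDUP": "closed-pr-dedup-check",
    "SKIP_STALE_NUDGE": "stale-nudge",
}


@dataclass(frozen=True)
class Knobs:
    dry_run: bool
    skip: frozenset
    stale_days: int


def read_knobs(env: Mapping[str, str]) -> Knobs:
    """Interpret the optional env vars; a flag counts as set only when it equals "1"."""
    skip = frozenset(node for flag, node in SKIP_FLAGS.items() if env.get(flag) == "1")
    return Knobs(
        dry_run=env.get("DRY_RUN") == "1",
        skip=skip,
        stale_days=int(env.get("STALE_DAYS", "60")),
    )


knobs = read_knobs({"DRY_RUN": "1", "SKIP_STALE_NUDGE": "1"})
```

In real use the argument would be `os.environ`; passing a mapping keeps the sketch testable.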

Human Verification (required)

Verified scenarios:

  • DRY_RUN=1 full run: no mutations, digest generated with (DRY RUN) markers, per-node summaries readable
  • Live run: exact comment counts match digest's Comment index; state files written atomically per node; no false auto-closes (3-day clock starts fresh on first run)
  • Grandfather guard: 38 would-be template nudges snapshotted on baseline run; state records them as already-nudged so future runs only nudge new low-fill PRs
  • Idempotency: rerunning the live workflow does NOT re-post the same comments — skip-if-exists check on comment body works
  • Comment IDs captured: every bot comment ID stored in the appropriate state file (botCommentId or nested commentIds)
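
A minimal sketch of the skip-if-exists check, assuming it compares a leading prefix of the comment body (the PR does not spell out the exact match rule; `should_post`, the prefix length, and the sample comment text are hypothetical):

```python
def should_post(existing_bodies: list[str], new_body: str, prefix_len: int = 40) -> bool:
    """Return False when an existing comment already starts with the new body's prefix."""
    prefix = new_body.strip()[:prefix_len]
    return not any(body.strip().startswith(prefix) for body in existing_bodies)


# Illustrative comment text only; the real templates live in the workflow prompts.
body = "@alice this looks related to #12. Could you confirm whether it is a duplicate?"
first_run = should_post([], body)
second_run = should_post([body], body)
```

Deriving the prefix from the body that is about to be posted keeps the check aligned with the template actually in use, so a template change cannot silently break idempotency.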

Edge cases checked:

  • Empty state files on first run → parse as default shape, no errors
  • closed-dedup-check with missing triage-state.json (dry-run case) → gracefully reports "skipped, no open briefs"
  • Template files absent from repo → workflow detects empty pr-template.md / issue-templates.md and returns no-template-context in briefs (no false nudges)
  • Reserved dag-node-skills ID collision via inline agents: warn log + platform message (behaviour from #1276, feat(workflows): inline sub-agent definitions on DAG nodes)
  • Labels already present on every open issue → label pass skips all 80 cleanly
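
The first-run and atomic-write behaviour described above can be sketched as follows. The key names `briefs` and `pendingDedupComments` come from this PR; their value types, and the `load_state` / `save_state_atomic` helpers, are assumptions for illustration.

```python
import copy
import json
import os
import tempfile
from pathlib import Path

# Assumed default shape for triage-state.json; only the two key names come from the PR.
TRIAGE_DEFAULT = {"briefs": {}, "pendingDedupComments": {}}


def load_state(path: Path, default: dict) -> dict:
    """Return parsed state, falling back to `default` when the file is missing or empty."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        text = ""
    state = copy.deepcopy(default)
    if text.strip():
        state.update(json.loads(text))
    return state


def save_state_atomic(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)
```

Writing the temp file next to the target matters because `os.replace` is only atomic within one filesystem.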

What was not verified:

  • Behavior across repo forks / multiple remotes (single-repo workflow by design)
  • Scheduler integration (no scheduler exists in Archon yet; one is planned)
  • Very large repos (>200 open issues or >100 open PRs): the fetches pass --limit 200 / --limit 100 explicitly, so items beyond those caps would not be fetched in a single run

Side Effects / Blast Radius (required)

  • Affected subsystems: GitHub issues + PRs of the repo the workflow runs in. Writes under .archon/state/ and .archon/artifacts/runs/<runId>/.
  • Potential unintended effects:
    • "Related to #X" cross-refs can be noisy (11 pairs = 22 comments in the live run). Conservative clustering rules in each node prompt reduce but don't eliminate false positives.
    • The 3-day auto-close clock will fire on subsequent runs if reporters don't reply. Reporters are explicitly told this in the comment.
    • Template nudges @-mention the contributor, which is a loud notification. Grandfather guard ensures this only happens for contributors who opened NEW low-fill PRs after the baseline snapshot.
  • Guardrails/monitoring:
    • DRY_RUN=1 for safe preview
    • Per-node SKIP_*=1 flags for staged rollout (used SKIP_PR_LINK once, never needed again)
    • Digest emits an "Auto-closed this run" count maintainers can audit
    • All comments @-tag the target; easy to search: in:comments author:<bot> Archon

Rollback Plan (required)

  • Fast rollback path:
    • If workflow misbehaves, rm .archon/workflows/repo-triage.yaml locally and don't invoke it again.
    • To revert real-world side effects, bot comments can be deleted manually (state files track IDs for traceability — .archon/state/*.json has every botCommentId).
    • gh api --method DELETE /repos/<owner>/<repo>/issues/comments/<id> per stored ID.
  • Feature flags: SKIP_*=1 env vars disable individual nodes instantly.
  • Observable failure symptoms: dag-executor logs dag.unsupported_capabilities if a future provider change strips inline-agents support; comments stop posting + state stops updating + digest marks affected nodes as (output unavailable).
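
A rough sketch of turning stored IDs into the delete commands above. It assumes comment IDs are integers stored either under a `botCommentId` key or anywhere below a `commentIds` key; the exact nesting in the state files is not shown in this PR, and `iter_comment_ids` / `delete_commands` are hypothetical helpers. The sketch only builds command strings; it does not run them.

```python
from typing import Any, Iterator


def _is_id(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def iter_comment_ids(node: Any, inside_ids: bool = False) -> Iterator[int]:
    """Yield IDs stored under `botCommentId` or anywhere below a `commentIds` key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "botCommentId" and _is_id(value):
                yield value
            else:
                yield from iter_comment_ids(value, inside_ids or key == "commentIds")
    elif isinstance(node, list):
        for item in node:
            yield from iter_comment_ids(item, inside_ids)
    elif inside_ids and _is_id(node):
        yield node


def delete_commands(state: dict, owner: str, repo: str) -> list[str]:
    """Build one `gh api --method DELETE` command per distinct stored comment ID."""
    return [
        f"gh api --method DELETE /repos/{owner}/{repo}/issues/comments/{cid}"
        for cid in sorted(set(iter_comment_ids(state)))
    ]


# Illustrative state only; real files may nest differently.
sample = {
    "pendingDedupComments": {"41": {"botCommentId": 1001, "postedAt": "2026-04-19"}},
    "linkedPrs": {"7": {"sha": "abc", "commentIds": {"related": [1002, 1003]}}},
}
commands = delete_commands(sample, "OWNER", "REPO")
```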

Risks and Mitigations

  • Risk: Sonnet over-clusters and flags real distinct issues as duplicates, eroding maintainer trust.
    • Mitigation: every prompt explicitly biases toward false-negative ("when in doubt, skip — this workflow re-runs"). Live run on 80 issues produced exactly 1 cluster — verified conservative by inspection.
  • Risk: 3-day auto-close closes legitimately-open issues when the reporter is slow to reply.
    • Mitigation: (1) the initial comment gives the reporter an explicit 3-day window; (2) non-bot reply after postedAt removes the entry without closing; (3) closed issues can be reopened trivially; (4) close reason is set to not_planned so history is clear.
  • Risk: Related-cross-ref comments are noisy on active repos.
    • Mitigation: idempotent (won't re-post); matcher conservative; maintainers can delete unwanted ones manually; digest surfaces counts so overcommenting is observable.
  • Risk: The workflow needs gh auth to be configured on the operator's machine; silent failure if not.
    • Mitigation: prompts explicitly instruct "if gh returns an auth error or rate limit, stop the run cleanly and report the error in the summary. Do NOT partially mutate state." Verified in live run.
  • Risk: Schedule isn't wired yet — workflow must be manually invoked.
    • Mitigation: noted in description; will retrofit to the future scheduler when it lands.

Summary by CodeRabbit

  • New Features

    • Added automated repository maintenance: periodic issue/PR triage, conservative duplicate detection with reporter notifications and delayed auto-close, PR↔issue linking suggestions, stale reminders, and a final digest summary (optional Slack post).
  • Chores

    • Ignored persisted workflow state to prevent committing cross-run state.
  • Documentation

    • Updated repository docs to describe the workflow directory, scripts, and gitignored state.

Adds .archon/workflows/repo-triage.yaml: a self-contained periodic
maintenance workflow that uses inline sub-agents (Claude SDK agents:
field introduced in #1276) for map-reduce across open issues and PRs.

Six DAG nodes, three-layer topology:
- Layer 1 (parallel): triage-issues, link-prs, closed-pr-dedup-check,
  stale-nudge
- Layer 2: closed-dedup-check (reads triage-issues state)
- Layer 3: digest (synthesises all prior nodes + writes markdown)

Capabilities per node:
- triage-issues: delegates labeling to on-disk triage-agent; inline
  brief-gen Haiku for duplicate detection; 3-day auto-close clock
  for unanswered duplicate warnings
- link-prs: conservative PR ↔ issue cross-refs via inline pr-issue-
  matcher Haiku, Sonnet re-verifies fully-addresses claims before
  suggesting Closes #X; auto-nudges on low-quality PR template fill
  with first-run grandfather guard (snapshot-only, no nudge spam)
- closed-dedup-check: cross-matches open issues against recently-
  closed ones via inline closed-brief-gen Haiku; same 3-day clock
- closed-pr-dedup-check: flags open PRs duplicating recently-closed
  PRs via inline pr-brief-gen Haiku; comment-only, never closes PRs
- stale-nudge: 60-day inactivity pings (configurable); no auto-close
- digest: synthesises per-node outputs + reads state files to emit
  $ARTIFACTS_DIR/digest.md with clickable GitHub comment links

Env-gated rollout knobs:
- DRY_RUN=1 (read-only; prints [DRY] lines, no gh/state mutations)
- SKIP_PR_LINK=1, SKIP_CLOSED_DEDUP=1, SKIP_CLOSED_PR_DEDUP=1,
  SKIP_STALE_NUDGE=1
- STALE_DAYS=N (stale-nudge window; default 60)

Cross-run state under .archon/state/ (gitignored):
- triage-state.json        briefs + pendingDedupComments
- closed-dedup-state.json  closedBriefs + closedMatchComments
- closed-pr-dedup-state.json openBriefs + closedBriefs + matches
- pr-state.json            linkedPrs + commentIds + templateAdherence
- stale-nudge-state.json   nudged (with updatedAtAtNudge for re-nudge)

Every bot comment:
- @-tags the target human (reporter for issues, author for PRs)
- Tracks comment ID in state for traceability
- Is idempotent — re-runs skip existing comments

Intended use: invoke periodically (`archon workflow run repo-triage
--no-worktree`) once a scheduler lands; live state persists across
runs so previously-flagged items reconcile correctly.

.gitignore: adds .archon/state/ for cross-run memory files.
coderabbitai Bot commented Apr 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 712170c1-6229-4a0f-a86b-bae1e1489067

📥 Commits

Reviewing files that changed from the base of the PR and between ce3c745 and 882414e.

📒 Files selected for processing (3)
  • .archon/workflows/repo-triage.yaml
  • CLAUDE.md
  • packages/docs-web/src/content/docs/reference/archon-directories.md

📝 Walkthrough


Adds a new non-interactive GitHub Actions workflow .archon/workflows/repo-triage.yaml implementing six idempotent maintenance nodes that persist per-node JSON state under .archon/state/. Also updates ignore/docs to exclude and document the new state directory.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workflow: .archon/workflows/repo-triage.yaml | Introduces repo-triage workflow with six nodes: triage-issues, closed-dedup-check, closed-pr-dedup-check, link-prs, stale-nudge, and digest. Each node reads/writes per-node JSON state in .archon/state/, supports DRY_RUN, uses sequential/idempotent comment posting, and has guarded auto-close behavior where applicable. |
| State ignore & docs: .gitignore, CLAUDE.md, packages/docs-web/src/content/docs/reference/archon-directories.md | Adds .archon/state/ to .gitignore and documents new .archon/state/ plus .archon/scripts/ in repository docs; states that .archon/state/ is cross-run workflow state and should not be committed. |

Sequence Diagram(s)

sequenceDiagram
    participant Cron as Scheduler
    participant GH as GitHub Actions
    participant API as GitHub API
    participant Agent as Triage agents / brief-gen
    participant State as .archon/state (persist)
    participant Art as Artifacts
    participant Slack as Slack (optional)

    Cron->>GH: trigger repo-triage
    GH->>Agent: run node (triage-issues / link-prs / stale-nudge / closed-* / digest)
    Agent->>API: fetch issues, PRs, templates, diffs
    Agent->>State: read/write per-node JSON state (atomic merges)
    Agent->>API: post labels/comments/cross-references (sequential, idempotent)
    Note over Agent,State: 3-day windows and at-most-once actions tracked in state
    GH->>Art: write `.digest.md` and per-run artifacts
    GH->>Slack: optional best-effort summary via webhook

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hop through issues with a gentle cheer,
I brief, I nudge, keep the backlog clear.
State tucked safe in .archon/state/ below,
I link and close with a tidy little glow.
Hooray — the repo dreams while I watch and sew.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: addition of a repo-triage workflow that uses inline Haiku sub-agents for periodic maintenance tasks. |
| Description check | ✅ Passed | The description is comprehensive and covers all major template sections with substantial detail: problem/motivation, UX journey (before/after), architecture diagram with connection inventory, labels, change metadata, linked issues, extensive validation evidence with live-run metrics, security impact analysis, compatibility notes, human verification of scenarios and edge cases, side effects/blast radius, rollback plan, and risks/mitigations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.archon/workflows/repo-triage.yaml:
- Around line 826-828: The GH PR list invocation that writes to
"$ARTIFACTS_DIR/prs.json" is missing the isDraft field, so the draft-skip
guardrail cannot operate; update the gh pr list --json argument (the call that
currently requests number,title,body,headRefName,author,updatedAt) to also
include isDraft so the orchestrator can read draft status before applying the
draft skip logic that references prs.json.
- Around line 217-222: When auto-closing duplicates (the branch that checks "now
- postedAt >= 3 days" and posts the "Auto-closing: no reply within 3 days..."
comment then runs `gh issue close <N> --reason not_planned`), capture the GitHub
comment ID returned when posting the closing comment and persist it to a per-run
artifact/state bucket (e.g., an array like `runPostedCommentIds` or under
`digest.postedComments`) before removing the entry from the tracked
`state`/`pending` store; ensure the same change is applied to the other
auto-close paths (lines referenced 442-448 and 1176-1220) so the digest can
include a direct GitHub URL for every bot comment posted this run.
- Around line 1176-1185: The workflow currently uses a gh call ("gh repo view
--json nameWithOwner --jq .nameWithOwner") inside the digest step while
guardrails forbid gh calls; fix by either (A) removing the gh call and reading
the repo slug from a previously persisted value (pass the slug as an
input/artifact/state from an earlier job/node and reference that instead in the
digest step), or (B) explicitly allow a single read-only gh lookup by updating
the workflow guardrails to permit that command and keep the existing gh
invocation; ensure the chosen approach is applied consistently for the other
occurrences noted (the same digest / SLUG use around the later block).
- Around line 122-127: The triage workflow steps triage-issues and link-prs are
both writing to the shared artifact "$ARTIFACTS_DIR/issues.json" (produced by
the gh issue list call), causing concurrent clobbering and schema mismatch;
change the artifact output to a node-specific filename (e.g., include the
job/step name or matrix value) so each job writes its own file (replace
"$ARTIFACTS_DIR/issues.json" with a unique name like
"$ARTIFACTS_DIR/issues-triage-issues.json" or "$ARTIFACTS_DIR/issues-${{
github.job }}.json" for both the gh issue list invocation and any consumers) and
update downstream consumers to read the corresponding node-specific filename.
- Around line 1075-1086: The stale-PR filter is applying label-based exclusions
but the gh CLI invocation (the gh pr list that writes to
"$ARTIFACTS_DIR/stale-prs.json") did not request labels, so label-based checks
like `blocked`, `keep-open`, `wontfix`, `needs-maintainer`, `pinned` are
ineffective; update the `gh pr list --json ...` call to include `labels` (e.g.,
add "labels" to the --json fields) so the produced stale-prs.json contains
labels, then ensure the filter logic that inspects labels (the skip checks for
`isDraft` and the list of do-not-bother labels) uses those labels from the JSON.
- Around line 1095-1106: The idempotency check doesn't match the workflow's own
comment prefix — update the idempotency logic (the string used in the
`Idempotency:` check) to look for the actual posted prefix used in the template
(e.g. match comments that start with "@<author> this {issue,pr} has been quiet"
or use a regex that accepts either "@<author>" or "this {issue,pr} has been
quiet"); modify the Idempotency line in repo-triage.yaml so it checks for the
correct prefix used in the "Issues:" and "PRs:" templates to ensure the workflow
skips when its own comment already exists.
- Around line 981-985: The save logic currently overwrites each
state.linkedPrs[<pr>] with only { sha, processedAt, related, fullyAddresses },
dropping fields like commentIds, templateAdherence, and templateNudgedAt; change
the write/refresh step so it merges the new values into the existing linkedPrs
entry (preserving any existing keys) before assigning and writing
state.lastRunAt and .archon/state/pr-state.json, i.e., read the existing
state.linkedPrs[pr] and produce mergedEntry = { ...existingEntry, sha,
processedAt, related, fullyAddresses } (or equivalent merge operation) and write
mergedEntry back.
- Around line 1221-1226: Add a deterministic, shared run-start timestamp before
any comment filtering by inserting an initial workflow step (e.g., a step named
set-run-start) that captures a single ISO timestamp (now) and publishes it as a
step output and environment variable (e.g.,
steps.set-run-start.outputs.run_start / RUN_START); then change all subsequent
filtering logic that currently compares postedAt/nudgedAt against "today" or
local clocks to compare against that shared run_start value (use the step output
or env) so every node/step uses the exact same cutoff when deciding which
comments belong to "this run" vs. carry-forward pending.
- Line 221: Replace the incorrect gh CLI reason token by updating occurrences of
the command string 'gh issue close <N> --reason not_planned' to use the proper
quoted reason with a space: 'gh issue close <N> --reason "not planned"'; locate
both occurrences of the exact command text in the workflow and update them so
the CLI accepts the reason value.
- Line 499: The gh CLI invocations using "gh pr view <N> --json ..." include an
unsupported field "stateReason" (present in the three occurrences of the gh pr
view command) which breaks the pr-brief-gen task; remove "stateReason" from
those --json field lists and update any logic in the pr-brief-gen task that used
stateReason so it instead derives PR status from mergedAt and closedAt (treat
mergedAt !== null as "merged", else closedAt !== null as "closed unmerged").
Ensure you only adjust the three gh pr view --json invocations and the
status-derivation code path that referenced stateReason.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e4cf7f34-e1f4-4378-9527-a131c1e39d40

📥 Commits

Reviewing files that changed from the base of the PR and between 60eeb00 and bfe812b.

📒 Files selected for processing (2)
  • .archon/workflows/repo-triage.yaml
  • .gitignore

Comment on lines +122 to +127
## 2. Fetch all open issues
```
gh issue list --state open \
--json number,title,body,author,labels,comments,createdAt,updatedAt \
--limit 200 > "$ARTIFACTS_DIR/issues.json"
```

⚠️ Potential issue | 🟠 Major

Use node-specific artifact filenames to avoid concurrent clobbering.

triage-issues and link-prs run concurrently, but both write $ARTIFACTS_DIR/issues.json with different schemas. Either node can read the other node’s file or a partial redirect.

Proposed fix
       gh issue list --state open \
         --json number,title,body,author,labels,comments,createdAt,updatedAt \
-        --limit 200 > "$ARTIFACTS_DIR/issues.json"
+        --limit 200 > "$ARTIFACTS_DIR/triage-issues.json"
...
       gh issue list --state open \
         --json number,title,body,labels,author \
-        --limit 200 > "$ARTIFACTS_DIR/issues.json"
+        --limit 200 > "$ARTIFACTS_DIR/link-prs-issues.json"

Also applies to: 824-833


Comment thread .archon/workflows/repo-triage.yaml Outdated
Comment on lines +217 to +222
- Else if `now - postedAt >= 3 days`:
- Post closing comment on #N:
"Auto-closing: no reply within 3 days of the duplicate
check. Please reopen if this is still relevant."
- `gh issue close <N> --reason not_planned`
- Drop entry from state.

⚠️ Potential issue | 🟠 Major

Capture auto-close comment IDs before dropping pending state.

Both auto-close paths post a closing comment and then remove the tracked entry, so the digest cannot satisfy “direct GitHub URL for every bot comment this run posted.” Store closing comment IDs in a per-run artifact/state bucket before deletion.

Also applies to: 442-448, 1176-1220
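
One way to read this finding, sketched in Python: the 3-day clock is reconciled per pending entry, and the closing comment's ID is recorded in a per-run list before the entry is dropped. Everything named here is an assumption for illustration (`reconcile`, the `post_and_close` callback standing in for the `gh` calls, and the shape of `pending` and `replies`).

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

AUTO_CLOSE_AFTER = timedelta(days=3)


def reconcile(
    pending: dict,
    replies: dict,
    now: datetime,
    post_and_close: Callable[[str], int],
) -> list[int]:
    """Apply the 3-day clock; return IDs of closing comments posted this run."""
    posted_this_run: list[int] = []
    for issue in list(pending):
        posted_at = datetime.fromisoformat(pending[issue]["postedAt"])
        if any(reply > posted_at for reply in replies.get(issue, [])):
            del pending[issue]  # non-bot reply arrived: drop without closing
        elif now - posted_at >= AUTO_CLOSE_AFTER:
            comment_id = post_and_close(issue)  # hypothetical: posts comment, closes issue
            posted_this_run.append(comment_id)  # record the ID BEFORE dropping the entry
            del pending[issue]
    return posted_this_run


now = datetime(2026, 4, 23, tzinfo=timezone.utc)
pending = {
    "41": {"postedAt": "2026-04-19T00:00:00+00:00"},  # 4 days old, no reply
    "42": {"postedAt": "2026-04-19T00:00:00+00:00"},  # reporter replied
    "43": {"postedAt": "2026-04-22T00:00:00+00:00"},  # still inside the window
}
replies = {"42": [datetime(2026, 4, 20, tzinfo=timezone.utc)]}
closed: list[str] = []


def fake_post_and_close(issue: str) -> int:
    closed.append(issue)
    return 9000 + int(issue)


run_comment_ids = reconcile(pending, replies, now, fake_post_and_close)
```

The returned list is what a digest step could use to link every closing comment, even though the pending entries themselves are gone.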


Comment thread .archon/workflows/repo-triage.yaml Outdated
and a `state` hint (open/closed) in the caller's prompt.

Fetch:
gh pr view <N> --json number,title,body,state,headRefName,changedFiles,additions,deletions,mergedAt,closedAt,stateReason

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/usr/bin/env bash
set -u

if ! command -v gh >/dev/null 2>&1; then
  echo "gh CLI is not installed in this environment"
  exit 0
fi

gh pr view --help | sed -n '/JSON Fields/,/FLAGS/p' | grep -E 'stateReason|mergedAt|closedAt|state' || true

Repository: coleam00/Archon

Length of output: 41


🌐 Web query:

GitHub gh pr view JSON fields stateReason documentation

💡 Result:

The gh pr view command supports JSON output via the --json flag. The official documentation lists the available fields explicitly.

Available JSON fields for gh pr view (from https://cli.github.com/manual/gh_pr_view): additions, assignees, author, autoMergeRequest, baseRefName, baseRefOid, body, changedFiles, closed, closedAt, closingIssuesReferences, comments, commits, createdAt, deletions, files, fullDatabaseId, headRefName, headRefOid, headRepository, headRepositoryOwner, id, isCrossRepository, isDraft, labels, latestReviews, maintainerCanModify, mergeCommit, mergeStateStatus, mergeable, mergedAt, mergedBy, milestone, number, potentialMergeCommit, projectCards, projectItems, reactionGroups, reviewDecision, reviewRequests, reviews, state, statusCheckRollup, title, updatedAt, url.

stateReason is NOT a supported field for gh pr view. Attempting gh pr view --json stateReason previously caused a panic (fixed in cli/cli#9307 merged July 2024), and it was explicitly excluded from PR commands because it is an Issue-only field in GitHub's GraphQL API. For comparison, gh issue view DOES support stateReason (along with assignees, author, body, closed, closedAt, closedByPullRequestsReferences, comments, createdAt, id, isPinned, labels, milestone, number, projectCards, projectItems, reactionGroups, state, title, updatedAt, url).

To view available fields for any command, run gh pr view --json "" (or any invalid field) to see the error message listing them. Fields come from GitHub GraphQL PullRequest object but are a curated subset supported by the CLI.

Citations:


🏁 Script executed:

# First, let's verify the file exists and check the specific lines mentioned
git ls-files | grep -E "repo-triage.yaml|\.archon"

Repository: coleam00/Archon

Length of output: 3460


🏁 Script executed:

# Check the file size first
wc -l .archon/workflows/repo-triage.yaml

# Then view the specific lines mentioned
sed -n '495,510p' .archon/workflows/repo-triage.yaml

sed -n '625,635p' .archon/workflows/repo-triage.yaml

Repository: coleam00/Archon

Length of output: 1513


Remove stateReason from PR gh JSON fields at lines 499, 507, and 632.

stateReason is not a supported field for gh pr view --json (it is an issue-only field in GitHub's GraphQL API). Using this field will cause the pr-brief-gen task to fail. Derive PR status ("merged" vs "closed unmerged") from the mergedAt and closedAt fields instead.

Proposed changes
-            gh pr view <N> --json number,title,body,state,headRefName,changedFiles,additions,deletions,mergedAt,closedAt,stateReason
+            gh pr view <N> --json number,title,body,state,headRefName,changedFiles,additions,deletions,mergedAt,closedAt
-            "stateReason": "<closed reason if applicable, else null>",
+            "closeFlavour": "open | merged | closed-unmerged",
-                 closed on <date> (<stateReason>). You may want to read
+                 closed unmerged on <date>. You may want to read
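
The status derivation suggested above can be sketched as a small function. The `close_flavour` name is hypothetical; the three return values mirror the proposed `closeFlavour` field, and the sample JSON is invented.

```python
import json
from typing import Optional


def close_flavour(merged_at: Optional[str], closed_at: Optional[str]) -> str:
    """Derive PR status from mergedAt/closedAt instead of the unsupported stateReason."""
    if merged_at is not None:
        return "merged"
    if closed_at is not None:
        return "closed-unmerged"
    return "open"


# Illustrative payload in the shape `gh pr view --json mergedAt,closedAt` returns.
pr = json.loads('{"mergedAt": null, "closedAt": "2026-04-01T00:00:00Z"}')
flavour = close_flavour(pr["mergedAt"], pr["closedAt"])
```

`mergedAt` is checked first because a merged PR also has `closedAt` set.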
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            gh pr view <N> --json number,title,body,state,headRefName,changedFiles,additions,deletions,mergedAt,closedAt,stateReason
+            gh pr view <N> --json number,title,body,state,headRefName,changedFiles,additions,deletions,mergedAt,closedAt
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml at line 499, The gh CLI invocations using
"gh pr view <N> --json ..." include an unsupported field "stateReason" (present
in the three occurrences of the gh pr view command) which breaks the
pr-brief-gen task; remove "stateReason" from those --json field lists and update
any logic in the pr-brief-gen task that used stateReason so it instead derives
PR status from mergedAt and closedAt (treat mergedAt !== null as "merged", else
closedAt !== null as "closed unmerged"). Ensure you only adjust the three gh pr
view --json invocations and the status-derivation code path that referenced
stateReason.
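
The status derivation described above could be sketched as a small shell helper. This is a minimal sketch, not the workflow's actual logic: `close_flavour` is a hypothetical function name, its output values mirror the proposed `closeFlavour` field, and it assumes the caller has already extracted `mergedAt` and `closedAt` from the `gh pr view --json` output and passes `null` or an empty string for absent values.

```shell
# Hypothetical helper: classify a PR from its mergedAt / closedAt values.
# $1 = mergedAt, $2 = closedAt ("null" or "" when the field is absent).
close_flavour() {
  merged_at=$1
  closed_at=$2
  if [ -n "$merged_at" ] && [ "$merged_at" != "null" ]; then
    # A merged PR also carries closedAt, so mergedAt must be checked first.
    echo "merged"
  elif [ -n "$closed_at" ] && [ "$closed_at" != "null" ]; then
    echo "closed-unmerged"
  else
    echo "open"
  fi
}
```

Checking `mergedAt` before `closedAt` matters because a merged PR has both timestamps set.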

Comment on lines +826 to +828
gh pr list --state open \
--json number,title,body,headRefName,author,updatedAt \
--limit 100 > "$ARTIFACTS_DIR/prs.json"

⚠️ Potential issue | 🟡 Minor

Fetch isDraft before using the draft skip.

The template nudge pass says to skip draft PRs, but the open PR fetch does not include isDraft, so the orchestrator has no fetched signal to enforce that guardrail.

Proposed fix
       gh pr list --state open \
-        --json number,title,body,headRefName,author,updatedAt \
+        --json number,title,body,headRefName,author,updatedAt,isDraft \
         --limit 100 > "$ARTIFACTS_DIR/prs.json"

Also applies to: 946-949

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml around lines 826 - 828, The GH PR list
invocation that writes to "$ARTIFACTS_DIR/prs.json" is missing the isDraft
field, so the draft-skip guardrail cannot operate; update the gh pr list --json
argument (the call that currently requests
number,title,body,headRefName,author,updatedAt) to also include isDraft so the
orchestrator can read draft status before applying the draft skip logic that
references prs.json.

Comment thread .archon/workflows/repo-triage.yaml Outdated
Comment on lines +1075 to +1086
Stale open PRs (skip drafts):
gh pr list --state open --limit 100 \
--json number,title,author,updatedAt,isDraft \
--search "updated:<${CUTOFF}" > "$ARTIFACTS_DIR/stale-prs.json"

## 3. Filter
For each item:
- Skip PRs where `isDraft == true` — drafts are often WIP, nudging
them is rude.
- Skip any item with a label matching `wontfix`, `blocked`,
`needs-maintainer`, `pinned`, `keep-open` (common "do not bother"
signals). Check current labels via the fetched JSON.

⚠️ Potential issue | 🟡 Minor

Include PR labels before applying do-not-bother filters.

The stale filter applies label-based skips to “any item,” but stale PRs are fetched without labels, so PRs labeled blocked, keep-open, etc. can still be nudged.

Proposed fix
       Stale open PRs (skip drafts):
           gh pr list --state open --limit 100 \
-            --json number,title,author,updatedAt,isDraft \
+            --json number,title,author,updatedAt,isDraft,labels \
             --search "updated:<${CUTOFF}" > "$ARTIFACTS_DIR/stale-prs.json"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml around lines 1075 - 1086, The stale-PR
filter is applying label-based exclusions but the gh CLI invocation (the gh pr
list that writes to "$ARTIFACTS_DIR/stale-prs.json") did not request labels, so
label-based checks like `blocked`, `keep-open`, `wontfix`, `needs-maintainer`,
`pinned` are ineffective; update the `gh pr list --json ...` call to include
`labels` (e.g., add "labels" to the --json fields) so the produced
stale-prs.json contains labels, then ensure the filter logic that inspects
labels (the skip checks for `isDraft` and the list of do-not-bother labels) uses
those labels from the JSON.

Comment thread .archon/workflows/repo-triage.yaml Outdated
Comment on lines +1095 to +1106
Issues:
@<author> this issue has been quiet for <N> days. Is it still
relevant? A quick update on current status would help with
triage. No reply needed if it's no longer blocking you.

PRs:
@<author> this PR has been quiet for <N> days. Is it still
active? Happy to help unblock if review feedback or a rebase
is needed — just drop a note.

Idempotency: `gh {issue,pr} view <N> --json comments` — skip if any
existing comment already starts with "this {issue,pr} has been quiet".

⚠️ Potential issue | 🟡 Minor

Match the actual stale-nudge comment prefix during idempotency checks.

The posted bodies start with @<author>, but the idempotency check looks for comments starting with "this {issue,pr} has been quiet", so it will not match this workflow’s own comments if state is missing or reset.

Proposed fix
-      existing comment already starts with "this {issue,pr} has been quiet".
+      existing bot comment contains "this {issue,pr} has been quiet for"
+      or exactly matches the generated body after the leading `@mention`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml around lines 1095 - 1106, The idempotency
check doesn't match the workflow's own comment prefix — update the idempotency
logic (the string used in the `Idempotency:` check) to look for the actual
posted prefix used in the template (e.g. match comments that start with
"@<author> this {issue,pr} has been quiet" or use a regex that accepts either
"@<author>" or "this {issue,pr} has been quiet"); modify the Idempotency line in
repo-triage.yaml so it checks for the correct prefix used in the "Issues:" and
"PRs:" templates to ensure the workflow skips when its own comment already
exists.
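
To illustrate why the prefix check never matches and a substring check does, here is a minimal sketch. Both function names are hypothetical, and it assumes each comment body has already been extracted from `gh {issue,pr} view <N> --json comments` and is passed in as a single string.

```shell
# Hypothetical: the original prefix-style check. It only matches bodies that
# START with the phrase, which the posted "@<author> ..." bodies never do.
old_prefix_check() {
  case "$1" in
    "this issue has been quiet"*|"this PR has been quiet"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical: substring check that tolerates the leading @mention.
already_nudged() {
  case "$1" in
    *"has been quiet for"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A substring match is looser than an exact-body comparison, so it could also match a human comment that happens to quote the phrase; that trade-off may be acceptable for a skip-only guard.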

Comment on lines +1176 to +1185
# Comment-URL index — REQUIRED

The digest MUST include a direct GitHub URL for every bot comment
this run posted. Build URLs from state files.

Steps:

1. Determine the repo slug:
SLUG=$(gh repo view --json nameWithOwner --jq .nameWithOwner)
(example: `coleam00/Archon`)

⚠️ Potential issue | 🟠 Major

Resolve the digest gh call contradiction.

The digest requires gh repo view to build comment URLs, but its guardrails say “No gh calls here.” Either allow this single read-only lookup or pass the repo slug from an earlier node/artifact.

Two acceptable directions
-      - No gh calls here. No comments. No closes. Synthesis only.
+      - No mutating gh calls here. A single read-only `gh repo view` is
+        allowed only to determine the repository slug for comment URLs.

Or avoid gh entirely by having an earlier node persist the slug into state/artifacts.

Also applies to: 1333-1335

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml around lines 1176 - 1185, The workflow
currently uses a gh call ("gh repo view --json nameWithOwner --jq
.nameWithOwner") inside the digest step while guardrails forbid gh calls; fix by
either (A) removing the gh call and reading the repo slug from a previously
persisted value (pass the slug as an input/artifact/state from an earlier
job/node and reference that instead in the digest step), or (B) explicitly allow
a single read-only gh lookup by updating the workflow guardrails to permit that
command and keep the existing gh invocation; ensure the chosen approach is
applied consistently for the other occurrences noted (the same digest / SLUG use
around the later block).

Comment on lines +1221 to +1226
5. Only surface comments posted IN THIS RUN. Use the `postedAt` /
`nudgedAt` timestamps: include entries whose timestamp equals
today's run window (≥ this run's start time). Older entries
from prior runs should NOT re-appear in the "just posted" tables
but DO appear in a separate "carry-forward pending" table so
maintainers see what's still on the 3-day clock.

⚠️ Potential issue | 🟠 Major

Persist a run-start timestamp before filtering “this run” comments.

The digest is told to include comments with timestamps >= this run's start time, but no node records a shared run start. Same-day prior comments can leak into “this run,” or freshly posted comments can be missed if the digest guesses.

Proposed fix
+      At workflow start, persist a shared `runStartedAt` ISO timestamp
+      (for example in `$ARTIFACTS_DIR/run-started-at.txt` or each state
+      file's per-run metadata). The digest must use that value when
+      separating "this run" from carry-forward entries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml around lines 1221 - 1226, Add a
deterministic, shared run-start timestamp before any comment filtering by
inserting an initial workflow step (e.g., a step named set-run-start) that
captures a single ISO timestamp (now) and publishes it as a step output and
environment variable (e.g., steps.set-run-start.outputs.run_start / RUN_START);
then change all subsequent filtering logic that currently compares
postedAt/nudgedAt against "today" or local clocks to compare against that shared
run_start value (use the step output or env) so every node/step uses the exact
same cutoff when deciding which comments belong to "this run" vs. carry-forward
pending.
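
One way to realise a shared run-start cutoff is sketched below. It is an illustration under assumptions: `record_run_start` and `in_this_run` are hypothetical names, the file path would be something like `$ARTIFACTS_DIR/run-started-at.txt` as the proposed fix suggests, and every `postedAt` / `nudgedAt` value is assumed to be a UTC timestamp in the exact form `YYYY-MM-DDTHH:MM:SSZ` (strings in that form sort chronologically).

```shell
# Hypothetical: write one UTC timestamp at workflow start.
# $1 = target file, e.g. "$ARTIFACTS_DIR/run-started-at.txt"
record_run_start() {
  date -u +%Y-%m-%dT%H:%M:%SZ > "$1"
}

# Hypothetical: succeed when timestamp $2 is at or after run start $1.
# Relies on both values sharing the same fixed-width UTC format, so a
# byte-wise sort orders them chronologically.
in_this_run() {
  first=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)
  [ "$first" = "$1" ]
}
```

Timestamps with fractional seconds or numeric UTC offsets would break the lexicographic comparison and would need normalising first.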

…is set

Extends the digest node with an optional Slack-post step after the
canonical digest.md artifact is written. Uses Slack incoming webhook
(no bot token required beyond the incoming-webhook scope).

Behavior:
- SLACK_WEBHOOK unset → skipped silently with a one-line note
- DRY_RUN=1 → prints full payload, does not curl
- Otherwise → POSTs a compact (<3500 char) mrkdwn-formatted summary
  containing headline numbers, this-run comment index (clickable
  GitHub URLs), pending items, and a path reference to digest.md
- curl failure or non-ok Slack response is logged but does not fail
  the node — digest.md on disk remains authoritative
- Intermediate Slack text written to $ARTIFACTS_DIR/digest-slack.txt
  for traceability; payload JSON assembled via jq and written to
  $ARTIFACTS_DIR/slack-payload.json before curl posts it

Slack mrkdwn conversion rules baked into the prompt (no tables, link
shape <url|text>, single-asterisk bold) so Sonnet emits a variant
that renders cleanly in Slack rather than being sent raw.

The webhook URL is read from the operator's environment (Archon
auto-loads ~/.archon/.env on CLI startup — put SLACK_WEBHOOK=... there).

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
.archon/workflows/repo-triage.yaml (1)

1415-1417: Harden the Slack curl invocation.

Two small but worthwhile tweaks on the POST:

  • No timeout — a slow/unreachable webhook can stall the digest node indefinitely. Add --max-time (and optionally --connect-timeout).
  • Prefer `--data-binary @file` for JSON payloads; `--data` is safe here because `jq -c` emits a single line, but `--data-binary` is the idiomatic, byte-preserving choice and avoids surprises if the payload generation ever changes.
♻️ Proposed fix
-          curl -sS -X POST -H 'Content-Type: application/json' \
-            --data "@$ARTIFACTS_DIR/slack-payload.json" \
-            "$SLACK_WEBHOOK"
+          curl -sS --connect-timeout 5 --max-time 15 \
+            -X POST -H 'Content-Type: application/json' \
+            --data-binary "@$ARTIFACTS_DIR/slack-payload.json" \
+            "$SLACK_WEBHOOK"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml around lines 1415 - 1417, The curl POST
that sends "$SLACK_WEBHOOK" should be hardened: replace the current --data
"@$ARTIFACTS_DIR/slack-payload.json" usage with --data-binary
"@$ARTIFACTS_DIR/slack-payload.json" to preserve bytes, and add timeout flags
such as --max-time 10 (and optionally --connect-timeout 5) to avoid stalling;
update the curl invocation that references ARTIFACTS_DIR and SLACK_WEBHOOK
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9bf5947-0c29-4a4b-bef3-3d6291635ba6

📥 Commits

Reviewing files that changed from the base of the PR and between bfe812b and ce3c745.

📒 Files selected for processing (1)
  • .archon/workflows/repo-triage.yaml


AFTER writing `digest.md`, check the env:

echo "SLACK_WEBHOOK=${SLACK_WEBHOOK:-<unset>}"

⚠️ Potential issue | 🟠 Major

Don't echo SLACK_WEBHOOK — it's a bearer credential.

echo "SLACK_WEBHOOK=${SLACK_WEBHOOK:-<unset>}" prints the full webhook URL to stdout whenever it's set. Slack incoming webhook URLs are secrets — anyone with the URL can post to the channel — and this workflow runs outside GitHub Actions' automatic secret-masking, so the value will land in CI logs / digest artifacts verbatim. The same risk applies to the failure line at line 1429, since curl -sS will emit the target URL in network-error messages.

Emit a boolean only, and scrub URLs from any error you surface.

🔒 Proposed fix
       AFTER writing `digest.md`, check the env:

-          echo "SLACK_WEBHOOK=${SLACK_WEBHOOK:-<unset>}"
+          # Never print the webhook itself — it's a bearer credential.
+          if [ -n "${SLACK_WEBHOOK:-}" ]; then
+            echo "SLACK_WEBHOOK=<set>"
+          else
+            echo "SLACK_WEBHOOK=<unset>"
+          fi

And for the failure-log instruction around line 1429, scrub the webhook from curl output before printing, e.g.:

-          Slack post FAILED: <curl error or Slack response body>
+          Slack post FAILED: <curl error or Slack response body, with
+          the webhook URL redacted — never print $SLACK_WEBHOOK verbatim>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.archon/workflows/repo-triage.yaml at line 1337, The workflow is printing
the SLACK_WEBHOOK secret; change the echo that references SLACK_WEBHOOK to emit
only whether it's set (e.g., "SLACK_WEBHOOK_SET=true/false") and remove any
direct printing of the URL, and update the failure logging that uses curl -sS
(the curl invocation/its error handling block) to scrub the webhook URL from any
output before echoing (replace or strip the SLACK_WEBHOOK value from messages so
only a redacted token/boolean appears).
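
The "boolean only, scrub the URL" advice above might look like the following in plain POSIX shell. This is a sketch under assumptions: `webhook_status` and `redact_webhook` are hypothetical helper names, and `<redacted>` is an arbitrary placeholder.

```shell
# Hypothetical: report only whether the webhook is configured.
webhook_status() {
  if [ -n "${SLACK_WEBHOOK:-}" ]; then
    echo "SLACK_WEBHOOK=<set>"
  else
    echo "SLACK_WEBHOOK=<unset>"
  fi
}

# Hypothetical: print message $1 with every literal occurrence of
# $SLACK_WEBHOOK replaced by "<redacted>".
redact_webhook() {
  msg=$1
  secret=${SLACK_WEBHOOK:-}
  if [ -z "$secret" ]; then
    printf '%s\n' "$msg"
    return 0
  fi
  out=
  while :; do
    case "$msg" in
      *"$secret"*)
        prefix=${msg%%"$secret"*}   # text before the first occurrence
        out="$out$prefix<redacted>"
        msg=${msg#*"$secret"}       # text after the first occurrence
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out$msg"
}
```

Quoting `"$secret"` inside the patterns keeps characters such as `?` in a URL from being treated as glob wildcards.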

@Wirasm
Collaborator Author

Wirasm commented Apr 20, 2026

PR Review Summary (multi-agent)

Agents run: code-reviewer, docs-impact, silent-failure-hunter, code-simplifier. Scope: .archon/workflows/repo-triage.yaml (1439 lines) + .gitignore.

Critical Issues (3) — must fix before merge

| Source | Issue | Location |
| --- | --- | --- |
| code-reviewer | `gh issue close --reason not_planned` is wrong: the CLI accepts `"not planned"` (space, not underscore). Both auto-close paths will fail at runtime. | repo-triage.yaml:221, :447 |
| code-reviewer | link-prs step 7 save writes a sparse `{ sha, processedAt, related, fullyAddresses }` object, overwriting `templateNudgedAt` / `commentIds` / `templateAdherence`. After any sha change the grandfather guard re-triggers, template nudges get re-posted, and the digest loses comment URLs. Merge into the existing entry instead. | repo-triage.yaml:981–985 |
| silent-failure-hunter | Corrupt-JSON state file silently treated as first-run default (instruction only distinguishes ENOENT / empty). Resets `pendingDedupComments` → the 3-day auto-close clock can restart indefinitely. Must abort on JSON parse error. | :118–120, :364–365, :562–566, :819–821, :1060–1062 |
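
The corrupt-state finding above comes down to a three-way distinction: missing or empty file, valid JSON, and unparseable JSON. A minimal sketch of that distinction follows. It assumes `jq` is available (the workflow already uses it to assemble the Slack payload); `load_state` is a hypothetical name and the default shape passed in is up to the caller.

```shell
# Hypothetical: print the state JSON for file $1, or default $2 on first run.
# Returns 1 (and prints nothing on stdout) when the file exists, is non-empty,
# and does not parse as JSON.
load_state() {
  file=$1
  default=$2
  if [ ! -s "$file" ]; then
    # Missing (ENOENT) or zero-length: treat as a legitimate first run.
    printf '%s\n' "$default"
    return 0
  fi
  if ! jq . "$file" >/dev/null 2>&1; then
    echo "ABORT: $file exists but is not valid JSON; refusing to reset state" >&2
    return 1
  fi
  cat "$file"
}
```

A caller would stop the node on a non-zero return instead of continuing with the default, so the pending-comment clock is never silently reset.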

Important Issues (7) — fix before live runs

| Source | Issue | Location |
| --- | --- | --- |
| silent-failure-hunter | "Parse each response as JSON. If parsing fails, skip that issue.": no log, no failed-list output. link-prs doesn't even surface a `failed=` count. An LLM-side regression or gh hiccup permanently defers issues across runs. Log failed IDs + raw response prefix. | :171, :392, :601, :867, summary :988–995 |
| silent-failure-hunter | Sub-agent prompts (brief-gen, closed-brief-gen, pr-brief-gen, pr-issue-matcher) have no gh auth/rate-limit guard. Orchestrator-level guard doesn't reach them; failures get swallowed by the JSON-parse skip above. Return `ERROR:` sentinel from agents; orchestrator aborts. | sub-agent prompts |
| silent-failure-hunter | 3-day auto-close does `now - postedAt >= 3 days` with no ISO-8601 validation. A corrupt `postedAt` could either permanently skip the close (invisible) or immediately close (worse). Validate parse first. | :211–223, :441–449 |
| silent-failure-hunter | `gh issue close` failure path drops the state entry anyway. Closing comment has been posted but the issue remains open and untracked; no retry possible. Only drop state on success; otherwise set `closeAttemptFailed: true` and retry next run. | :218–222, :447–449 |
| silent-failure-hunter | closed-pr-dedup-check idempotency check fetches `gh pr view` for exact-body comparison. No error path: a gh hiccup lets the agent fall through and double-post. Skip comment on check-failure. | :634–637 |
| code-reviewer | triage-agent referenced as on-disk sub-agent but not declared anywhere in the workflow. No engine validation catches a missing `.claude/agents/triage-agent.md`; the runtime Task call fails with no useful error. Either inline it in `agents:` or add a preflight `bash:` file-existence check. | :83, :145–154, :250 |
| code-reviewer | brief-gen template-adherence rule is inverted: "Ignore whitespace / HTML comment placeholders … as 'filled'" naturally reads as "treat them as filled", the opposite of the intent. Contrast with pr-issue-matcher:714, which correctly says "missing (empty / placeholder / HTML-comment-only)". | :54 |
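
For the `postedAt` validation item above, a dependency-free sketch of "validate first, then compare" is shown below. It is an illustration only: the function names are hypothetical, it accepts just the strict UTC form `YYYY-MM-DDTHH:MM:SSZ`, and it converts to epoch seconds with the standard days-from-civil arithmetic rather than whatever date facility the workflow actually uses.

```shell
# Hypothetical: convert YYYY-MM-DDTHH:MM:SSZ to epoch seconds on stdout.
# Returns 1 and prints nothing if the shape or field ranges are wrong.
iso_to_epoch() {
  case "$1" in
    [1-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) ;;
    *) return 1 ;;
  esac
  y=${1%%-*}; rest=${1#*-}
  m=${rest%%-*}; rest=${rest#*-}
  d=${rest%%T*}; rest=${rest#*T}
  H=${rest%%:*}; rest=${rest#*:}
  M=${rest%%:*}; rest=${rest#*:}
  S=${rest%Z}
  # Strip one leading zero so values like "08" are not parsed as octal.
  m=${m#0}; d=${d#0}; H=${H#0}; M=${M#0}; S=${S#0}
  if [ "$m" -lt 1 ] || [ "$m" -gt 12 ] || [ "$d" -lt 1 ] || [ "$d" -gt 31 ] ||
     [ "$H" -gt 23 ] || [ "$M" -gt 59 ] || [ "$S" -gt 59 ]; then
    return 1
  fi
  # Days since 1970-01-01 (proleptic Gregorian), March-based year trick.
  if [ "$m" -le 2 ]; then y=$((y - 1)); mp=$((m + 9)); else mp=$((m - 3)); fi
  era=$((y / 400))
  yoe=$((y - era * 400))
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  doe=$((yoe * 365 + yoe / 4 - yoe / 100 + doy))
  days=$((era * 146097 + doe - 719468))
  echo $((days * 86400 + H * 3600 + M * 60 + S))
}

# Hypothetical: 0 = three days elapsed, 1 = not yet, 2 = unparseable input.
three_days_elapsed() {
  posted=$(iso_to_epoch "$1") || { echo "skip: unparseable postedAt '$1'" >&2; return 2; }
  now=$(iso_to_epoch "$2") || { echo "skip: unparseable now '$2'" >&2; return 2; }
  [ $((now - posted)) -ge 259200 ]
}
```

Returning a distinct code for unparseable input lets the caller log and skip the entry instead of treating it as either "close now" or "never close".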

Suggestions (7)

| Source | Suggestion | Location |
| --- | --- | --- |
| code-reviewer + silent-failure-hunter | stale-nudge idempotency check looks for comments starting with "this issue has been quiet", but posted bodies start with `@<author> this issue has been quiet`. Check always fails; only the state-file guard actually dedupes. Use a substring match on "has been quiet for". | :1104–1108 |
| silent-failure-hunter | digest node notes missing upstream outputs as (output unavailable) in a subsection. Headline numbers still look complete. Prefix the digest with a bold WARNING block listing missing nodes when any upstream output is absent. | :1434–1436 |
| silent-failure-hunter | Slack curl doesn't capture HTTP status or redirect stderr; TLS/connection errors won't be seen. Use `-w "\nHTTP_STATUS:%{http_code}"` + `2>&1`. | :1425–1430 |
| silent-failure-hunter | closed-dedup-check can't distinguish "zero open issues" from "triage-issues crashed"; both produce the same "skipping" line. Check `lastRunAt != null` to differentiate. | :339–342 |
| silent-failure-hunter | link-prs grandfather guard relies on the model remembering `state.linkedPrs` was empty "at the start of this run." Capture `BASELINE_IS_EMPTY` as an explicit boolean at step 1 rather than re-deriving it at step 6b. | :921–935 |
| silent-failure-hunter | pr-brief-gen truncates diffs at 30,000 chars silently. Add `diffTruncated: true` to the brief so the orchestrator can downgrade truncated-diff matches to related instead of fully-addresses. | :499–500 |
| code-simplifier | ~50 lines (3.5%) of redundancy reducible without losing behavior: DRY_RUN block repeated in 5 nodes (~30 lines), state-file preamble repeated 5× (~15 lines), inconsistent SKIP guard phrasing in link-prs vs. others (~4 lines). Non-blocking. | DRY_RUN blocks at :62–80, :307–323, :524–539, :762–774, :1026–1037 |
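
The Slack curl suggestion above can be sketched as a post function plus a parser for the captured output. This is an assumption-laden illustration: the function names are hypothetical, the timeouts are taken from the earlier review suggestion, and success is assumed to mean HTTP 200 with a body of `ok`, which is how Slack incoming webhooks normally respond.

```shell
# Hypothetical: POST the payload and capture body, stderr, and HTTP status
# in one stream. Not called here; shown for the flag combination only.
post_to_slack() {
  curl -sS --connect-timeout 5 --max-time 15 \
    -X POST -H 'Content-Type: application/json' \
    --data-binary "@$ARTIFACTS_DIR/slack-payload.json" \
    -w '\nHTTP_STATUS:%{http_code}' \
    "$SLACK_WEBHOOK" 2>&1
}

# Hypothetical: extract the three-digit status from captured output $1.
http_status_of() {
  printf '%s\n' "$1" | sed -n 's/^HTTP_STATUS:\([0-9][0-9][0-9]\)$/\1/p' | tail -n 1
}

# Hypothetical: succeed only on HTTP 200 with a body of exactly "ok".
slack_post_ok() {
  status=$(http_status_of "$1")
  body=$(printf '%s\n' "$1" | sed '/^HTTP_STATUS:/d')
  [ "$status" = "200" ] && [ "$body" = "ok" ]
}
```

On connection or TLS failures curl reports status `000`, so those cases fall through to the failure branch along with 4xx and 5xx responses.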

Documentation Impact

| File | Issue | Fix |
| --- | --- | --- |
| CLAUDE.md:562–569 | Repo-level `.archon/` tree doesn't list `state/`; this PR establishes the convention and gitignores it. | Add `└── state/ # Cross-run workflow state (gitignored)`. |
| packages/docs-web/src/content/docs/reference/archon-directories.md:46–55 | Same omission in the docs-site tree (also missing `scripts/` from an older PR). | Add `state/` (and optionally `scripts/`). |
| README.md:251 | Says "17 default workflows" but `bundled-defaults.generated.ts` header says 20. Stale count (pre-existing, not caused by this PR). | Update to 20. |
| README.md:229–251 | Workflow table omits archon-adversarial-dev, archon-interactive-prd, archon-workflow-builder (pre-existing gaps). | Add rows. |

repo-triage is correctly NOT a bundled default (repo-scoped only) — no entry required in bundled-defaults.generated.ts or the README defaults list.

Strengths

  • DRY_RUN=1 discipline is consistent and explicit across every node (dry-run forwarding to sub-agents via Task-prompt prepending is a good pattern).
  • SKIP_* per-node escape hatches with audit lines.
  • Atomic state writes (one write per node, at the end).
  • Sequential comment posting for ID capture (correctly flagged as not-via-Task).
  • closed-pr-dedup-check explicitly and repeatedly refuses to ever close a PR — appropriate ceiling for automation touching contributor work.
  • Live-run evidence in the PR description (35 real comments, $8.14, matches digest counts) is strong validation.

Verdict

NEEDS FIXES — three critical issues (--reason not_planned, link-prs state overwrite, corrupt-state-file silent reset) should block merge. They are all narrow YAML edits.

Recommended Actions

  1. Fix the three critical issues above (single-line changes each, or localized prompt edits).
  2. Address the important issues, especially the JSON-parse-skip silent-failure pattern — it shows up in 4 separate fan-out passes.
  3. Update the two .archon/ directory trees to list state/.
  4. Consider the simplification savings as a follow-up — not blocking.

Critical (3):
- `gh issue close --reason "not planned"` (space, not underscore) — the
  CLI expects lowercase with a space; `not_planned` fails at runtime.
  Fixed in both auto-close paths (triage-issues step 8, closed-dedup-
  check step 7).
- link-prs step 7 state save was sparse `{ sha, processedAt, related,
  fullyAddresses }`, overwriting `commentIds` / `templateNudgedAt` /
  `templateAdherence`. Changed to explicit merge that spreads existing
  entry first so per-run captured fields survive.
- Corrupt-JSON state files previously treated as first-run default
  (silent `pendingDedupComments` reset → 3-day clock restarts forever).
  All five state-load sites now abort loudly on JSON.parse throw;
  ENOENT/empty continue to default-shape.

Important (7):
- Sub-agents (`brief-gen`, `closed-brief-gen`, `pr-brief-gen`,
  `pr-issue-matcher`) emit `ERROR: <reason>` on gh failures rather than
  partial/fabricated JSON. Orchestrator detects the sentinel, logs the
  failed ID + first 200 chars of raw response, tracks in a failed-list,
  and aborts the cluster/match pass if ≥50% of items failed (avoids
  acting on bad data).
- `pr-brief-gen` now sets `diffTruncated: true` when the 30k-char diff
  cap hits; link-prs verify pass downgrades any `fully-addresses` claim
  to `related` when either side's brief was truncated.
- 3-day auto-close validates `postedAt` parses as ISO-8601 before the
  elapsed-time comparison; corrupt timestamps are logged and skipped,
  never acted on.
- `gh issue close` failure path no longer drops state — sets
  `closeAttemptFailed: true` on the entry for next-run retry. Only
  drops on exit 0.
- `closed-pr-dedup-check` idempotency check (`gh pr view --json comments`)
  now aborts the post on fetch failure rather than falling through —
  prevents double-posts on gh hiccups.
- `triage-agent` label pass has preflight `test -f` check for
  `.claude/agents/triage-agent.md`; skips the pass with a clear log if
  the file is missing rather than firing Task calls that fail obscurely.
- `brief-gen` template-adherence wording flipped from "Ignore … as
  'filled'" (ambiguous, read as affirmative) to explicit "A section
  counts as MISSING when …", matching the `pr-issue-matcher` phrasing.
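
The sentinel handling in the first bullet of this list could be sketched as three small helpers. The names are hypothetical and the log format is an assumption; only the `ERROR:` prefix, the 200-character raw-response prefix, and the 50% threshold come from the description above.

```shell
# Hypothetical: succeed when a sub-agent response is an ERROR sentinel.
is_error_sentinel() {
  case "$1" in
    "ERROR:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical: log the failed item ID ($1) and at most the first
# 200 characters of the raw response ($2) to stderr.
log_failed() {
  printf 'failed id=%s raw=%.200s\n' "$1" "$2" >&2
}

# Hypothetical: succeed (meaning "abort the pass") when at least half of
# the items failed. $1 = failed count, $2 = total count.
should_abort_pass() {
  [ "$2" -gt 0 ] && [ $(( $1 * 2 )) -ge "$2" ]
}
```

Comparing `failed * 2 >= total` keeps the threshold check in integer arithmetic.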

Minor:
- `stale-nudge` idempotency check uses substring "has been quiet for"
  instead of a prefix check that never matched (posted body starts
  with @<author>).
- `closed-dedup-check` distinguishes "upstream crashed" (missing/corrupt
  triage-state.json, or `lastRunAt == null`) from "legitimately quiet
  day" (state present, briefs empty) — different log lines.
- Slack curl adds `-w "\nHTTP_STATUS:%{http_code}"` + `2>&1` so TLS /
  4xx / 5xx errors are visible in captured output.
- `stateReason` values from `gh issue view --json stateReason` are
  UPPERCASE (`COMPLETED`, `NOT_PLANNED`); this is now documented, and
  the sub-agent is instructed to normalize them to lowercase for
  consistency.

Docs:
- CLAUDE.md repo-level `.archon/` tree now lists `state/`.
- archon-directories.md tree adds `state/` + `scripts/` (both were
  missing) with purpose descriptions.

Deferred (worth doing as a follow-up, not blocking):
- DRY/SKIP preamble duplication (~30-50 lines across 5 nodes).
- Explicit `BASELINE_IS_EMPTY` capture in link-prs (current derived
  check works but is a load-bearing model instruction).
- Digest `WARNING` prefix block when upstream nodes are missing
  outputs — today's "(output unavailable)" sub-line is functional.
- Pre-existing README workflow-count (17 → 20) and table gaps — not
  caused by this PR.