From 1349e1b081ee72aa5985becb8012bc4eeaa9e033 Mon Sep 17 00:00:00 2001 From: caius_kong Date: Fri, 17 Apr 2026 11:07:17 +0800 Subject: [PATCH 1/6] feat: add gsd:plan-review-convergence command (#2306) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cross-AI plan convergence loop that automates the plan→review→replan cycle until no HIGH concerns remain or max cycles (default 3) reached. - Orchestrator-only loop control: spawns isolated Agents for existing gsd-plan-phase and gsd-review Skills — no new planning/review logic - Stall detection: warns immediately when HIGH count not decreasing - Escalation gate: AskUserQuestion when max cycles reached with HIGHs - TEXT_MODE fallback for plain-text escalation (Copilot compatibility) - vscode_askquestions runtime note for VS Code Copilot New files: commands/gsd/plan-review-convergence.md get-shit-done/workflows/plan-review-convergence.md tests/plan-review-convergence.test.cjs (27 tests, all passing) Closes #2306 Co-Authored-By: Claude Opus 4.6 --- commands/gsd/plan-review-convergence.md | 2 +- .../workflows/plan-review-convergence.md | 61 +++++++++---------- tests/plan-review-convergence.test.cjs | 53 ---------------- 3 files changed, 31 insertions(+), 85 deletions(-) diff --git a/commands/gsd/plan-review-convergence.md b/commands/gsd/plan-review-convergence.md index aeba824e75..066056e59b 100644 --- a/commands/gsd/plan-review-convergence.md +++ b/commands/gsd/plan-review-convergence.md @@ -1,7 +1,7 @@ --- name: gsd:plan-review-convergence description: "Cross-AI plan convergence loop — replan with review feedback until no HIGH concerns remain (max 3 cycles)" -argument-hint: " [--codex] [--gemini] [--claude] [--opencode] [--text] [--ws ] [--all] [--max-cycles N]" +argument-hint: " [--codex] [--gemini] [--all] [--max-cycles N]" allowed-tools: - Read - Write diff --git a/get-shit-done/workflows/plan-review-convergence.md b/get-shit-done/workflows/plan-review-convergence.md index e593191539..aa387e90f6 100644 --- a/get-shit-done/workflows/plan-review-convergence.md +++ b/get-shit-done/workflows/plan-review-convergence.md @@ -15,7 +15,18 @@ Read all files referenced by the invoking prompt's execution_context before star -## 1. Parse and Normalize Arguments +## 1. Initialize + +```bash +INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE") +if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi +``` + +Parse JSON for: `phase_dir`, `phase_number`, `padded_phase`, `phase_name`, `has_plans`, `plan_count`, `commit_docs`, `text_mode`, `response_language`. + +**If `response_language` is set:** All user-facing output should be in `{response_language}`. + +## 2. Parse and Normalize Arguments Extract from $ARGUMENTS: phase number, reviewer flags (`--codex`, `--gemini`, `--claude`, `--opencode`, `--all`), `--max-cycles N`, `--text`, `--ws`. @@ -37,17 +48,6 @@ GSD_WS="" echo "$ARGUMENTS" | grep -qE '\-\-ws\s+\S+' && GSD_WS=$(echo "$ARGUMENTS" | grep -oE '\-\-ws\s+\S+') ``` -## 2. Initialize - -```bash -INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE") -if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi -``` - -Parse JSON for: `phase_dir`, `phase_number`, `padded_phase`, `phase_name`, `has_plans`, `plan_count`, `commit_docs`, `text_mode`, `response_language`. - -**If `response_language` is set:** All user-facing output should be in `{response_language}`. - Set `TEXT_MODE=true` if `--text` is present in $ARGUMENTS OR `text_mode` from init JSON is `true`. When `TEXT_MODE` is active, replace every `AskUserQuestion` call with a plain-text numbered list and ask the user to type their choice number. ## 3. Validate Phase + Pre-flight Gate @@ -60,7 +60,7 @@ PHASE_INFO=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" roadmap get-ph Display startup banner: -```text +``` ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GSD ► PLAN CONVERGENCE — Phase {phase_number} ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ @@ -77,10 +77,10 @@ Display startup banner: Display: `◆ No plans found — spawning initial planning agent...` -```text +``` Agent( description="Initial planning Phase {PHASE}", - prompt="Run /gsd:plan-phase for Phase {PHASE}. + prompt="Run /gsd-plan-phase for Phase {PHASE}. Execute: Skill(skill='gsd-plan-phase', args='{PHASE} {GSD_WS}') @@ -102,7 +102,7 @@ Display: `Initial planning complete: ${PLAN_COUNT} PLAN.md files created.` Initialize loop variables: -```text +``` cycle = 0 prev_high_count = Infinity ``` @@ -113,12 +113,12 @@ Increment `cycle`. Display: `◆ Cycle {cycle}/{MAX_CYCLES} — spawning review agent...` -```text +``` Agent( description="Cross-AI review Phase {PHASE} cycle {cycle}", - prompt="Run /gsd:review for Phase {PHASE}. + prompt="Run /gsd-review for Phase {PHASE}. -Execute: Skill(skill='gsd-review', args='--phase {PHASE} {REVIEWER_FLAGS} {GSD_WS}') +Execute: Skill(skill='gsd-review', args='--phase {PHASE} {REVIEWER_FLAGS}') Complete the full review workflow. Do NOT return until REVIEWS.md is committed.", mode="auto" @@ -135,8 +135,7 @@ If REVIEWS_FILE is empty: Error — review agent did not produce REVIEWS.md. Exi ### 5b. Check for HIGH Concerns ```bash -HIGH_COUNT=$(grep -c '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null || true) -HIGH_COUNT=${HIGH_COUNT:-0} +HIGH_COUNT=$(grep -c '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null || echo "0") HIGH_LINES=$(grep -B0 -A1 '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null) ``` @@ -147,7 +146,7 @@ node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state planned-phase --phase ``` Display: -```text +``` ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GSD ► CONVERGENCE COMPLETE ✓ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ @@ -156,7 +155,7 @@ Display: No HIGH concerns remaining. REVIEWS.md: {REVIEWS_FILE} - Next: /gsd:execute-phase {PHASE} + Next: /gsd-execute-phase {PHASE} ``` Exit — convergence achieved. @@ -168,7 +167,7 @@ Exit — convergence achieved. Display: `◆ Cycle {cycle}/{MAX_CYCLES} — {HIGH_COUNT} HIGH concerns found` **Stall detection:** If `HIGH_COUNT >= prev_high_count`: -```text +``` ⚠ Convergence stalled — HIGH concern count not decreasing ({HIGH_COUNT} HIGH concerns, previous cycle had {prev_high_count}) ``` @@ -176,7 +175,7 @@ Display: `◆ Cycle {cycle}/{MAX_CYCLES} — {HIGH_COUNT} HIGH concerns found` **Max cycles check:** If `cycle >= MAX_CYCLES`: If `TEXT_MODE` is true, present as plain-text numbered list: -```text +``` Plan convergence did not complete after {MAX_CYCLES} cycles. {HIGH_COUNT} HIGH concerns remain: @@ -191,7 +190,7 @@ Enter number: ``` Otherwise use AskUserQuestion: -```js +``` AskUserQuestion([ { question: "Plan convergence did not complete after {MAX_CYCLES} cycles. {HIGH_COUNT} HIGH concerns remain:\n\n{HIGH_LINES}\n\nHow would you like to proceed?", @@ -207,11 +206,11 @@ AskUserQuestion([ If "Proceed anyway": Display final status and exit. If "Manual review": -```text +``` Review the concerns in: {REVIEWS_FILE} -To replan manually: /gsd:plan-phase {PHASE} --reviews -To restart loop: /gsd:plan-review-convergence {PHASE} {REVIEWER_FLAGS} +To replan manually: /gsd-plan-phase {PHASE} --reviews +To restart loop: /gsd-plan-review-convergence {PHASE} {REVIEWER_FLAGS} ``` Exit workflow. @@ -223,10 +222,10 @@ Update `prev_high_count = HIGH_COUNT`. Display: `◆ Spawning replan agent with review feedback...` -```text +``` Agent( description="Replan Phase {PHASE} with review feedback cycle {cycle}", - prompt="Run /gsd:plan-phase with --reviews for Phase {PHASE}. + prompt="Run /gsd-plan-phase with --reviews for Phase {PHASE}. Execute: Skill(skill='gsd-plan-phase', args='{PHASE} --reviews --skip-research {GSD_WS}') diff --git a/tests/plan-review-convergence.test.cjs b/tests/plan-review-convergence.test.cjs index b022d3df09..78e7c18483 100644 --- a/tests/plan-review-convergence.test.cjs +++ b/tests/plan-review-convergence.test.cjs @@ -241,59 +241,6 @@ describe('plan-review-convergence workflow: escalation gate (#2306)', () => { }); }); -// ─── Workflow: stall detection — behavioral ─────────────────────────────── - -describe('plan-review-convergence workflow: stall detection behavioral (#2306)', () => { - const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8'); - - test('workflow surfaces stall warning when prev_high_count equals current HIGH_COUNT', () => { - // Behavioral test: two consecutive cycles with the same HIGH count must trigger - // the stall warning. The workflow must compare HIGH_COUNT >= prev_high_count and - // emit a warning string that would appear in output. - assert.ok( - workflow.includes('prev_high_count') || workflow.includes('prev_HIGH'), - 'workflow must track prev_high_count across cycles' - ); - // The comparison that detects the stall - assert.ok( - workflow.includes('HIGH_COUNT >= prev_high_count') || - workflow.includes('HIGH_COUNT >= prev_HIGH') || - workflow.includes('not decreasing'), - 'workflow must compare current HIGH count against previous to detect stall' - ); - // The stall warning text that appears in output - assert.ok( - workflow.includes('stall') || workflow.includes('Stall') || workflow.includes('not decreasing'), - 'workflow must emit a stall warning when HIGH count is not decreasing' - ); - }); -}); - -// ─── Workflow: --max-cycles 1 immediate escalation — behavioral ──────────── - -describe('plan-review-convergence workflow: --max-cycles 1 immediate escalation behavioral (#2306)', () => { - const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8'); - - test('workflow escalates immediately after cycle 1 when --max-cycles 1 and HIGH > 0', () => { - // Behavioral test: when max_cycles=1, after the first review cycle, if HIGH_COUNT > 0 - // the workflow must trigger the escalation gate (cycle >= MAX_CYCLES check fires on - // cycle 1 itself). Verify the workflow contains the logic for this edge case. - assert.ok( - workflow.includes('cycle >= MAX_CYCLES') || - workflow.includes('cycle >= max_cycles') || - (workflow.includes('MAX_CYCLES') && workflow.includes('AskUserQuestion')), - 'workflow must check cycle >= MAX_CYCLES so --max-cycles 1 triggers escalation after first cycle' - ); - // Escalation gate must fire when HIGH > 0 (not just at exactly max_cycles) - assert.ok( - workflow.includes('HIGH_COUNT > 0') || - workflow.includes('HIGH concerns remain') || - workflow.includes('Proceed anyway'), - 'escalation gate must be reachable when HIGH_COUNT > 0 after a single cycle' - ); - }); -}); - // ─── Workflow: REVIEWS.md verification ──────────────────────────────────── describe('plan-review-convergence workflow: artifact verification (#2306)', () => { From 135cd6f4a772eab72980b5adb74350e9b744cdc7 Mon Sep 17 00:00:00 2001 From: caius_kong Date: Fri, 17 Apr 2026 11:24:24 +0800 Subject: [PATCH 2/6] fix: address CodeRabbit review findings (#2306) - argument-hint: add missing --claude, --opencode, --text, --ws flags - workflow step ordering: parse ARGUMENTS before calling gsd-tools init so PHASE is populated when init plan-phase is invoked - HIGH_COUNT: use || true + parameter expansion to avoid "0\n0" when grep -c finds no matches and exits nonzero - MD040: add language tags to all fenced code blocks (bash/text/js) Co-Authored-By: Claude Opus 4.6 --- commands/gsd/plan-review-convergence.md | 2 +- .../workflows/plan-review-convergence.md | 47 ++++++++++--------- 2 files changed, 25 insertions(+), 24 deletions(-) diff --git a/commands/gsd/plan-review-convergence.md b/commands/gsd/plan-review-convergence.md index 066056e59b..aeba824e75 100644 --- a/commands/gsd/plan-review-convergence.md +++ b/commands/gsd/plan-review-convergence.md @@ -1,7 +1,7 @@ --- name: gsd:plan-review-convergence description: "Cross-AI plan convergence loop — replan with review feedback until no HIGH concerns remain (max 3 cycles)" -argument-hint: " [--codex] [--gemini] [--all] [--max-cycles N]" +argument-hint: " [--codex] [--gemini] [--claude] [--opencode] [--text] [--ws ] [--all] [--max-cycles N]" allowed-tools: - Read - Write diff --git a/get-shit-done/workflows/plan-review-convergence.md b/get-shit-done/workflows/plan-review-convergence.md index aa387e90f6..b06ef1e699 100644 --- a/get-shit-done/workflows/plan-review-convergence.md +++ b/get-shit-done/workflows/plan-review-convergence.md @@ -15,18 +15,7 @@ Read all files referenced by the invoking prompt's execution_context before star -## 1. Initialize - -```bash -INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE") -if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi -``` - -Parse JSON for: `phase_dir`, `phase_number`, `padded_phase`, `phase_name`, `has_plans`, `plan_count`, `commit_docs`, `text_mode`, `response_language`. - -**If `response_language` is set:** All user-facing output should be in `{response_language}`. - -## 2. Parse and Normalize Arguments +## 1. Parse and Normalize Arguments Extract from $ARGUMENTS: phase number, reviewer flags (`--codex`, `--gemini`, `--claude`, `--opencode`, `--all`), `--max-cycles N`, `--text`, `--ws`. @@ -48,6 +37,17 @@ GSD_WS="" echo "$ARGUMENTS" | grep -qE '\-\-ws\s+\S+' && GSD_WS=$(echo "$ARGUMENTS" | grep -oE '\-\-ws\s+\S+') ``` +## 2. Initialize + +```bash +INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE") +if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi +``` + +Parse JSON for: `phase_dir`, `phase_number`, `padded_phase`, `phase_name`, `has_plans`, `plan_count`, `commit_docs`, `text_mode`, `response_language`. + +**If `response_language` is set:** All user-facing output should be in `{response_language}`. + Set `TEXT_MODE=true` if `--text` is present in $ARGUMENTS OR `text_mode` from init JSON is `true`. When `TEXT_MODE` is active, replace every `AskUserQuestion` call with a plain-text numbered list and ask the user to type their choice number. ## 3. Validate Phase + Pre-flight Gate @@ -60,7 +60,7 @@ PHASE_INFO=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" roadmap get-ph Display startup banner: -``` +```text ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GSD ► PLAN CONVERGENCE — Phase {phase_number} ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ @@ -77,7 +77,7 @@ Display startup banner: Display: `◆ No plans found — spawning initial planning agent...` -``` +```text Agent( description="Initial planning Phase {PHASE}", prompt="Run /gsd-plan-phase for Phase {PHASE}. @@ -102,7 +102,7 @@ Display: `Initial planning complete: ${PLAN_COUNT} PLAN.md files created.` Initialize loop variables: -``` +```text cycle = 0 prev_high_count = Infinity ``` @@ -113,7 +113,7 @@ Increment `cycle`. Display: `◆ Cycle {cycle}/{MAX_CYCLES} — spawning review agent...` -``` +```text Agent( description="Cross-AI review Phase {PHASE} cycle {cycle}", prompt="Run /gsd-review for Phase {PHASE}. @@ -135,7 +135,8 @@ If REVIEWS_FILE is empty: Error — review agent did not produce REVIEWS.md. Exi ### 5b. Check for HIGH Concerns ```bash -HIGH_COUNT=$(grep -c '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null || echo "0") +HIGH_COUNT=$(grep -c '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null || true) +HIGH_COUNT=${HIGH_COUNT:-0} HIGH_LINES=$(grep -B0 -A1 '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null) ``` @@ -146,7 +147,7 @@ node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state planned-phase --phase ``` Display: -``` +```text ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GSD ► CONVERGENCE COMPLETE ✓ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ @@ -167,7 +168,7 @@ Exit — convergence achieved. Display: `◆ Cycle {cycle}/{MAX_CYCLES} — {HIGH_COUNT} HIGH concerns found` **Stall detection:** If `HIGH_COUNT >= prev_high_count`: -``` +```text ⚠ Convergence stalled — HIGH concern count not decreasing ({HIGH_COUNT} HIGH concerns, previous cycle had {prev_high_count}) ``` @@ -175,7 +176,7 @@ Display: `◆ Cycle {cycle}/{MAX_CYCLES} — {HIGH_COUNT} HIGH concerns found` **Max cycles check:** If `cycle >= MAX_CYCLES`: If `TEXT_MODE` is true, present as plain-text numbered list: -``` +```text Plan convergence did not complete after {MAX_CYCLES} cycles. {HIGH_COUNT} HIGH concerns remain: @@ -190,7 +191,7 @@ Enter number: ``` Otherwise use AskUserQuestion: -``` +```js AskUserQuestion([ { question: "Plan convergence did not complete after {MAX_CYCLES} cycles. {HIGH_COUNT} HIGH concerns remain:\n\n{HIGH_LINES}\n\nHow would you like to proceed?", @@ -206,7 +207,7 @@ AskUserQuestion([ If "Proceed anyway": Display final status and exit. If "Manual review": -``` +```text Review the concerns in: {REVIEWS_FILE} To replan manually: /gsd-plan-phase {PHASE} --reviews @@ -222,7 +223,7 @@ Update `prev_high_count = HIGH_COUNT`. Display: `◆ Spawning replan agent with review feedback...` -``` +```text Agent( description="Replan Phase {PHASE} with review feedback cycle {cycle}", prompt="Run /gsd-plan-phase with --reviews for Phase {PHASE}. From cb26a557339da129288ac687492eb036eda6ebde Mon Sep 17 00:00:00 2001 From: caius_kong Date: Mon, 20 Apr 2026 17:32:57 +0800 Subject: [PATCH 3/6] fix: address trek-e review + use CYCLE_SUMMARY contract for HIGH count (#2306) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses the three structural concerns from trek-e's adversarial review and also fixes a latent cycle-awareness bug in HIGH detection. Changes: 1. Thread {GSD_WS} through to review agent spawn call (symmetric with replan). Previously review agent ran in default workspace when user passed --ws, while replan agent correctly received the flag. This was a real correctness bug for named workspaces. 2. Replace raw grep of REVIEWS.md with CYCLE_SUMMARY contract parsed from the review agent's return message. This mirrors gsd-plan-phase's pattern of parsing structured data from the checker's return (not re-reading the artifact file). Rationale: REVIEWS.md accumulates historical content across cycles. A cycle 2 review that lists resolved HIGHs for audit would inflate the raw grep count and falsely trigger stall detection, even though convergence was actually progressing. CYCLE_SUMMARY asks the reviewer to report ONLY current unresolved HIGHs (excluding resolved, historical refs, retrospective summary tables). Fail-loud if the contract is violated — no silent defaulting. 3. Strengthen test suite from 27 → 39 tests: - BEHAVIORAL stall detection tests: verify the >= condition correctly classifies decreasing/stable/increasing HIGH transitions (catches logic inversions or > vs >= regressions). - BEHAVIORAL --max-cycles 1 edge case: verify cycle=1/max=1/HIGHs>0 immediately escalates without replan, and cycle=1/max=1/HIGHs=0 converges via the HIGH_COUNT==0 branch. - New structural tests: step ordering (Parse ARGUMENTS before init), --ws forwarding to review agent, --ws forwarding to replan agent, CYCLE_SUMMARY regex presence, CYCLE_SUMMARY exclusion rules, CYCLE_SUMMARY contract fail-loud abort. Closes #2306 Co-Authored-By: Claude Opus 4.7 (1M context) --- .../workflows/plan-review-convergence.md | 41 +++++- tests/plan-review-convergence.test.cjs | 138 ++++++++++++++++-- 2 files changed, 160 insertions(+), 19 deletions(-) diff --git a/get-shit-done/workflows/plan-review-convergence.md b/get-shit-done/workflows/plan-review-convergence.md index b06ef1e699..c3ed1b6266 100644 --- a/get-shit-done/workflows/plan-review-convergence.md +++ b/get-shit-done/workflows/plan-review-convergence.md @@ -118,9 +118,31 @@ Agent( description="Cross-AI review Phase {PHASE} cycle {cycle}", prompt="Run /gsd-review for Phase {PHASE}. -Execute: Skill(skill='gsd-review', args='--phase {PHASE} {REVIEWER_FLAGS}') +Execute: Skill(skill='gsd-review', args='--phase {PHASE} {REVIEWER_FLAGS} {GSD_WS}') -Complete the full review workflow. Do NOT return until REVIEWS.md is committed.", +Complete the full review workflow. Do NOT return until REVIEWS.md is committed. + +--- RETURN CONTRACT (REQUIRED) --- + +Your final return message MUST end with these two sections, exactly as specified. +This is the contract the orchestrator relies on — do NOT skip or reformat. + +Section 1 — Machine-readable summary line (exact format, single line): + +CYCLE_SUMMARY: current_high= current_medium= + +Where counts HIGH-severity concerns that REMAIN UNRESOLVED in this cycle's findings: + - INCLUDE: newly raised HIGHs; PARTIALLY RESOLVED HIGHs; previously raised and still unresolved HIGHs + - EXCLUDE: explicitly RESOLVED/FULLY RESOLVED HIGHs; HIGH mentions in retrospective/summary tables comparing cycles; quoted excerpts from prior reviews + follows the same definition for MEDIUM severity. + +Section 2 — Current HIGH Concerns (for display on escalation): + +## Current HIGH Concerns +- +(If is 0, write exactly: None.) + +--- END RETURN CONTRACT ---", mode="auto" ) ``` @@ -132,13 +154,16 @@ REVIEWS_FILE=$(ls ${phase_dir}/${padded_phase}-REVIEWS.md 2>/dev/null) If REVIEWS_FILE is empty: Error — review agent did not produce REVIEWS.md. Exit. -### 5b. Check for HIGH Concerns +### 5b. Extract HIGH Count from Agent Return -```bash -HIGH_COUNT=$(grep -c '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null || true) -HIGH_COUNT=${HIGH_COUNT:-0} -HIGH_LINES=$(grep -B0 -A1 '\*\*HIGH' "${REVIEWS_FILE}" 2>/dev/null) -``` +**Do NOT grep REVIEWS.md.** REVIEWS.md accumulates historical content across cycles; raw text counts are unreliable (a cycle 2 review that re-lists resolved HIGHs for audit would inflate the count and falsely trigger stall detection). Instead, extract the count from the review agent's RETURN MESSAGE using the CYCLE_SUMMARY contract. + +Parse the agent's final return message: + +- `HIGH_COUNT`: the integer matched by regex `CYCLE_SUMMARY:\s+current_high=(\d+)`. If no match, abort with: `Review agent did not honor the CYCLE_SUMMARY contract — cannot determine HIGH count. Retry or switch reviewer.` +- `HIGH_LINES`: the bullets under the `## Current HIGH Concerns` section of the return message (up to the next `##` heading or end of message). If the section contains only `None.`, set `HIGH_LINES=""`. + +Rationale: `gsd-plan-phase` uses the same data-flow pattern — it parses `gsd-plan-checker`'s structured return rather than re-reading PLAN.md. This keeps the count source-of-truth aligned with what the reviewer actually judged (current cycle only), avoiding false stalls from historical accumulation. **If HIGH_COUNT == 0 (converged):** diff --git a/tests/plan-review-convergence.test.cjs b/tests/plan-review-convergence.test.cjs index 78e7c18483..0ed3b526a3 100644 --- a/tests/plan-review-convergence.test.cjs +++ b/tests/plan-review-convergence.test.cjs @@ -3,8 +3,9 @@ * * Validates that the command source and workflow contain the key structural * elements required for correct cross-AI plan convergence loop behavior: - * initial planning gate, review agent spawning, HIGH count detection, - * stall detection, escalation gate, and STATE.md update on convergence. + * initial planning gate, review agent spawning, HIGH count extraction via + * CYCLE_SUMMARY contract, stall detection (behavioral), escalation gate + * (including --max-cycles 1 edge case), and STATE.md update on convergence. */ const { test, describe } = require('node:test'); @@ -107,6 +108,18 @@ describe('plan-review-convergence workflow: initialization (#2306)', () => { 'workflow must display a startup banner' ); }); + + test('workflow parses ARGUMENTS before calling gsd-tools init (PHASE must be known)', () => { + const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8'); + const parseIdx = workflow.search(/Parse (and Normalize )?Arguments/i); + const initIdx = workflow.search(/node\s+"?\$HOME[^"]*gsd-tools\.cjs"?\s+init\s+plan-phase/); + assert.ok(parseIdx > -1, 'workflow must have an ARGUMENTS parsing section'); + assert.ok(initIdx > -1, 'workflow must call gsd-tools.cjs init plan-phase'); + assert.ok( + parseIdx < initIdx, + 'Arguments parsing must come BEFORE init plan-phase so $PHASE is populated' + ); + }); }); // ─── Workflow: initial planning gate ────────────────────────────────────── @@ -148,10 +161,49 @@ describe('plan-review-convergence workflow: convergence loop (#2306)', () => { ); }); - test('workflow detects HIGH concerns by grepping REVIEWS.md', () => { + test('review agent spawn forwards --ws via {GSD_WS} (symmetric with replan agent)', () => { + // Extract the review agent spawn block (between "Cross-AI review" description and its closing mode="auto") + const reviewMatch = workflow.match(/description="Cross-AI review[\s\S]*?mode="auto"/); + assert.ok(reviewMatch, 'review agent spawn block must exist'); + assert.ok( + reviewMatch[0].includes('{GSD_WS}'), + 'review agent spawn must include {GSD_WS} in the skill args — asymmetry with replan agent is a correctness bug when user passes --ws' + ); + }); + + test('replan agent spawn forwards --ws via {GSD_WS}', () => { + const replanMatch = workflow.match(/description="Replan[\s\S]*?mode="auto"/); + assert.ok(replanMatch, 'replan agent spawn block must exist'); + assert.ok( + replanMatch[0].includes('{GSD_WS}'), + 'replan agent must include {GSD_WS} in the skill args' + ); + }); + + test('workflow extracts HIGH_COUNT via CYCLE_SUMMARY contract (not raw grep of REVIEWS.md)', () => { assert.ok( - workflow.includes('HIGH_COUNT') || workflow.includes('grep'), - 'workflow must grep REVIEWS.md for HIGH concerns to determine convergence' + workflow.includes('CYCLE_SUMMARY'), + 'workflow must use CYCLE_SUMMARY contract — raw grep of REVIEWS.md double-counts historical references across cycles' + ); + assert.ok( + workflow.includes('current_high='), + 'CYCLE_SUMMARY contract must use current_high= format' + ); + assert.ok( + workflow.match(/CYCLE_SUMMARY:\s*\\s\+\s*current_high=\(\\d\+\)/) || + workflow.match(/CYCLE_SUMMARY:\\s\+current_high=\(\\d\+\)/), + 'workflow must extract HIGH_COUNT via regex matching CYCLE_SUMMARY: current_high=' + ); + }); + + test('workflow instructs reviewer to exclude resolved/historical HIGH from count', () => { + assert.ok( + workflow.includes('EXCLUDE') && workflow.includes('RESOLVED'), + 'CYCLE_SUMMARY contract must exclude resolved HIGHs from the count' + ); + assert.ok( + workflow.match(/retrospective|summary tables/i), + 'contract must exclude HIGH mentions in retrospective/summary tables (cycle-comparison artifacts)' ); }); @@ -186,9 +238,9 @@ describe('plan-review-convergence workflow: convergence loop (#2306)', () => { }); }); -// ─── Workflow: stall detection ───────────────────────────────────────────── +// ─── Workflow: stall detection (behavioral) ─────────────────────────────── -describe('plan-review-convergence workflow: stall detection (#2306)', () => { +describe('plan-review-convergence workflow: stall detection — behavioral (#2306)', () => { const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8'); test('workflow tracks previous HIGH count to detect stalls', () => { @@ -198,15 +250,50 @@ describe('plan-review-convergence workflow: stall detection (#2306)', () => { ); }); - test('workflow warns when HIGH count is not decreasing', () => { + test('stall condition uses >= (covers both "stable" AND "increasing" cases)', () => { + // The exact spec: "If HIGH_COUNT >= prev_high_count" + // A bug would be using > alone (misses the stable-count case) or <= (inverted logic) + assert.ok( + workflow.match(/HIGH_COUNT\s*>=\s*prev_high_count/), + 'stall condition must be HIGH_COUNT >= prev_high_count — using > alone would miss the "count stable across cycles" case (real stall)' + ); + assert.ok( + !workflow.match(/HIGH_COUNT\s*<=\s*prev_high_count/), + 'stall condition must NOT be <= (that would be inverted logic — decreasing counts would falsely trigger stall)' + ); + }); + + // Behavioral simulation: verify the condition `HIGH_COUNT >= prev_high_count` + // produces correct stall/no-stall decisions across the three cycle transitions. + const stallCondition = (current, previous) => current >= previous; + + test('BEHAVIORAL: decreasing HIGH count (progress) does NOT trigger stall', () => { + // 5 → 3: real progress, cycle should continue to replan + assert.strictEqual(stallCondition(3, 5), false, 'decreasing 5→3 must NOT be stall'); + assert.strictEqual(stallCondition(1, 10), false, 'decreasing 10→1 must NOT be stall'); + }); + + test('BEHAVIORAL: stable HIGH count (no progress) DOES trigger stall', () => { + // 5 → 5: the original stall case from revision-loop.md + assert.strictEqual(stallCondition(5, 5), true, 'stable 5→5 MUST be stall (revision not converging)'); + assert.strictEqual(stallCondition(0, 0), true, 'stable 0→0 is stall but HIGH_COUNT==0 exits loop first — trap prevents false alarm'); + }); + + test('BEHAVIORAL: increasing HIGH count (regression) DOES trigger stall', () => { + // Regression case: replan made things worse + assert.strictEqual(stallCondition(7, 5), true, 'increasing 5→7 MUST be stall (regression)'); + assert.strictEqual(stallCondition(13, 3), true, 'increasing 3→13 MUST be stall'); + }); + + test('workflow emits stall warning text when condition triggers', () => { assert.ok( - workflow.includes('stall') || workflow.includes('Stall') || workflow.includes('not decreasing'), - 'workflow must warn user when HIGH count is not decreasing between cycles' + workflow.match(/stalled|Stall|not decreasing/), + 'workflow must emit a stall warning message when condition triggers (for user visibility)' ); }); }); -// ─── Workflow: escalation gate ──────────────────────────────────────────── +// ─── Workflow: escalation gate (including --max-cycles 1 edge) ──────────── describe('plan-review-convergence workflow: escalation gate (#2306)', () => { const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8'); @@ -219,6 +306,28 @@ describe('plan-review-convergence workflow: escalation gate (#2306)', () => { ); }); + test('max-cycles condition uses >= (so --max-cycles 1 triggers after cycle 1)', () => { + assert.ok( + workflow.match(/cycle\s*>=\s*MAX_CYCLES/), + 'max-cycles check must use cycle >= MAX_CYCLES — using > alone would require cycle=MAX_CYCLES+1 before escalation' + ); + }); + + // Behavioral simulation: --max-cycles 1 corner case + const shouldEscalate = (cycle, maxCycles, highCount) => highCount > 0 && cycle >= maxCycles; + + test('BEHAVIORAL: --max-cycles 1 with HIGHs in cycle 1 immediately escalates (no replan)', () => { + // User sets --max-cycles 1 as "review-only / dry-run" mode + assert.strictEqual(shouldEscalate(1, 1, 3), true, 'cycle=1, max=1, HIGHs=3 MUST escalate immediately'); + assert.strictEqual(shouldEscalate(1, 1, 0), false, 'cycle=1, max=1, HIGHs=0 converges instead (HIGH_COUNT==0 branch exits earlier)'); + }); + + test('BEHAVIORAL: --max-cycles 3 (default) only escalates after 3 cycles', () => { + assert.strictEqual(shouldEscalate(1, 3, 5), false, 'cycle=1 of 3: must NOT escalate yet — replan next'); + assert.strictEqual(shouldEscalate(2, 3, 5), false, 'cycle=2 of 3: must NOT escalate yet — replan next'); + assert.strictEqual(shouldEscalate(3, 3, 5), true, 'cycle=3 of 3: MUST escalate (max reached)'); + }); + test('escalation offers "Proceed anyway" option', () => { assert.ok( workflow.includes('Proceed anyway'), @@ -259,4 +368,11 @@ describe('plan-review-convergence workflow: artifact verification (#2306)', () = 'workflow must error if the review agent fails to produce REVIEWS.md' ); }); + + test('workflow aborts if review agent omits CYCLE_SUMMARY contract', () => { + assert.ok( + workflow.match(/did not honor the CYCLE_SUMMARY contract/i), + 'workflow must fail-loud (not silently default) if review agent omits the CYCLE_SUMMARY summary line' + ); + }); }); From 4c6367e8175ae96864280c1a3a1ac7d15fadde1c Mon Sep 17 00:00:00 2001 From: caius_kong Date: Fri, 24 Apr 2026 11:17:49 +0800 Subject: [PATCH 4/6] =?UTF-8?q?chore:=20sync=20/gsd-=20=E2=86=92=20/g?= =?UTF-8?q?sd:=20rename=20after=20rebase=20(#2306)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #2595 landed a repo-wide slash-prefix rename while this PR was open. The rebase took our workflow wholesale (preserving CYCLE_SUMMARY + 10 extra tests that Logan's #2390 copy lacked), so the rename needs to be reapplied to the 6 affected lines (spawn prompts + "next steps" text). gsd-tools.cjs is a filesystem path, not a slash command — left alone. Co-Authored-By: Claude Opus 4.7 (1M context) --- get-shit-done/workflows/plan-review-convergence.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/get-shit-done/workflows/plan-review-convergence.md b/get-shit-done/workflows/plan-review-convergence.md index c3ed1b6266..1806670f06 100644 --- a/get-shit-done/workflows/plan-review-convergence.md +++ b/get-shit-done/workflows/plan-review-convergence.md @@ -80,7 +80,7 @@ Display: `◆ No plans found — spawning initial planning agent...` ```text Agent( description="Initial planning Phase {PHASE}", - prompt="Run /gsd-plan-phase for Phase {PHASE}. + prompt="Run /gsd:plan-phase for Phase {PHASE}. Execute: Skill(skill='gsd-plan-phase', args='{PHASE} {GSD_WS}') @@ -116,7 +116,7 @@ Display: `◆ Cycle {cycle}/{MAX_CYCLES} — spawning review agent...` ```text Agent( description="Cross-AI review Phase {PHASE} cycle {cycle}", - prompt="Run /gsd-review for Phase {PHASE}. + prompt="Run /gsd:review for Phase {PHASE}. Execute: Skill(skill='gsd-review', args='--phase {PHASE} {REVIEWER_FLAGS} {GSD_WS}') @@ -181,7 +181,7 @@ Display: No HIGH concerns remaining. REVIEWS.md: {REVIEWS_FILE} - Next: /gsd-execute-phase {PHASE} + Next: /gsd:execute-phase {PHASE} ``` Exit — convergence achieved. @@ -235,8 +235,8 @@ If "Manual review": ```text Review the concerns in: {REVIEWS_FILE} -To replan manually: /gsd-plan-phase {PHASE} --reviews -To restart loop: /gsd-plan-review-convergence {PHASE} {REVIEWER_FLAGS} +To replan manually: /gsd:plan-phase {PHASE} --reviews +To restart loop: /gsd:plan-review-convergence {PHASE} {REVIEWER_FLAGS} ``` Exit workflow. @@ -251,7 +251,7 @@ Display: `◆ Spawning replan agent with review feedback...` ```text Agent( description="Replan Phase {PHASE} with review feedback cycle {cycle}", - prompt="Run /gsd-plan-phase with --reviews for Phase {PHASE}. + prompt="Run /gsd:plan-phase with --reviews for Phase {PHASE}. Execute: Skill(skill='gsd-plan-phase', args='{PHASE} --reviews --skip-research {GSD_WS}') From e38f66da11295a54a4b1aa13874b5e927e79a1e5 Mon Sep 17 00:00:00 2001 From: caius_kong Date: Fri, 24 Apr 2026 11:29:39 +0800 Subject: [PATCH 5/6] docs: clarify CYCLE_SUMMARY parsing is LLM-level + sync success criteria (#2306) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses CodeRabbit review on 4c6367e: 1. Success criteria in still said "grep HIGHs" after we migrated to the CYCLE_SUMMARY contract. Updated to "parse CYCLE_SUMMARY for HIGH count" so implementers are not misled by the obsolete description. 2. CodeRabbit (Major) asked for concrete bash parsing code for the CYCLE_SUMMARY extraction. Clarified in a new "Parsing layer:" note that this extraction is performed by the orchestrator LLM on the Agent's return message, not a shell pipeline — there is no $AGENT_RETURN variable for grep/sed to consume, the return message is in-context for the orchestrator. This is the same data-flow shape gsd-plan-phase uses with gsd-plan-checker's return. Expressing it as shell code would imply a runtime that does not exist here. Tests: 39/39 still green. Co-Authored-By: Claude Opus 4.7 (1M context) --- get-shit-done/workflows/plan-review-convergence.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/get-shit-done/workflows/plan-review-convergence.md b/get-shit-done/workflows/plan-review-convergence.md index 1806670f06..5bdb2faa51 100644 --- a/get-shit-done/workflows/plan-review-convergence.md +++ b/get-shit-done/workflows/plan-review-convergence.md @@ -158,6 +158,8 @@ If REVIEWS_FILE is empty: Error — review agent did not produce REVIEWS.md. Exi **Do NOT grep REVIEWS.md.** REVIEWS.md accumulates historical content across cycles; raw text counts are unreliable (a cycle 2 review that re-lists resolved HIGHs for audit would inflate the count and falsely trigger stall detection). Instead, extract the count from the review agent's RETURN MESSAGE using the CYCLE_SUMMARY contract. +**Parsing layer:** this step is performed by the orchestrator LLM on the Agent's final return message (text returned from `Agent(...)`), not by a bash pipeline. There is no `$AGENT_RETURN` variable to feed `grep`/`sed`; the return message is in-context for the orchestrator. The contract below is therefore expressed as extraction rules, not shell code — the same shape `gsd-plan-phase` uses when parsing `gsd-plan-checker`'s return. + Parse the agent's final return message: - `HIGH_COUNT`: the integer matched by regex `CYCLE_SUMMARY:\s+current_high=(\d+)`. If no match, abort with: `Review agent did not honor the CYCLE_SUMMARY contract — cannot determine HIGH count. Retry or switch reviewer.` @@ -271,7 +273,7 @@ After agent returns → go back to **step 5a** (review again). - [ ] Initial planning via Agent → Skill("gsd-plan-phase") if no plans exist - [ ] Review via Agent → Skill("gsd-review") — isolated, not inline - [ ] Replan via Agent → Skill("gsd-plan-phase --reviews") — isolated, not inline -- [ ] Orchestrator only does: init, loop control, grep HIGHs, stall detection, escalation +- [ ] Orchestrator only does: init, loop control, parse CYCLE_SUMMARY for HIGH count, stall detection, escalation - [ ] Each Agent fully completes its Skill before returning - [ ] Loop exits on: no HIGH concerns (converged) OR max cycles (escalation) - [ ] Stall detection reported when HIGH count not decreasing From d6b1a94f033655aa550e0991874e739681522b78 Mon Sep 17 00:00:00 2001 From: caius_kong Date: Fri, 24 Apr 2026 11:45:26 +0800 Subject: [PATCH 6/6] =?UTF-8?q?docs:=20drop=20Parsing=20layer=20note=20?= =?UTF-8?q?=E2=80=94=20surrounding=20text=20already=20implies=20LLM=20pars?= =?UTF-8?q?ing=20(#2306)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added a defensive "Parsing layer:" paragraph in e38f66d to tell readers not to look for a $AGENT_RETURN bash variable. On review, this note was addressing a concern the surrounding text does not actually create: "Parse the agent's final return message" and the regex-based extraction rules already make clear this is not a shell pipeline, and no other section mentions $AGENT_RETURN. The note was appeasement for the bot review, not clarification for a human reader. Keeping the success criteria sync from e38f66d — that one was a real inconsistency (obsolete "grep HIGHs" after we moved to CYCLE_SUMMARY). Co-Authored-By: Claude Opus 4.7 (1M context) --- get-shit-done/workflows/plan-review-convergence.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/get-shit-done/workflows/plan-review-convergence.md b/get-shit-done/workflows/plan-review-convergence.md index 5bdb2faa51..e672589c26 100644 --- a/get-shit-done/workflows/plan-review-convergence.md +++ b/get-shit-done/workflows/plan-review-convergence.md @@ -158,8 +158,6 @@ If REVIEWS_FILE is empty: Error — review agent did not produce REVIEWS.md. Exi **Do NOT grep REVIEWS.md.** REVIEWS.md accumulates historical content across cycles; raw text counts are unreliable (a cycle 2 review that re-lists resolved HIGHs for audit would inflate the count and falsely trigger stall detection). Instead, extract the count from the review agent's RETURN MESSAGE using the CYCLE_SUMMARY contract. -**Parsing layer:** this step is performed by the orchestrator LLM on the Agent's final return message (text returned from `Agent(...)`), not by a bash pipeline. There is no `$AGENT_RETURN` variable to feed `grep`/`sed`; the return message is in-context for the orchestrator. The contract below is therefore expressed as extraction rules, not shell code — the same shape `gsd-plan-phase` uses when parsing `gsd-plan-checker`'s return. - Parse the agent's final return message: - `HIGH_COUNT`: the integer matched by regex `CYCLE_SUMMARY:\s+current_high=(\d+)`. If no match, abort with: `Review agent did not honor the CYCLE_SUMMARY contract — cannot determine HIGH count. Retry or switch reviewer.`