feat: add gsd:plan-review-convergence command#2339
feat: add gsd:plan-review-convergence command#2339caius-kong wants to merge 3 commits intogsd-build:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughA new Changes
Sequence DiagramsequenceDiagram
participant User
participant Orchestrator as Orchestrator<br/>(gsd:plan-review-convergence)
participant PlanAgent as Agent<br/>(gsd-plan-phase)
participant ReviewAgent as Agent<br/>(gsd-review)
User->>Orchestrator: /gsd-plan-review-convergence 04.1 --codex --max-cycles 3
Orchestrator->>Orchestrator: Init phase metadata\nCheck for existing plans
alt No existing plans
Orchestrator->>PlanAgent: Spawn Agent: gsd-plan-phase
PlanAgent->>PlanAgent: Create PLAN.md artifacts
PlanAgent-->>Orchestrator: Planning complete
end
loop Until HIGH == 0 or MAX_CYCLES reached
Orchestrator->>ReviewAgent: Spawn Agent: gsd-review --codex
ReviewAgent->>ReviewAgent: Analyze plans\nProduce REVIEWS.md with CYCLE_SUMMARY
ReviewAgent-->>Orchestrator: Review complete
Orchestrator->>Orchestrator: Parse CYCLE_SUMMARY\nExtract HIGH_COUNT and HIGH items
alt HIGH == 0
Orchestrator->>Orchestrator: Update STATE.md\nMark as planned-phase
Orchestrator-->>User: ✓ Convergence achieved
else if cycle >= MAX_CYCLES
Orchestrator-->>User: Escalation gate: Proceed anyway? / Manual review?
alt User selects Proceed
Orchestrator-->>User: ✓ Accept remaining HIGHs, exit
else User selects Manual
Orchestrator-->>User: Print manual review instructions, exit
end
else
Orchestrator-->>User: ⚠ Stall warning if HIGH not decreasing
Orchestrator->>PlanAgent: Spawn Agent: gsd-plan-phase --reviews --skip-research
PlanAgent->>PlanAgent: Replan with feedback
PlanAgent-->>Orchestrator: Replan complete
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@commands/gsd/plan-review-convergence.md`:
- Line 4: Update the argument-hint string to include the missing documented
flags so it's aligned with the CLI docs: change the existing argument-hint value
(the "argument-hint" entry in this file) from "<phase> [--codex] [--gemini]
[--all] [--max-cycles N]" to include the additional options (for example
"<phase> [--codex] [--gemini] [--claude] [--opencode] [--text] [--ws] [--all]
[--max-cycles N]"), and apply the same edit to the other argument-hint
occurrences noted elsewhere in the file (the repeated argument-hint blocks
around lines referenced in the comment).
In `@get-shit-done/workflows/plan-review-convergence.md`:
- Line 63: The markdown contains multiple fenced code blocks using ``` without
language tags which trips MD040; update each triple-backtick fence shown in the
diff (and the additional occurrences noted in the review) to include an
appropriate language identifier (for example use ```text for plain output
blocks, ```bash for shell snippets, or ```json for JSON examples) so every
fenced block is annotated and markdownlint MD040 is satisfied.
- Around line 138-140: The HIGH_COUNT assignment can produce "0\n0" because grep
-c exits nonzero when no matches and your || echo "0" runs again; change the
HIGH_COUNT pipeline to avoid emitting a second "0" and ensure a single numeric
value (for example, run grep -c ... 2>/dev/null || true to suppress the non-zero
exit, then normalize with parameter expansion like HIGH_COUNT=${HIGH_COUNT:-0});
update the assignment that defines HIGH_COUNT (and leave HIGH_LINES as-is) so
numeric comparisons against HIGH_COUNT work correctly.
- Around line 20-25: The script calls INIT via the node gsd-tools.cjs init
plan-phase command before extracting PHASE from ARGUMENTS, causing phase
metadata (phase_dir, padded_phase, has_plans, plan_count, etc.) to be
initialized with an empty/stale phase; reorder the logic so you first parse and
validate PHASE and other flags from ARGUMENTS (extract PHASE, ensure it's
non-empty, collect REVIEWER_FLAGS, MAX_CYCLES, GSD_WS, text_mode,
response_language), then run INIT=$(node
"$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init plan-phase "$PHASE") and
parse its JSON output into phase_dir, phase_number, padded_phase, phase_name,
has_plans, plan_count, commit_docs, text_mode, response_language; add explicit
validation to fail early if PHASE is empty and ensure response_language is
handled after parsing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 06cd5b1f-a32b-4823-a43a-6cef6891d1ff
📒 Files selected for processing (4)
CHANGELOG.mdcommands/gsd/plan-review-convergence.mdget-shit-done/workflows/plan-review-convergence.mdtests/plan-review-convergence.test.cjs
trek-e
left a comment
There was a problem hiding this comment.
Adversarial review.
New command and workflow for a cross-AI plan convergence loop. Overall well-structured. Key findings:
Issues addressed by author: All 4 CodeRabbit findings (grep -c fallback, init-before-parse ordering, argument-hint completeness, code block language identifiers) confirmed addressed in commit c401fd3.
No security issues found: No command injection vectors, no path traversal, no unsafe eval/exec patterns. Agent spawning uses Skill() calls, not raw shell.
Structural concerns:
-
No test for the HIGH-count stall detection — the test suite covers loop structure, escalation gates, and REVIEWS.md verification, but there's no test verifying that
prev_high_count == HIGH_COUNTtriggers the stall warning. This is the most fragile logic path and the one most likely to silently regress. -
--wsforwarding to review agent — the workflow spawnsSkill('gsd-review', ...)without threadingGSD_WSthrough to it. If a user is working in a named workspace, the review agent may read from the wrong workspace. This was flagged by CodeRabbit (id 3097651140) and the author addressed init ordering but it's not clear--wsis forwarded. Worth confirming. -
Max-cycles=1 edge case — with
--max-cycles 1, if the initial review finds HIGHs, the loop escalates immediately without a replan. The workflow text handles this but the test suite doesn't cover it.
trek-e
left a comment
There was a problem hiding this comment.
Adversarial review — REQUEST CHANGES.
Trek-e's prior review (2026-04-17T13:15:29Z) raised three concerns. The author addressed CodeRabbit's four findings in commit c401fd3, but trek-e's structural concerns were not addressed. The PR has no response from the author on those points and no subsequent commits that cover them.
Unresolved concerns:
1. --ws forwarding to review agent (correctness bug)
The convergence workflow spawns Skill('gsd-review', args='--phase {PHASE} {REVIEWER_FLAGS}') but does not thread GSD_WS through to the review agent. If the user is working in a named workspace (--ws project-name), the review agent reads from the wrong workspace context. The replan agent does pass {GSD_WS} — making the behavior asymmetric: planning is workspace-aware, reviewing is not. The review agent must also receive the workspace flag.
2. Missing stall detection test
The test suite has 7 suites covering command structure, init, planning gate, convergence loop, stall detection, escalation gate, and artifact verification. The stall detection suite tests that prev_high_count is tracked and that a warning is emitted — but no test verifies the condition HIGH_COUNT >= prev_high_count triggers the warning when the count does not decrease between cycles. The existing stall tests are structural (string presence) rather than behavioral. This is the most fragile path: a refactor of the stall variable names or comparison would break stall detection silently with no test catching it.
3. --max-cycles 1 edge case
With --max-cycles 1, a cycle 1 review that finds HIGHs immediately hits the escalation gate without a replan attempt. The workflow handles this correctly in prose, but the test suite has no coverage for this boundary. Users who set --max-cycles 1 as a dry-run/review-only mode will hit behavior that is untested.
Required fixes before approval:
- Thread
GSD_WSthrough to the review agent spawn call inplan-review-convergence.md - Add a behavioral stall detection test: simulate decreasing HIGH count (no stall warning) and stable HIGH count (stall warning emitted)
- Add a
--max-cycles 1test covering the immediate escalation path
The spec, overall structure, and orchestrator-only design are correct. These are targeted gaps, not architectural problems.
|
CodeRabbit finding audit — all four findings were valid and are confirmed addressed. The most recent CodeRabbit review on this PR (run ID
The trek-e adversarial review (CHANGES_REQUESTED, 2026-04-20) has three structural concerns that are separate from the CodeRabbit findings and remain open: (1) |
|
Thanks for the thorough review @trek-e. All three concerns are addressed in commit 65c8c23: 1. 2. Behavioral stall detection tests — added. The suite now includes:
3.
Bonus fix (CYCLE_SUMMARY contract): In real-world validation on a phase with 3 review cycles, I discovered the raw-grep approach in cycle 2 triggered a false stall: the reviewer's cycle 2 output re-listed resolved HIGHs for audit, inflating the Fixed by mirroring Test count: 27 → 39. All green locally ( |
Cross-AI plan convergence loop that automates the plan→review→replan cycle until no HIGH concerns remain or max cycles (default 3) reached. - Orchestrator-only loop control: spawns isolated Agents for existing gsd-plan-phase and gsd-review Skills — no new planning/review logic - Stall detection: warns immediately when HIGH count not decreasing - Escalation gate: AskUserQuestion when max cycles reached with HIGHs - TEXT_MODE fallback for plain-text escalation (Copilot compatibility) - vscode_askquestions runtime note for VS Code Copilot New files: commands/gsd/plan-review-convergence.md get-shit-done/workflows/plan-review-convergence.md tests/plan-review-convergence.test.cjs (27 tests, all passing) Closes gsd-build#2306 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- argument-hint: add missing --claude, --opencode, --text, --ws flags - workflow step ordering: parse ARGUMENTS before calling gsd-tools init so PHASE is populated when init plan-phase is invoked - HIGH_COUNT: use || true + parameter expansion to avoid "0\n0" when grep -c finds no matches and exits nonzero - MD040: add language tags to all fenced code blocks (bash/text/js) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gsd-build#2306) Addresses the three structural concerns from trek-e's adversarial review and also fixes a latent cycle-awareness bug in HIGH detection. Changes: 1. Thread {GSD_WS} through to review agent spawn call (symmetric with replan). Previously review agent ran in default workspace when user passed --ws, while replan agent correctly received the flag. This was a real correctness bug for named workspaces. 2. Replace raw grep of REVIEWS.md with CYCLE_SUMMARY contract parsed from the review agent's return message. This mirrors gsd-plan-phase's pattern of parsing structured data from the checker's return (not re-reading the artifact file). Rationale: REVIEWS.md accumulates historical content across cycles. A cycle 2 review that lists resolved HIGHs for audit would inflate the raw grep count and falsely trigger stall detection, even though convergence was actually progressing. CYCLE_SUMMARY asks the reviewer to report ONLY current unresolved HIGHs (excluding resolved, historical refs, retrospective summary tables). Fail-loud if the contract is violated — no silent defaulting. 3. Strengthen test suite from 27 → 39 tests: - BEHAVIORAL stall detection tests: verify the >= condition correctly classifies decreasing/stable/increasing HIGH transitions (catches logic inversions or > vs >= regressions). - BEHAVIORAL --max-cycles 1 edge case: verify cycle=1/max=1/HIGHs>0 immediately escalates without replan, and cycle=1/max=1/HIGHs=0 converges via the HIGH_COUNT==0 branch. - New structural tests: step ordering (Parse ARGUMENTS before init), --ws forwarding to review agent, --ws forwarding to replan agent, CYCLE_SUMMARY regex presence, CYCLE_SUMMARY exclusion rules, CYCLE_SUMMARY contract fail-loud abort. Closes gsd-build#2306 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
65c8c23 to
0cd8f81
Compare
Linked Issue
Closes #2306
Feature summary
Adds
gsd:plan-review-convergence— a cross-AI plan convergence loop that automates the manualplan-phase → review → replan → re-reviewchain. The orchestrator spawns isolated Agents for existinggsd-plan-phaseandgsd-reviewSkills, then loops until no HIGH concerns remain in REVIEWS.md or max cycles (default 3) are reached.What changed
New files
commands/gsd/plan-review-convergence.mdgsd-plan-review-convergenceskillget-shit-done/workflows/plan-review-convergence.mdtests/plan-review-convergence.test.cjsModified files
CHANGELOG.md[Unreleased] → Addedentry for the new commandImplementation notes
Implementation matches the approved spec exactly. The only discretionary decisions:
<command-name>.test.cjsconventionchain-flag-plan-phase.test.cjs,code-review-command.test.cjs) — verifies workflow contains key instruction strings the LLM will read at runtime### Addedin[Unreleased]sectionSpec compliance
gsd:plan-review-convergencewith all documented flags (--codex,--gemini,--claude,--opencode,--all,--max-cycles N)--codexis the default reviewer when no flag specifiedgsd-plan-phaseif not, errors if no PLAN.md producedgsd-reviewAskUserQuestionwith "Proceed anyway" / "Manual review" optionsvscode_askquestionsruntime noteTesting
Test coverage
tests/plan-review-convergence.test.cjs— 27 tests across 7 suites:All 27 tests pass:
node --test tests/plan-review-convergence.test.cjsPlatforms tested
Runtimes tested
Scope confirmation
Checklist
Closes #2306approved-featurelabelnpm test)<objective>and<context>tagsBreaking changes
None
Screenshots / recordings
Real-world validation on Phase 04.1 (COMET Evaluation Metrics, 4 PLAN.md files):
Concerns caught: verdict math not switched to new metric, missing INCONCLUSIVE outcome paths, silent COMET model failure modes. All resolved in one replan cycle.
Summary by CodeRabbit