FE-705: Agent CLI capabilities and workflow documentation by lunelson · Pull Request #132 · hashintel/brunch

lunelson · 2026-05-13T12:37:05Z

Summary

Adds the FE-705 agent-facing CLI capability substrate and probe harness, then reconciles the surrounding design/planning docs and ln-* skill workflow so the branch is reviewable as one implementation + methodology update.

What changed

Added server-side agent capability primitives for JSONL lifecycle, chat readiness/read access, turn response handling, and capability registration.
Added scripts/agent-probes/ with process-backed probe runner coverage, fixture-candidate validation, packaged smoke helpers, and model-backed LLM user policy seams.
Consolidated the conversational workspace design docs: archived the broad intent-spec synthesis, clarified multi-chat / side-chat / changeset-ledger authority, added strategy docs, and refreshed the design index.
Restructured memory/PLAN.md and added memory/SPEC_RESTRUCTURE.md to reduce planning conflicts and clarify the new frontier/sequencing model.
Expanded and tightened the local skill workflow: added d3k, ln-diagnose, and ln-prototype; refined ln-build, ln-scope, ln-review, ln-sync, and planning-pr; added the pre-release change posture to AGENTS.md.
Moved dev-workflow evolution rationale under docs/design/ln-skills/ so skill design notes are separate from executable skills and product specs.

Validation

npm run fix passes with the existing unrelated unused-variable warnings in interview-view tests.

lunelson · 2026-05-13T12:37:20Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

cursor · 2026-05-13T12:59:56Z

PR Summary

Low Risk
Documentation-only changes that adjust agent workflow guidance; low risk aside from potentially shifting team process if the updated planning vocabulary or templates are adopted incorrectly.

Overview
Adds new agent skill docs for debugging and iteration workflows (d3k command guide, plus new ln-diagnose and ln-prototype skills).

Refactors the ln-* skill guidance to standardize frontier item vs slice vocabulary, with an updated ln-plan template that introduces Context, Sequencing by stable frontier id, and Frontier Definitions, and corresponding updates across ln-scope, ln-build, ln-oracles, ln-review, ln-spec, ln-spike, ln-sync, and AGENTS.md.

Reconciles surrounding design/archive docs (e.g., PLAN_HISTORY.md additions, design-doc authority/status notes, link fixes, and an audited/trimmed DEFERRED_RECONCILIATIONS.md) and narrows planning-pr into an advisory skill that recommends (not auto-creates) separate planning PRs only when explicitly needed.

Reviewed by Cursor Bugbot for commit 1bcd5a7. Bugbot is set up for automated code reviews on this repo.

augmentcode · 2026-05-13T12:59:59Z

This pull request is abnormally large and would use a significant amount of tokens to review. If you still wish to review it, comment "augment review" and we will review it.

memory/PLAN.md and docs/archive/PLAN_HISTORY.md changes from this branch will land as a planning-only PR off main after Lu's #132 + #133 stack merges, per the planning-pr convention (known merge conflicts on planning docs + frontier-definitions migration triggers separate-PR recommendation). This keeps PR #134 code-only and conflict-free with Lu's stack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kostandinang · 2026-05-13T19:02:19Z

Code review

Found 2 issues:

Temp workspace directory leaked in runProcessBackedProbe. The finally block only calls spawned.endStdin() and never removes the mkdtempSync(...) workspace at workspaceCwd. Tests manually rmSync(result.workspaceCwd), but production callers (e.g. packaged-smoke.ts) don't — every probe run leaves a /tmp/brunch-probe-workspace-* directory behind, accumulating without bound across CI/smoke runs.

brunch/scripts/agent-probes/probe-runner.ts

Lines 182 to 207 in 1bcd5a7

    
      turnBudget,
    }: ProcessBackedProbeOptions): Promise<ProbeRunResult> {
      const workspaceCwd = mkdtempSync(join(tmpdir(), 'brunch-probe-workspace-'));
      const spawned = spawnProcess({ cwd: workspaceCwd, command, args, env });
      const transport = createProcessJsonlTransport(spawned);
      try {
        const result = await runScriptedProbe({
          transport,
          scenario,
          scriptedAnswers,
          responsePolicy,
          simulatedUserEvents,
          turnBudget,
        });
        result.workspaceCwd = workspaceCwd;
        if (preserveWorkspaceState) {
          result.preservedWorkspaceStatePath = copyWorkspaceState({ workspaceCwd, outputDir });
        }
        writeProbeArtifacts(outputDir, result);
        return result;
      } finally {
        spawned.endStdin();
      }
    }
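One possible shape for the cleanup this comment asks for, as a minimal sketch rather than the PR's actual fix. `withTempWorkspace`, `keepWorkspace`, and `demoWorkspaceIsRemoved` are hypothetical names that do not appear in the branch; the opt-out flag is an assumption made because the existing tests read `result.workspaceCwd` after the run returns.

```typescript
import { existsSync, mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper (not in the PR): owns the lifetime of a temp workspace.
// The directory is removed in `finally`, so it is cleaned up on success and on failure.
export async function withTempWorkspace<T>(
  prefix: string,
  run: (workspaceCwd: string) => Promise<T>,
  options: { keepWorkspace?: boolean } = {},
): Promise<T> {
  const workspaceCwd = mkdtempSync(join(tmpdir(), prefix));
  try {
    return await run(workspaceCwd);
  } finally {
    // Assumption: callers that still need the directory afterwards opt out explicitly.
    if (!options.keepWorkspace) {
      rmSync(workspaceCwd, { recursive: true, force: true });
    }
  }
}

// Usage sketch: the directory exists during the run and is gone afterwards.
// Resolves to whether the workspace still exists after the helper returns.
export async function demoWorkspaceIsRemoved(): Promise<boolean> {
  let seen = '';
  await withTempWorkspace('brunch-probe-workspace-', async (workspaceCwd) => {
    seen = workspaceCwd;
  });
  return existsSync(seen);
}
```

With a wrapper like this, a caller such as packaged-smoke.ts would get cleanup by default, while tests that inspect the workspace could pass `keepWorkspace: true` and keep their manual `rmSync`.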

chat.ensureReady capability is registered with authority: 'runtime_replay', but the slice "Generate agent chat readiness" in this PR changes the handler to invoke streamInterviewer (a live LLM call). runtime_replay is documented as "writes replay/status artifacts tied to an existing durable unit" — i.e. deterministic and replay-safe. An LLM call is neither. Downstream adapters or audits that key off authority to decide whether a call is safe to retry/replay will draw the wrong conclusion. Either reclassify (likely commit_truth) or document why runtime_replay still holds.

brunch/src/server/capability-registry.ts

Lines 134 to 142 in 1bcd5a7

    
      },
      {
        id: 'chat.ensureReady',
        authority: 'runtime_replay',
        summary: 'Ensure an explicit chat has an answerable generated frontier.',
        inputSchema: 'chat.ensureReady.input.v1',
        outputSchema: 'chat.ensureReady.output.v1',
        handler: null,
      },
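To make the concern concrete, here is a small sketch of how a downstream adapter might key off `authority`. It is an illustration under stated assumptions, not code from the branch: `CapabilityDescriptor` and `isSafeToRetry` are hypothetical, and only the two authority values named in this thread are modelled.

```typescript
// Only the two authority values mentioned in this review; the real registry may define more.
type CapabilityAuthority = 'runtime_replay' | 'commit_truth';

// Hypothetical, trimmed-down view of a registry entry.
interface CapabilityDescriptor {
  id: string;
  authority: CapabilityAuthority;
}

// Hypothetical adapter policy: replay-class capabilities are assumed to be
// deterministic, so they may be re-issued after a transport failure.
function isSafeToRetry(capability: CapabilityDescriptor): boolean {
  return capability.authority === 'runtime_replay';
}

// As registered in this PR: the adapter would retry a handler that makes a live LLM call.
const asRegistered: CapabilityDescriptor = { id: 'chat.ensureReady', authority: 'runtime_replay' };

// After the suggested reclassification: the adapter would no longer retry it blindly.
const reclassified: CapabilityDescriptor = { id: 'chat.ensureReady', authority: 'commit_truth' };

const retryAsRegistered = isSafeToRetry(asRegistered);
const retryReclassified = isSafeToRetry(reclassified);
```

Any consumer with a policy of this kind would treat the current registration as replay-safe, which is the wrong conclusion the comment describes.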

🤖 Generated with Claude Code


lunelson · 2026-05-14T09:43:45Z

augment review

augmentcode · 2026-05-14T09:45:57Z

🤖 Augment PR Summary

Summary: This PR adds agent-facing CLI capability substrate and probe harness for the FE-705 frontier, plus reconciles surrounding design/planning docs and ln-* skill workflows.

Code changes:

Added server-side agent capability primitives: JSONL session protocol (agent-jsonl.ts), capability dispatcher (capabilities.ts), and expanded capability registry with executable contracts for spec/chat/turn operations
Added scripts/agent-probes/ with process-backed probe runner, fixture-candidate validation, packaged smoke helpers, and model-backed LLM-as-user policy seams
Updated CLI (cli.ts) to support brunch agent subcommand for JSONL stdin/stdout sessions
Extended build/lint/test tooling to include scripts/ directory

Documentation changes:

Added new agent skill docs: d3k, ln-diagnose, ln-prototype
Standardized frontier-item vs slice vocabulary across all ln-* skills and planning docs
Restructured memory/PLAN.md into conflict-resistant shape with Sequencing/Frontier Definitions sections
Consolidated design docs: archived broad synthesis, added strategy/runtime-cluster docs, refreshed design index
Moved dev-workflow evolution docs under docs/design/ln-skills/

Technical Notes: The JSONL capability adapter drives the real Brunch interview flow through Brunch-owned contracts; the probe runner exercises this surface only through a JSONL client, maintaining the import boundary that probe code must not import DB/product handlers directly.
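For readers unfamiliar with JSONL transports, the following is a generic sketch of the framing a JSONL client has to handle over stdin/stdout. It is not taken from agent-jsonl.ts: `JsonlDecoder`, `encodeJsonl`, and the sample message fields are hypothetical, and the real protocol's message shape is not shown in this thread.

```typescript
// Hypothetical decoder: buffers stream chunks and yields one parsed value per complete line.
class JsonlDecoder {
  private buffer = '';

  // Accepts an arbitrary chunk (which may end mid-line) and returns every
  // message whose terminating newline has been received so far.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is an incomplete line (or ''), so keep it for the next chunk.
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as unknown);
  }
}

// Hypothetical encoder: one JSON value per line, newline-terminated.
function encodeJsonl(message: unknown): string {
  return JSON.stringify(message) + '\n';
}

// Usage sketch with made-up message fields, split at an arbitrary chunk boundary.
const decoder = new JsonlDecoder();
const wire = encodeJsonl({ id: 1, capability: 'chat.ensureReady' }) + encodeJsonl({ id: 2 });
const firstBatch = decoder.push(wire.slice(0, 20));
const secondBatch = decoder.push(wire.slice(20));
```

Buffering the trailing partial line is the part that matters for a process-backed client, because stdout chunks do not align with message boundaries.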


augmentcode

Review completed. 3 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-05-14T09:45:59Z

+    throw new CapabilityDispatchError(`Specification ${chat.specification_id} not found`, 'handler_failed');
+  }
+
+  const currentPhase = state.workflow.phases.grounding.status === 'closed' ? 'design' : 'grounding';


This currentPhase fallback only considers grounding vs design, but the workflow has four phases (grounding, design, requirements, criteria). If grounding is closed and the spec is in requirements or criteria, the idle-no-frontier phase will still report design. Consider reusing the existing getCurrentWorkflowPhase from src/shared/phase-close.ts or iterating workflowPhaseOrder to find the first unclosed phase.

Severity: medium
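The second option in this suggestion (iterating the phase order) could look roughly like the sketch below. The types and the local `workflowPhaseOrder` are stand-ins written for this example, not the definitions in src/shared/phase-close.ts, and the behaviour when every phase is closed is an assumption.

```typescript
type WorkflowPhase = 'grounding' | 'design' | 'requirements' | 'criteria';

// Stand-in for the repo's phase ordering; the four phase names come from the review comment.
const workflowPhaseOrder: readonly WorkflowPhase[] = ['grounding', 'design', 'requirements', 'criteria'];

// Hypothetical minimal shape: only the `status` field the original line reads.
interface WorkflowState {
  phases: Record<WorkflowPhase, { status: string }>;
}

// Returns the first phase that is not closed, walking the phases in order.
function firstUnclosedPhase(workflow: WorkflowState): WorkflowPhase {
  const open = workflowPhaseOrder.find((phase) => workflow.phases[phase].status !== 'closed');
  // Assumption: when every phase is closed, report the final phase.
  return open ?? 'criteria';
}

// Example: grounding and design are closed, so the current phase is requirements
// (the original two-way fallback would have reported design here).
// 'open' is a placeholder for any non-closed status.
const examplePhase = firstUnclosedPhase({
  phases: {
    grounding: { status: 'closed' },
    design: { status: 'closed' },
    requirements: { status: 'open' },
    criteria: { status: 'open' },
  },
});
```

Reusing the existing `getCurrentWorkflowPhase` would be preferable if its semantics match, since it avoids a second definition of "current phase".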


augmentcode · 2026-05-14T09:45:59Z

+  simulatedUserEvents,
+  turnBudget,
+}: ProcessBackedProbeOptions): Promise<ProbeRunResult> {
+  const workspaceCwd = mkdtempSync(join(tmpdir(), 'brunch-probe-workspace-'));


The temp workspace directory created by mkdtempSync is never cleaned up in runProcessBackedProbe. The finally block only calls spawned.endStdin() but never removes workspaceCwd. Tests manually rmSync the returned result.workspaceCwd, but production callers like packaged-smoke.ts do not, leaking a /tmp/brunch-probe-workspace-* directory on every probe run.

Severity: medium

Other Locations

scripts/agent-probes/packaged-smoke.ts:36


augmentcode · 2026-05-14T09:45:59Z

+  if (!turn) {
+    throw new CapabilityDispatchError(`Turn ${input.turnId} not found`, 'handler_failed');
+  }
+  if (turn.chat_id !== chat.id || turn.specification_id !== chat.specification_id) {


turn.chat_id may be null for pre-multi-chat turns (the column is nullable per the schema). When turn.chat_id is null, the condition turn.chat_id !== chat.id is always true (since null !== number), causing the guard to reject legitimate turns that belong to the spec but were created before chat association was backfilled. Consider also checking turn.specification_id === chat.specification_id as a sufficient ownership proof when chat_id is null.

Severity: medium
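A guard along the lines suggested here might be sketched as follows. `TurnRow`, `ChatRow`, and `turnBelongsToChat` are hypothetical names with only the columns mentioned in the comment; numeric ids are assumed from the "null !== number" remark.

```typescript
// Hypothetical row shapes with only the columns the guard reads.
interface TurnRow {
  id: number;
  chat_id: number | null; // nullable for turns created before chat association was backfilled
  specification_id: number;
}

interface ChatRow {
  id: number;
  specification_id: number;
}

// Ownership check that tolerates legacy turns without a chat_id.
function turnBelongsToChat(turn: TurnRow, chat: ChatRow): boolean {
  // A turn from a different specification never belongs to this chat.
  if (turn.specification_id !== chat.specification_id) return false;
  // Legacy turn: the specification match is accepted as sufficient ownership proof.
  if (turn.chat_id === null) return true;
  // Chat-associated turn: the chat must match exactly.
  return turn.chat_id === chat.id;
}

const chat: ChatRow = { id: 7, specification_id: 3 };
const legacyTurnAccepted = turnBelongsToChat({ id: 1, chat_id: null, specification_id: 3 }, chat);
const otherChatRejected = turnBelongsToChat({ id: 2, chat_id: 8, specification_id: 3 }, chat);
```

One trade-off worth noting: under this rule a legacy turn is accepted by every chat on the same specification, which may or may not be acceptable once a specification has more than one chat.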


lunelson added 30 commits May 13, 2026 13:17

initial sync plus first status-/semantic-reconciliation of PLAN vs PL…

ef5d2f4

…AN from fe-705

reconcile SPEC, and delete fe-705 reference copies

1c551c3

FE-705: Add agent JSONL lifecycle capabilities

0deb775

FE-705: Add agent chat read capabilities

27e28e5

FE-705: Add deterministic chat readiness

696c8b6

FE-705: Generate agent chat readiness

32a65e9

FE-705: Add agent turn response capability

642d57d

FE-705: Harden agent readiness smoke

61f2ca2

FE-705: Add scripted probe runner core

9855375

FE-705: Add process-backed probe runner

0e7242c

FE-705: Harden probe artifacts

ef3b2ea

FE-705: Guard probe runner imports

58076e1

add the d3k skill, as potential better solution than agent-tail

0688391

Move probe runner to scripts harness

5ff5d03

Preserve probe workspace state

d8477bf

Add probe response policy seam

6ddc1b1

Add model-backed probe user policy

befc34a

Add packaged LLM user smoke helper

6bedcca

Add fixture candidate checkpoint

2db789d

Harden probe JSONL transport failures

2e086f0

Capture process probe failure artifacts

c6d4bdf

Add probe runner turn budget

be28f27

Validate fixture candidate structure

eff1b4b

Split fixture readiness reporting

8309e54

first full grill of spec evolution strategies

49214bb

RFC version of spec evolution, integrated in to spec and plan

36e72f8

consolidation pass on design docs

125c84f

Map runtime design doc supersession

fb68170

Clarify side-chat shipped and horizon claims

9bb18a7

Translate patch ledger doc to changeset vocabulary

34b23c3

lunelson added 7 commits May 13, 2026 14:04

Clarify multi-chat substrate authority

8a946ba

Refresh design doc navigation index

146c347

Retire runtime docs refactor plan

d7d8f97

update deferred reconciliations

a803322

first pass adoption of new pocock-derived skills

2463617

activation density for new skills

521e68d

refactor of the ln-plan skill and template + all skills that referenc…

c9a5aa4

…e planning, for low-conflict

lunelson added 4 commits May 13, 2026 14:41

migrate PLAN.md to the new structure

ada5a47

separate documentation of ln- skills vs product workflows

d80f37a

document and synchronize policy WRT pre-release posture

de8221e

plan a restructuring of SPEC doc and template

1bcd5a7

lunelson changed the title ~~initial sync plus first status-/semantic-reconciliation of PLAN vs PLAN from fe-705~~ FE-705: Agent CLI capabilities and workflow documentation May 13, 2026

lunelson marked this pull request as ready for review May 13, 2026 12:59

lunelson self-assigned this May 13, 2026

lunelson requested a review from kostandinang May 13, 2026 12:59

lunelson mentioned this pull request May 13, 2026

FE-705: Planning extensions and persistence facade cleanup #133

Open

This was referenced May 13, 2026

FE-709: Planning sync — Conversational Workspace Runtime umbrella frontier decomposition #135

Closed

FE-709: Planning sync — Conversational Workspace Runtime umbrella frontier decomposition #136

Open

augmentcode Bot reviewed May 14, 2026

lunelson wants to merge 41 commits into main from ln/fe-705-cli-capabilities-with-docs-and-skills-update

2 participants
