68 commits
ef5d2f4
initial sync plus first status-/semantic-reconciliation of PLAN vs PL…
lunelson May 13, 2026
1c551c3
reconcile SPEC, and delete fe-705 reference copies
lunelson May 13, 2026
0deb775
FE-705: Add agent JSONL lifecycle capabilities
lunelson May 11, 2026
27e28e5
FE-705: Add agent chat read capabilities
lunelson May 11, 2026
696c8b6
FE-705: Add deterministic chat readiness
lunelson May 11, 2026
32a65e9
FE-705: Generate agent chat readiness
lunelson May 11, 2026
642d57d
FE-705: Add agent turn response capability
lunelson May 11, 2026
61f2ca2
FE-705: Harden agent readiness smoke
lunelson May 11, 2026
9855375
FE-705: Add scripted probe runner core
lunelson May 12, 2026
0e7242c
FE-705: Add process-backed probe runner
lunelson May 12, 2026
ef3b2ea
FE-705: Harden probe artifacts
lunelson May 12, 2026
58076e1
FE-705: Guard probe runner imports
lunelson May 12, 2026
0688391
add the d3k skill, as potential better solution than agent-tail
lunelson May 12, 2026
5ff5d03
Move probe runner to scripts harness
lunelson May 12, 2026
d8477bf
Preserve probe workspace state
lunelson May 12, 2026
6ddc1b1
Add probe response policy seam
lunelson May 12, 2026
befc34a
Add model-backed probe user policy
lunelson May 12, 2026
6bedcca
Add packaged LLM user smoke helper
lunelson May 12, 2026
2db789d
Add fixture candidate checkpoint
lunelson May 12, 2026
2e086f0
Harden probe JSONL transport failures
lunelson May 12, 2026
c6d4bdf
Capture process probe failure artifacts
lunelson May 12, 2026
be28f27
Add probe runner turn budget
lunelson May 12, 2026
eff1b4b
Validate fixture candidate structure
lunelson May 12, 2026
8309e54
Split fixture readiness reporting
lunelson May 12, 2026
49214bb
first full grill of spec evolution strategeis
lunelson May 12, 2026
36e72f8
RFC version of spec evolution, integrated in to spec and plan
lunelson May 12, 2026
125c84f
consolidation pass on design docs
lunelson May 13, 2026
fb68170
Map runtime design doc supersession
lunelson May 13, 2026
9bb18a7
Clarify side-chat shipped and horizon claims
lunelson May 13, 2026
34b23c3
Translate patch ledger doc to changeset vocabulary
lunelson May 13, 2026
8a946ba
Clarify multi-chat substrate authority
lunelson May 13, 2026
146c347
Refresh design doc navigation index
lunelson May 13, 2026
d7d8f97
Retire runtime docs refactor plan
lunelson May 13, 2026
a803322
update deferred reconciliations
lunelson May 13, 2026
2463617
first pass adoption of new pocock-derived skills
lunelson May 13, 2026
521e68d
activation density for new skills
lunelson May 13, 2026
c9a5aa4
refactor of the ln-plan skill and template + all skills that referenc…
lunelson May 13, 2026
ada5a47
migrate PLAN.md to the new structure
lunelson May 13, 2026
d80f37a
separate documentation of ln- skills vs product workflows
lunelson May 13, 2026
de8221e
document and synchronize policy WRT pre-release posture
lunelson May 13, 2026
1bcd5a7
plan a restructuring of SPEC doc and template
lunelson May 13, 2026
72fb708
add new ln-disambiguate skill
lunelson May 13, 2026
8fb9838
distill the disambiguation skill
lunelson May 13, 2026
bad99db
new spec structure, and updates to corresponding skills
lunelson May 13, 2026
ea082c7
add documentation about the skills system
lunelson May 13, 2026
b33161e
coordinate review and design skills, for a codebase improvement/deepe…
lunelson May 13, 2026
4936af1
Extract reconciliation store from db facade
lunelson May 13, 2026
570051c
Extract annotation store from db facade
lunelson May 13, 2026
32ec370
Extract edit impact store from db facade
lunelson May 13, 2026
3629be3
Extract intent graph mutation store
lunelson May 13, 2026
105d64e
Extract review materialization store
lunelson May 13, 2026
c6b044f
Extract entity projection store
lunelson May 13, 2026
bd0c9a5
Extract workflow store from db facade
lunelson May 13, 2026
2cf3629
Extract specification store from db facade
lunelson May 13, 2026
09ae5c2
Document substrate strangler coordination
lunelson May 13, 2026
a4827a2
Fix completed tool activity rendering
lunelson May 13, 2026
d8db6ee
FE-7XX: Planning sync — Conversational Workspace Runtime umbrella
kostandinang May 13, 2026
3dbb82e
FE-709: Replace per-phase InterviewView with ContinuousWorkspaceView
kostandinang May 13, 2026
801bc3e
FE-709: Extract useContinuousWorkspaceController
kostandinang May 13, 2026
4da744f
FE-709: Sidebar scroll-spy highlighting via WorkspaceFocusContext
kostandinang May 13, 2026
db1047a
FE-709: Handoff after Steps 1-3 build burst
kostandinang May 13, 2026
88c1c18
FE-709: Extract shared controller helpers and enrichBottomArtifact to…
kostandinang May 13, 2026
08176ca
FE-709: Retire route-first test assumptions
kostandinang May 13, 2026
37db299
FE-709: Resolve continuous workspace review findings
kostandinang May 13, 2026
3be6821
FE-709: Tighten stable mutation callback refs
kostandinang May 13, 2026
0bfc17a
FE-709: Narrow scroll-spy stability fix
kostandinang May 13, 2026
dcc6a71
FE-709: Restore submitted live tool running state
kostandinang May 14, 2026
d1072bb
FE-709: Forward-port completed-tool indicator fix into core
kostandinang May 14, 2026
145 changes: 145 additions & 0 deletions .agents/skills/d3k/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
---
name: "d3k"
description: "d3k assistant for debugging web apps"
---

# d3k Commands

d3k captures browser and server logs in a unified log file. Use these commands:

## Viewing Errors and Logs

```bash
d3k errors # Show recent errors (browser + server combined)
d3k errors --context # Show errors + user actions that preceded them
d3k errors -n 20 # Show last 20 errors

d3k logs # Show recent logs (browser + server combined)
d3k logs --type browser # Browser logs only
d3k logs --type server # Server logs only
```

## Other Commands

```bash
d3k fix # Deep analysis of application errors
d3k fix --focus build # Focus on build errors

d3k crawl # Discover app URLs
d3k crawl --depth all # Exhaustive crawl
```

## Browser Interaction

`d3k agent-browser` auto-connects to the active session's browser via CDP:

```bash
d3k agent-browser open http://localhost:3000/page
d3k agent-browser snapshot -i # Get element refs (@e1, @e2)
d3k agent-browser click @e2
d3k agent-browser fill @e3 "text"
d3k agent-browser screenshot /tmp/shot.png
```

To target a different browser, run `d3k agent-browser connect <port>` first.

## Codex Fresh Browser/Profile Startup

Use this workflow when the user asks Codex to start d3k with a fresh browser/profile.

1. Close any stale `agent-browser` daemon before launching with `--profile`. Otherwise `agent-browser` will reuse the existing daemon and print `--profile ignored`.
```bash
d3k agent-browser close --all
```

2. Start the app through d3k in `servers-only` mode and keep that command running. In Codex, this is more reliable than asking d3k to launch the browser itself when a fresh profile is required.
```bash
d3k --no-agent --no-skills --servers-only --command "npm run dev -- -H 127.0.0.1 -p 3000" --port 3000 --startup-timeout 90 --no-tui
```

Adjust the package-manager command and port for the project. Prefer `--command` over `--script` when passing framework flags. For npm scripts, put flags after `--`; otherwise tools like Next.js can interpret the port as a project directory.

3. Verify the server before opening more browser windows:
```bash
curl -I http://127.0.0.1:3000
```

4. Open the fresh profile as a separate browser step:
```bash
d3k agent-browser --profile /tmp/d3k-fresh-profile --headed open http://127.0.0.1:3000
```

5. Sanity-check the opened page:
```bash
d3k agent-browser get title
d3k agent-browser snapshot -i
d3k errors
```

Practical rules:

- Prefer `127.0.0.1` for this workflow. If `localhost` hangs or flips between IPv4/IPv6 behavior, do not keep retrying browser launches.
- If `curl -I` hangs, the server is wedged even if the port appears occupied; restart the d3k server process before opening a browser.
- In `servers-only` mode there is no d3k-monitored CDP browser. Use regular `d3k agent-browser` commands, not `d3k cdp-port`.
- In sandboxed agent environments, rerun local-network checks and `agent-browser open` commands outside the sandbox when sandbox networking blocks access to `127.0.0.1`.
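A minimal readiness-check sketch for step 3, assuming a POSIX-style shell (`sh`/`bash`/`dash`) with `curl` on `PATH`. The `wait_for_server` name, the default attempt count, and the 5-second per-probe cap are illustrative choices, not d3k options. The point is that `--max-time` bounds each `curl -I`, so a wedged server fails the check instead of hanging it.

```shell
# Sketch only: wait_for_server is a hypothetical helper, not part of d3k.
# Usage: wait_for_server <url> [attempts] [delay-seconds]
wait_for_server() {
  local url="$1" attempts="${2:-30}" delay="${3:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -I sends a HEAD request; --max-time caps each probe so a wedged
    # server cannot stall the loop. Any HTTP response counts as "up".
    if curl -sS -I --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1  # server never answered within the attempt budget
}
```

For example, `wait_for_server http://127.0.0.1:3000 60 && d3k agent-browser --profile /tmp/d3k-fresh-profile --headed open http://127.0.0.1:3000` opens the fresh profile only after the server has answered.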

## Browser Tool Choice

Use `agent-browser` for browser work.

Practical rule: when you need to drive the same monitored browser session, use `agent-browser`. For example:

```bash
d3k agent-browser snapshot -i
d3k agent-browser click @e2
```

To make d3k prefer `agent-browser` locally when it launches helper browser commands, use:

```bash
d3k --browser-tool agent-browser
```

## Fix Workflow

1. `d3k errors --context` - See errors and what triggered them
2. Fix the code
3. `d3k agent-browser open <url>` then `d3k agent-browser click @e1` to replay
4. `d3k errors` - Verify fix worked

## Creating PRs with Before/After Screenshots

When creating a PR for visual changes, **always capture before/after screenshots** to show the impact:

1. **Before making changes**, screenshot the production site:
```bash
d3k agent-browser open https://production-url.com/affected-page
d3k agent-browser screenshot /tmp/before.png
```

2. **After making changes**, screenshot localhost:
```bash
d3k agent-browser open http://localhost:3000/affected-page
d3k agent-browser screenshot /tmp/after.png
```

3. **Or use the tooling API** to capture multiple routes at once:
```
capture_before_after_screenshots(
productionUrl: "https://myapp.vercel.app",
routes: ["/", "/about", "/contact"]
)
```

4. **Include in PR description** using markdown:
```markdown
### Visual Comparison
| Route | Before | After |
|-------|--------|-------|
| `/` | ![Before](before.png) | ![After](after.png) |
```

Upload screenshots by dragging them into the GitHub PR description.
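If the tooling API is unavailable, steps 1 and 2 can be scripted with the same `d3k agent-browser open` / `screenshot` commands shown above. A minimal sketch, assuming a POSIX-style shell; `route_slug`, `capture_routes`, and the `/tmp/<label>-<slug>.png` naming are hypothetical helpers, not d3k features.

```shell
# Sketch only: helper names and the output naming scheme are assumptions.

# Turn a route like "/" or "/docs/intro" into a file-safe slug.
route_slug() {
  local slug
  slug=$(printf '%s' "${1#/}" | tr '/' '-')   # strip leading "/", then "/" -> "-"
  printf '%s\n' "${slug:-root}"               # the bare "/" route becomes "root"
}

# capture_routes <label> <base-url> <route>...
# Opens each route in the d3k-connected browser, saves a screenshot,
# and prints each screenshot path after it is written.
capture_routes() {
  local label="$1" base="${2%/}" route out
  shift 2
  for route in "$@"; do
    out="/tmp/${label}-$(route_slug "$route").png"
    d3k agent-browser open "${base}${route}" || return 1
    d3k agent-browser screenshot "$out" || return 1
    printf '%s\n' "$out"
  done
}
```

Running `capture_routes before https://production-url.com / /about /contact` before the change and `capture_routes after http://localhost:3000 / /about /contact` afterwards yields matching `/tmp/before-*.png` and `/tmp/after-*.png` pairs for the comparison table.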
27 changes: 18 additions & 9 deletions .agents/skills/ln-build/SKILL.md
@@ -14,7 +14,7 @@ A full or light scope card from `ln-scope`, the next ready card in `memory/CARDS

Extract: target behavior / objective, acceptance criteria, and verification approach.

Treat the scope card as the next implementation step inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item; the scope card is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive scope cards may land on the same branch.
Treat the scope card as the next implementation slice inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item and Linear/branch unit; the scope-card slice is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive slices may land on the same branch.

If `memory/CARDS.md` exists, treat it as a derivative execution queue, not canonical planning state. Start with the next card marked `next` or the first unfinished card in that file. If that card is already satisfied on the current branch, do **not** manufacture a no-op build commit; verify the acceptance criteria, mark the card `done` or `dropped` as appropriate, reconcile the queue, and either continue to the next honest build target or route back to `ln-scope` if no build remains.

@@ -35,7 +35,7 @@ Do not invent new planning docs, scratch histories, or alternate memory location

## Serial execution mode

When several prepared cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.
When several prepared slice cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.

Loop shape:

@@ -62,18 +62,26 @@ Stop the serial loop immediately when any of these becomes true:

Translate acceptance criteria into failing tests when the change benefits from them. For bugfixes or subtle seam changes, prefer one high-leverage regression test. For trivial maintenance or doc-only work, tests may be unnecessary.

Test behavior through public interfaces, not implementation details. A good test describes what capability exists and would survive internal refactoring. Avoid tests that mock internal collaborators, assert private call order, or inspect storage directly when the public interface can prove the behavior.

Do not horizontal-slice TDD. Never write a batch of imagined tests first and then a batch of implementation. Use tracer bullets: one failing behavioral test → minimum code to pass → next failing behavioral test. Each new test should respond to what the previous cycle taught you.

Run the relevant checks. Confirm failures are meaningful. If the card is already green before any code change, treat that as evidence the queue item is already satisfied or stale — not as permission to create a ceremonial red/green cycle.

## Green

Write the minimum code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.
Write the minimum coherent code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.

No speculative abstractions. Only extract when two concrete cases force it.
Honor the repo's pre-release posture: if the current schema, fixture shape, dummy data, or terminology is wrong for the model, change it and regenerate dependent artifacts rather than preserving accidental compatibility. Delete obsolete paths in the same slice when they are inside the active seam.

No speculative abstractions. Only extract when two concrete cases force it. Do not anticipate later tests or build shape-only scaffolding; let the current behavioral test pull the interface into existence.

## Refactor

With tests green, improve names, boundaries, and obvious local structure. Do not widen scope.

Refactor only while green. Keep the tests pinned to the public behavior so they protect the slice while allowing internals to move. If refactoring reveals that the test is coupled to implementation, fix the test seam before trusting it.

## Verify and commit

Run the project's verification harness. All checks must pass. If the card proved already satisfied and no code or canonical-state change was needed, do not create an empty commit.
@@ -93,10 +101,10 @@ After the build lands and verification passes, ask:

### If all answers are no

- Mark the work done in `memory/PLAN.md` **if it was tracked there**
- Mark the containing frontier done in `memory/PLAN.md` **if the build completed the frontier item**, usually by updating `Sequencing` / frontier status rather than moving definition blocks
- Update `Recently Completed` if the plan uses it
- Do **not** add new SPEC/PLAN bookkeeping just because work happened
- If the work was non-trivial, required manual verification, or leaves residual risk, record `Done / Verified / Watch` in `memory/PLAN.md` `Recently Completed` when that watch matters beyond the current session
- Do **not** add new SPEC/PLAN bookkeeping just because a slice happened
- If the slice was non-trivial, required manual verification, or leaves residual risk that matters beyond the current session, record it in the containing frontier definition or a terse `Recently Completed` entry only when it affects frontier-level re-entry

### If any answer is yes

@@ -111,8 +119,9 @@ Update only the touched traceability items.
#### Update rules

1. **PLAN**
- Mark the item done if it was tracked
- If the change closes or unblocks a frontier item, reflect that in `Active`, `Next`, or `Recently Completed`
- Mark the frontier item done if this slice completed it
- If the change closes, blocks, or unblocks a frontier item, reflect that in `Sequencing`, the affected `Frontier Definitions` entry, or `Recently Completed`
- Do not mirror detailed slice/card history into `memory/PLAN.md`; keep active execution queues in `memory/CARDS.md`

2. **Assumptions**
- evidence answered it → update to `validated` or `invalidated`
3 changes: 2 additions & 1 deletion .agents/skills/ln-consult/SKILL.md
@@ -53,7 +53,7 @@ Presume **structural** on a fresh thread when the work touches workflow closure,

Default rule:

`ln-grillln-spec → ln-plan → [ln-design] → [ln-oracles] → ln-scope → [ln-spike] → ln-build → ln-review → [ln-refactor] → [ln-sync]`
`ln-grill` or `ln-disambiguate` → `ln-spec``ln-plan`optional `ln-design` / `ln-oracles``ln-scope`optional `ln-spike``ln-build``ln-review`optional `ln-refactor` / `ln-sync`

Bounded exception:

@@ -80,6 +80,7 @@ Only recommend the bounded serial exception when those same conditions hold and
| Situation | Work type | Suggest |
| --- | --- | --- |
| Idea is vague, needs fleshing out | structural | `ln-grill` |
| Plausible interpretations diverge; examples would clarify faster than open-ended questioning | structural | `ln-disambiguate` |
| Understanding exists, needs a written spec | structural | `ln-spec` |
| Spec exists, needs work sequencing | structural | `ln-plan` |
| Verification strategy is the main uncertainty | structural | `ln-oracles` |
14 changes: 10 additions & 4 deletions .agents/skills/ln-design/SKILL.md
@@ -6,7 +6,9 @@ argument-hint: "[module or API boundary to explore]"

# Ln Design

Apply Ousterhout's "Design It Twice": generate **3+ radically different module shapes**, compare on depth, and synthesize. The goal is deep modules — small API surfaces hiding significant complexity. Do not implement; this is purely about the shape of the boundary.
Apply Ousterhout's "Design It Twice": generate **3+ radically different module shapes**, compare on depth, and synthesize. The goal is deep modules — small interfaces hiding significant complexity. Do not implement; this is purely about the shape of the seam.

Use `ln-design` as the deepening pathway from `ln-review`: when review surfaces a shallow module or weak seam, explore alternative deepened module shapes here before routing to `ln-scope` or `ln-refactor`.

## Input

@@ -16,7 +18,9 @@ The module or API boundary: $ARGUMENTS

### 1. Gather requirements

Understand the problem, the callers, the key operations, constraints, and — crucially — what complexity should be hidden inside vs exposed. Skip steps you already know the answer to.
Understand the problem, the callers, the key operations, constraints, and — crucially — what complexity should be hidden inside vs exposed. If this design follows an `ln-review` deepening candidate, start from that candidate's files, problem, possible direction, and benefits. Skip steps you already know the answer to.

Read `memory/SPEC.md` first when it exists. Use its lexicon for domain terms and respect its live assumptions, decisions, and invariants. Read `memory/PLAN.md` when the seam touches active or near-horizon work.

### 2. Generate designs (parallel sub-agents)

@@ -27,13 +31,15 @@ Spawn 3+ sub-agents simultaneously. Each must produce a **radically different**
- "Optimize for the most common case"
- "Take inspiration from [specific paradigm or library]"

Each agent returns: **API signature** (types, methods, params), **usage example**, **what it hides**, and **trade-offs**.
Each agent returns: **interface** (types, methods, params, invariants, ordering constraints, error modes, required configuration, and performance characteristics), **usage example**, **what it hides**, **seam / adapter strategy** where relevant, and **trade-offs**.

### 3. Present and compare

Show each design sequentially, then compare in prose on:

- **Depth** (Ousterhout's depth test): small surface hiding significant complexity (good) vs large surface with thin implementation (bad)
- **Depth** (Ousterhout's depth test): small interface hiding significant complexity (good) vs large interface with thin implementation (bad)
- **Locality**: whether change, bugs, knowledge, and verification concentrate behind the seam
- **Leverage**: what callers get per fact they must learn about the interface
- **Ease of correct use** vs ease of misuse
- **General-purpose vs specialized**: flexibility vs focus
- **Implementation efficiency**: does the shape allow efficient internals?