hashintel · kostandinang · May 13, 2026 · May 11, 2026
diff --git a/.agents/skills/d3k/SKILL.md b/.agents/skills/d3k/SKILL.md
@@ -0,0 +1,145 @@
+---
+name: "d3k"
+description: "d3k assistant for debugging web apps"
+---
+
+# d3k Commands
+
+d3k captures browser and server logs in a unified log file. Use these commands:
+
+## Viewing Errors and Logs
+
+```bash
+d3k errors              # Show recent errors (browser + server combined)
+d3k errors --context    # Show errors + user actions that preceded them
+d3k errors -n 20        # Show last 20 errors
+
+d3k logs                # Show recent logs (browser + server combined)
+d3k logs --type browser # Browser logs only
+d3k logs --type server  # Server logs only
+```
+
+## Other Commands
+
+```bash
+d3k fix                 # Deep analysis of application errors
+d3k fix --focus build   # Focus on build errors
+
+d3k crawl               # Discover app URLs
+d3k crawl --depth all   # Exhaustive crawl
+```
+
+## Browser Interaction
+
+`d3k agent-browser` auto-connects to the active session's browser via CDP:
+
+```bash
+d3k agent-browser open http://localhost:3000/page
+d3k agent-browser snapshot -i    # Get element refs (@e1, @e2)
+d3k agent-browser click @e2
+d3k agent-browser fill @e3 "text"
+d3k agent-browser screenshot /tmp/shot.png
+```
+
+To target a different browser, run `d3k agent-browser connect <port>` first.
+
+## Codex Fresh Browser/Profile Startup
+
+Use this workflow when the user asks Codex to start d3k with a fresh browser/profile.
+
+1. Close any stale `agent-browser` daemon before launching with `--profile`. Otherwise `agent-browser` will reuse the existing daemon and print `--profile ignored`.
+   ```bash
+   d3k agent-browser close --all
+   ```
+
+2. Start the app through d3k in `servers-only` mode and keep that command running. In Codex, this is more reliable than asking d3k to launch the browser itself when a fresh profile is required.
+   ```bash
+   d3k --no-agent --no-skills --servers-only --command "npm run dev -- -H 127.0.0.1 -p 3000" --port 3000 --startup-timeout 90 --no-tui
+   ```
+
+   Adjust the package-manager command and port for the project. Prefer `--command` over `--script` when passing framework flags. For npm scripts, put flags after `--`; otherwise tools like Next.js can interpret the port as a project directory.
+
+3. Verify the server before opening more browser windows:
+   ```bash
+   curl -I http://127.0.0.1:3000
+   ```
+
+4. Open the fresh profile as a separate browser step:
+   ```bash
+   d3k agent-browser --profile /tmp/d3k-fresh-profile --headed open http://127.0.0.1:3000
+   ```
+
+5. Sanity-check the opened page:
+   ```bash
+   d3k agent-browser get title
+   d3k agent-browser snapshot -i
+   d3k errors
+   ```
+
+Practical rules:
+
+- Prefer `127.0.0.1` for this workflow. If `localhost` hangs or flips between IPv4/IPv6 behavior, do not keep retrying browser launches.
+- If `curl -I` hangs, the server is wedged even if the port appears occupied; restart the d3k server process before opening a browser.
+- In `servers-only` mode there is no d3k-monitored CDP browser. Use regular `d3k agent-browser` commands, not `d3k cdp-port`.
+- In sandboxed agent environments, when sandbox networking blocks access to `127.0.0.1`, rerun local-network checks and `agent-browser open` commands outside the sandbox.
+
+## Browser Tool Choice
+
+Use `agent-browser` for browser work.
+
+Practical rule:
+
+- To drive the same monitored browser session, use `agent-browser`.
+- Examples:
+
+```bash
+d3k agent-browser snapshot -i
+d3k agent-browser click @e2
+```
+
+To make d3k prefer `agent-browser` locally when it launches helper browser commands, use:
+
+```bash
+d3k --browser-tool agent-browser
+```
+
+## Fix Workflow
+
+1. `d3k errors --context` - See errors and what triggered them
+2. Fix the code
+3. `d3k agent-browser open <url>` then `d3k agent-browser click @e1` to replay
+4. `d3k errors` - Verify fix worked
+
+## Creating PRs with Before/After Screenshots
+
+When creating a PR for visual changes, **always capture before/after screenshots** to show the impact:
+
+1. **Before making changes**, screenshot the production site:
+   ```bash
+   d3k agent-browser open https://production-url.com/affected-page
+   d3k agent-browser screenshot /tmp/before.png
+   ```
+
+2. **After making changes**, screenshot localhost:
+   ```bash
+   d3k agent-browser open http://localhost:3000/affected-page
+   d3k agent-browser screenshot /tmp/after.png
+   ```
+
+3. **Or use the tooling API** to capture multiple routes at once:
+   ```
+   capture_before_after_screenshots(
+     productionUrl: "https://myapp.vercel.app",
+     routes: ["/", "/about", "/contact"]
+   )
+   ```
+
+4. **Include in PR description** using markdown:
+   ```markdown
+   ### Visual Comparison
+   | Route | Before | After |
+   |-------|--------|-------|
+   | `/` | ![Before](before.png) | ![After](after.png) |
+   ```
+
+   Upload screenshots by dragging them into the GitHub PR description.
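
Reviewer sketch for step 3 of the fresh-profile workflow and the `curl -I` rule in the d3k skill above: a bare `curl -I` can itself hang on a wedged server, so a bounded check may be easier to script. The function name `wait_for_server`, the default of 10 attempts, and the 2-second timeout are illustrative assumptions and not part of d3k; only standard `curl` options are used.

```bash
# Hypothetical helper, not part of d3k: poll a URL with a bounded HEAD request.
# Returns 0 as soon as the server sends any HTTP response,
# 1 if every attempt fails or times out.
wait_for_server() {
  wfs_url="$1"
  wfs_attempts="${2:-10}"   # assumed default: 10 attempts
  wfs_i=0
  while [ "$wfs_i" -lt "$wfs_attempts" ]; do
    # --max-time keeps a wedged server from hanging the check itself
    if curl --silent --head --max-time 2 --output /dev/null "$wfs_url"; then
      return 0
    fi
    wfs_i=$((wfs_i + 1))
    sleep 1
  done
  return 1
}
```

A call such as `wait_for_server http://127.0.0.1:3000 30 || echo "server not responding; restart the d3k server process"` could then gate step 4, so no browser is opened against a wedged server.
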
diff --git a/.agents/skills/ln-build/SKILL.md b/.agents/skills/ln-build/SKILL.md
@@ -14,7 +14,7 @@ A full or light scope card from `ln-scope`, the next ready card in `memory/CARDS
 
 Extract: target behavior / objective, acceptance criteria, and verification approach.
 
-Treat the scope card as the next implementation step inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item; the scope card is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive scope cards may land on the same branch.
+Treat the scope card as the next implementation slice inside its containing `memory/PLAN.md` frontier item. The frontier item is the plan-level work item and Linear/branch unit; the scope-card slice is just the current execution step inside it. Unless `ln-plan` has already split the frontier into separate items, do **not** infer a new Linear issue or Graphite branch from scope-card granularity; multiple consecutive slices may land on the same branch.
 
 If `memory/CARDS.md` exists, treat it as a derivative execution queue, not canonical planning state. Start with the next card marked `next` or the first unfinished card in that file. If that card is already satisfied on the current branch, do **not** manufacture a no-op build commit; verify the acceptance criteria, mark the card `done` or `dropped` as appropriate, reconcile the queue, and either continue to the next honest build target or route back to `ln-scope` if no build remains.
 
@@ -35,7 +35,7 @@ Do not invent new planning docs, scratch histories, or alternate memory location
 
 ## Serial execution mode
 
-When several prepared cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.
+When several prepared slice cards already exist for one settled frontier item, `ln-build` may execute them in sequence instead of routing back through the user after every commit.
 
 Loop shape:
 
@@ -62,18 +62,26 @@ Stop the serial loop immediately when any of these becomes true:
 
 Translate acceptance criteria into failing tests when the change benefits from them. For bugfixes or subtle seam changes, prefer one high-leverage regression test. For trivial maintenance or doc-only work, tests may be unnecessary.
 
+Test behavior through public interfaces, not implementation details. A good test describes what capability exists and would survive internal refactoring. Avoid tests that mock internal collaborators, assert private call order, or inspect storage directly when the public interface can prove the behavior.
+
+Do not slice TDD horizontally. Never write a batch of imagined tests first and then a batch of implementation. Use tracer bullets: one failing behavioral test → minimum code to pass → next failing behavioral test. Each new test should respond to what the previous cycle taught you.
+
 Run the relevant checks. Confirm failures are meaningful. If the card is already green before any code change, treat that as evidence the queue item is already satisfied or stale — not as permission to create a ceremonial red/green cycle.
 
 ## Green
 
-Write the minimum code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.
+Write the minimum coherent code to pass. Build inside-out: functional core first, thin I/O shell second, then end-to-end wiring.
 
-No speculative abstractions. Only extract when two concrete cases force it.
+Honor the repo's pre-release posture: if the current schema, fixture shape, dummy data, or terminology is wrong for the model, change it and regenerate dependent artifacts rather than preserving accidental compatibility. Delete obsolete paths in the same slice when they are inside the active seam.
+
+No speculative abstractions. Only extract when two concrete cases force it. Do not anticipate later tests or build shape-only scaffolding; let the current behavioral test pull the interface into existence.
 
 ## Refactor
 
 With tests green, improve names, boundaries, and obvious local structure. Do not widen scope.
 
+Refactor only while green. Keep the tests pinned to the public behavior so they protect the slice while allowing internals to move. If refactoring reveals that the test is coupled to implementation, fix the test seam before trusting it.
+
 ## Verify and commit
 
 Run the project's verification harness. All checks must pass. If the card proved already satisfied and no code or canonical-state change was needed, do not create an empty commit.
@@ -93,10 +101,10 @@ After the build lands and verification passes, ask:
 
 ### If all answers are no
 
-- Mark the work done in `memory/PLAN.md` **if it was tracked there**
+- Mark the containing frontier done in `memory/PLAN.md` **if the build completed the frontier item**, usually by updating `Sequencing` / frontier status rather than moving definition blocks
 - Update `Recently Completed` if the plan uses it
-- Do **not** add new SPEC/PLAN bookkeeping just because work happened
-- If the work was non-trivial, required manual verification, or leaves residual risk, record `Done / Verified / Watch` in `memory/PLAN.md` `Recently Completed` when that watch matters beyond the current session
+- Do **not** add new SPEC/PLAN bookkeeping just because a slice happened
+- If the slice was non-trivial, required manual verification, or leaves residual risk that matters beyond the current session, record it in the containing frontier definition or a terse `Recently Completed` entry only when it affects frontier-level re-entry
 
 ### If any answer is yes
 
@@ -111,8 +119,9 @@ Update only the touched traceability items.
 #### Update rules
 
 1. **PLAN**
-   - Mark the item done if it was tracked
-   - If the change closes or unblocks a frontier item, reflect that in `Active`, `Next`, or `Recently Completed`
+   - Mark the frontier item done if this slice completed it
+   - If the change closes, blocks, or unblocks a frontier item, reflect that in `Sequencing`, the affected `Frontier Definitions` entry, or `Recently Completed`
+   - Do not mirror detailed slice/card history into `memory/PLAN.md`; keep active execution queues in `memory/CARDS.md`
 
 2. **Assumptions**
    - evidence answered it → update to `validated` or `invalidated`

diff --git a/.agents/skills/ln-consult/SKILL.md b/.agents/skills/ln-consult/SKILL.md
@@ -53,7 +53,7 @@ Presume **structural** on a fresh thread when the work touches workflow closure,
 
 Default rule:
 
-`ln-grill → ln-spec → ln-plan → [ln-design] → [ln-oracles] → ln-scope → [ln-spike] → ln-build → ln-review → [ln-refactor] → [ln-sync]`
+`ln-grill` or `ln-disambiguate` → `ln-spec` → `ln-plan` → optional `ln-design` / `ln-oracles` → `ln-scope` → optional `ln-spike` → `ln-build` → `ln-review` → optional `ln-refactor` / `ln-sync`
 
 Bounded exception:
 
@@ -80,6 +80,7 @@ Only recommend the bounded serial exception when those same conditions hold and
 | Situation | Work type | Suggest |
 | --- | --- | --- |
 | Idea is vague, needs fleshing out | structural | `ln-grill` |
+| Plausible interpretations diverge; examples would clarify faster than open-ended questioning | structural | `ln-disambiguate` |
 | Understanding exists, needs a written spec | structural | `ln-spec` |
 | Spec exists, needs work sequencing | structural | `ln-plan` |
 | Verification strategy is the main uncertainty | structural | `ln-oracles` |

diff --git a/.agents/skills/ln-design/SKILL.md b/.agents/skills/ln-design/SKILL.md
@@ -6,7 +6,9 @@ argument-hint: "[module or API boundary to explore]"
 
 # Ln Design
 
-Apply Ousterhout's "Design It Twice": generate **3+ radically different module shapes**, compare on depth, and synthesize. The goal is deep modules — small API surfaces hiding significant complexity. Do not implement; this is purely about the shape of the boundary.
+Apply Ousterhout's "Design It Twice": generate **3+ radically different module shapes**, compare on depth, and synthesize. The goal is deep modules — small interfaces hiding significant complexity. Do not implement; this is purely about the shape of the seam.
+
+Use `ln-design` as the deepening pathway from `ln-review`: when review surfaces a shallow module or weak seam, explore alternative deepened module shapes here before routing to `ln-scope` or `ln-refactor`.
 
 ## Input
 
@@ -16,7 +18,9 @@ The module or API boundary: $ARGUMENTS
 
 ### 1. Gather requirements
 
-Understand the problem, the callers, the key operations, constraints, and — crucially — what complexity should be hidden inside vs exposed. Skip steps you already know the answer to.
+Understand the problem, the callers, the key operations, constraints, and — crucially — what complexity should be hidden inside vs exposed. If this design follows an `ln-review` deepening candidate, start from that candidate's files, problem, possible direction, and benefits. Skip steps you already know the answer to.
+
+Read `memory/SPEC.md` first when it exists. Use its lexicon for domain terms and respect its live assumptions, decisions, and invariants. Read `memory/PLAN.md` when the seam touches active or near-horizon work.
 
 ### 2. Generate designs (parallel sub-agents)
 
@@ -27,13 +31,15 @@ Spawn 3+ sub-agents simultaneously. Each must produce a **radically different**
 - "Optimize for the most common case"
 - "Take inspiration from [specific paradigm or library]"
 
-Each agent returns: **API signature** (types, methods, params), **usage example**, **what it hides**, and **trade-offs**.
+Each agent returns: **interface** (types, methods, params, invariants, ordering constraints, error modes, required configuration, and performance characteristics), **usage example**, **what it hides**, **seam / adapter strategy** where relevant, and **trade-offs**.
 
 ### 3. Present and compare
 
 Show each design sequentially, then compare in prose on:
 
-- **Depth** (Ousterhout's depth test): small surface hiding significant complexity (good) vs large surface with thin implementation (bad)
+- **Depth** (Ousterhout's depth test): small interface hiding significant complexity (good) vs large interface with thin implementation (bad)
+- **Locality**: whether change, bugs, knowledge, and verification concentrate behind the seam
+- **Leverage**: what callers get per fact they must learn about the interface
 - **Ease of correct use** vs ease of misuse
 - **General-purpose vs specialized**: flexibility vs focus
 - **Implementation efficiency**: does the shape allow efficient internals?