feat(belief-nodes): embed-on-write for Pen + engine-nap (Block D PR 1)#307
Conversation
Wires the first two producers from belief-nodes-embedding-Spec.md §4
through the embed-on-write contract committed in §13.1's
universal-principle amendment. Pen (agent-authored notes) and
engine-nap (context-pressure summaries) now embed via the agent's
co-resident encoder before insert, stamping the four embedding fields
(`embedding`, `embedding_model`, `embedding_dim`, `embedding_task`)
on the belief_nodes document.
Shared helper `weaver-core::embed_write::try_embed_and_stamp_belief_node`
encodes the failure-mode escape from Spec §13.1: on embedder
unavailability or call failure, the helper logs a structured warning
and returns without mutating the doc. The producer proceeds with
insert; the doc lands with embedding: null; sleep stage A's
retroactive pass (Block D PR 3) populates it on the next sleep cycle.
Producers never block on embedder availability.
The helper takes a `producer_label` for telemetry so operators
inspecting logs can distinguish "embedder down" from "producer not yet
embed-on-write" by inspection — the same distinction §13.1 names as
the requirement for null-embedding writes to be operator-visible.
Spec sections covered:
- §4.1 Pen embed-on-write (sync, before insert)
- §4.2 Engine-nap embed-on-write (sync, before insert)
- §4.1.2 Synchronous embed pattern committed in v1
- §13.1 Universal principle + failure-mode escape
- §13.8 v1 direct-call pattern (substrate-gateway refactor tracked
separately; same wire format, only call sites move)
Deferred to subsequent PRs in the Block D bundle:
- §4.3 Preseed batched embed-on-materialize (PR 2)
- §7 Sleep stage A retroactive embed (PR 3)
- §6.4 weaver agent admin backfill-embeddings CLI (PR 3)
Drive-by: one clippy fix (manual_contains) in runtime.rs:1255 that
the same file already triggered.
Tests:
- 4 new unit tests in embed_write covering success, embedder-absent,
embedder-fails, and dim-matches-actual-vector-length.
- All 41 existing notes tests pass (no Pen regression).
- All 24 existing engine/runtime tests pass (engine-nap doc shape
unchanged).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughA new ChangesEmbed-on-Write Infrastructure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-core/src/embed_write.rs`:
- Around line 77-83: The code unconditionally writes JINA_V4_MODEL_NAME into
BELIEF_EMBEDDING_MODEL_FIELD even though embedder.embed_one may use other
models; modify the embed pipeline to record the actual model returned by the
embedder (e.g., change embed_one to return an EmbedResult/tuple containing the
vector and model name or add a model() accessor on the Embedder trait) and set
doc[BELIEF_EMBEDDING_MODEL_FIELD] = json!(model) from that result; update
FixtureEmbedder and any tests to supply the correct model string so stamped
metadata matches the real embedder, or if the Jina-V4-only contract is
intentional, add a clear comment/doc on that constraint instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: de3c5204-7d0b-4877-9f0e-0e9e3ed3f437
📒 Files selected for processing (4)
crates/weaver-core/src/embed_write.rscrates/weaver-core/src/engine/runtime.rscrates/weaver-core/src/lib.rscrates/weaver-core/src/tools/notes.rs
CR flagged that `embedding_model` was stamped with a hardcoded `JINA_V4_MODEL_NAME` constant rather than reflecting the actual backend that produced the vector. Per Spec §2.1 the field is provenance metadata, and per VISION.md §Framework vs. Instantiation the framework's embedder interface is substrate-agnostic — Jina V4 fulfills the contract today, but a future agent profile may run a different encoder. Hardcoding the model name produces false negatives in the Spec's migration-detection filter (`FILTER doc.embedding_model != "<id>"`) for any non-Jina-V4 agent. Fix: - Switch `try_embed_and_stamp_belief_node` from `embed_one()` to `embed()` so we retain `EmbedResult.model` from the backend. - Stamp `result.model` (the actual backend identifier) on `embedding_model` instead of a constant. - Remove `JINA_V4_MODEL_NAME` (only consumer was this helper + its test; no external dependencies). - Handle the protocol-violation case explicitly: a single-text `embed()` returning ≠ 1 embedding triggers a structured warn and degrades to "no embedding stamped" (substrate-write path shouldn't fail loud per §13.1 failure-mode escape; sleep stage A backfills). Tests: - `stamps_four_fields_on_success` updated to assert `"fixture"` (the FixtureEmbedder's reported model name), with a comment pinning the provenance-not-constant contract. - New `stamped_model_name_reflects_backend_identifier` test with an `AlternateModelEmbedder` reporting a non-Jina identifier; asserts the stamped value reflects the backend, not a default. - New `does_not_mutate_doc_when_backend_returns_zero_embeddings` and `does_not_mutate_doc_when_backend_returns_multiple_embeddings` pin the protocol-violation degradation path. 7/7 embed_write tests pass; clippy clean; fmt clean. Related: this is one instance of a broader framework-vs- instantiation discipline (see new memory `feedback_no_hardcoded_instantiation_in_framework`). Tracked audit for additional hardcoded-instantiation drift lives in task #154 — careful, not sweeping. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
First of three PRs implementing the embed-on-write contract from
docs/specs/belief-nodes-embedding-Spec.md§4. Pen (agent-authored notes) and engine-nap (context-pressure summaries) now embed via the agent's co-resident encoder before insert. The two writers share a new helper (weaver-core::embed_write::try_embed_and_stamp_belief_node) that encodes the failure-mode escape from Spec §13.1's universal-principle amendment: embedder unavailable or call fails → log structured warning, leave embedding field absent, sleep stage A retroactively populates on next cycle. Producers never block on embedder availability.Spec coverage: §4.1 (Pen), §4.2 (engine-nap), §4.1.2 sync-embed pattern, §13.1 failure-mode escape, §13.8 v1 direct-call pattern (substrate-gateway refactor tracked separately).
Bundle context: PR 1 of 3. PR 2 ships §4.3 preseed batched embed; PR 3 ships §7 sleep stage A retroactive + §6.4 backfill CLI. The Block D dependency on a working HNSW index (§3) is independent — vectors stamp successfully today; surfacing queries degrade to flat scan until the body-shape rejection on ArangoDB 3.12.4 (
error 10: Expecting type Array) is fixed in a separate PR.Architectural framing
Pen and engine-nap are the two single-doc sync-embed producers. They share a write pattern (one doc → embed once → stamp four fields → insert) and a failure mode (embedder failure → null embedding → sleep retroactive backfill). Bundling them into one PR makes the shared helper land alongside both call sites, so the contract enforcement (single point for the embedding-field shape; single telemetry surface) lives in one reviewable place.
The substrate-gateway refactor per Spec §13.8 is out of scope here. The direct-call producer pattern in this PR produces wire-format-identical documents to what the gateway pattern would produce — only the call sites move when the gateway lands.
Test plan
cargo check --workspacecleancargo clippy -p weaver-core --all-targets -- -D warningscleancargo fmt -p weaver-core -- --checkclean on edited filesembed_write::testscovering success, embedder-absent, embedder-fails, and dim-matches-actual-vector-lengthtools::notestests pass (no Pen regression)engine::runtimetests pass (engine-nap doc-shape contracts unchanged)#[ignore]-gated, require live ArangoDB + embedder) — to run before fresh-agent baseline experiment in PR-E3: three-zone context budget + nap trigger #147Drive-by
One clippy fix at
runtime.rs:1255(manual_containslint). Same file as the engine-nap edit; trivial single-line correctness improvement.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Robustness
Tests