audit: embedder swap — Jina V4 → Qwen3-Embedding-0.6B-Q8 GGUF (PR 1 of 2) #317
Conversation
Planning artifact for the embedder swap discussed 2026-05-11/12. No code changes; ships the audit + decisions so PR 2 (implementation) has a clear scope and a single review surface.

Decisions locked:
- Carrier: Qwen3-Embedding-0.6B-Q8_0 GGUF (~635MB, 1024-dim, 32K context)
- Serving: llama.cpp via llama-cpp-sys-2 (same engine as the decoder; no Python in the runtime; no new dependency)
- Candle fork: unchanged (the Jina V4 backend remains one valid profile option per the per-profile pluggability commit)
- Default context size: 32K native (don't truncate; preserves late-chunking for future workloads)
- GGUF install path: /opt/weaver/models/qwen3-embedding-0.6b/
- GPU placement: same-GPU per agent by default, configurable per SPU YAML

Framework leaks identified (will be fixed in PR 2):
- belief.rs:114-120 — `BELIEF_EMBEDDING_INDEX` const with `dimension: 2048` hardcoded; must become a builder constructed from the agent's SPU config
- harness.rs:207-250 — test fixtures asserting model == "jina-v4" and dim == 2048; must assert against the configured embedder, not literals
- toml_edit.rs / serve.rs — config writers emitting Jina V4 defaults
- runtime_schema.rs / episode.rs — schema-side 2048 references
- A handful of test fixtures across crates pinning Jina specifics

Verification status: llama.cpp + Qwen3-Embedding-0.6B-Q8 GGUF compatibility confirmed via a llama-cpp-python smoke test on the morning of 2026-05-12 — 1024-dim embeddings, last-token pooling (the GGUF default is correct), and correct retrieval ranking on a HeroBench fixture query (cosine 0.66 for the relevant passage vs 0.18 for an irrelevant one).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: Adds an audit document planning the embedder backend swap from Jina V4 to Qwen3-Embedding-0.6B (GGUF): records architecture choices, inventories Jina/2048 references, lists framework leaks to fix, describes PR 2 implementation steps (client, YAML, docs), locks decisions, and defines success criteria. Changes: embedder-swap audit plan (Jina V4 to Qwen3-GGUF).
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/audit/embedder-swap-2026-05-12.md`:
- Line 117: There is a conflict between the two default context_size values:
change the document so the `context_size` is consistent for the new default
template (`Qwen3-Embedding-0.6B-Q8_0.gguf`, `model_id:
"qwen3-embedding-0.6b-gguf"`) and the locked default YAML; either set both
occurrences to the same numeric value (e.g., 8192 or 32768) or explicitly label
one as “operational default: context_size: 8192” and the other as “native
capability: context_size: 32768” with clear wording so readers know which is the
runtime default used by PR 2 and which is the model’s native max.
- Around line 27-33: The fenced code block in
docs/audit/embedder-swap-2026-05-12.md (the block showing the JinaV4 and metrics
lines) is missing a language identifier; update that triple-backtick fence to
include a language such as "text" (e.g., ```text) so markdownlint rule MD040 is
satisfied, leaving the block contents unchanged and preserving alignment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2fc9f999-56ac-4d36-a7c1-32dd2b1360f0
📒 Files selected for processing (1)
docs/audit/embedder-swap-2026-05-12.md
Two real findings:

1. §7 `context_size` was stale at 8192 — I forgot to back-propagate the §10.1 decision (32K native, don't truncate) when locking the decisions section. Updated §7 to reflect 32768 and added a cross-reference to §10.1 so the reason isn't lost on future readers.
2. The reference-count fence at §3 lacked a language identifier; markdownlint MD040 wants one. Changed to ```text.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Actionable comments posted: 1
Inline comments:
In `@docs/audit/embedder-swap-2026-05-12.md`:
- Line 12: The sentence conflates GGUF artifact file size with runtime GPU
memory; update the phrasing in the paragraph mentioning "Jina V4" and
"Qwen3-Embedding-0.6B-Q8_0" so it explicitly separates the GGUF weights
footprint (~635MB) from expected runtime VRAM (which is workload- and
backend-dependent), e.g., state that "~635MB refers to the GGUF artifact size,
not total runtime GPU memory, which varies with batch size, sequence length, and
allocator/engine," and keep the note about preserving 32K context and alignment
with the Qwen3-Coder decoder and the substrate-pluggability reference
(`memory-substrate-Spec.md` §3.5).
CR flagged that "~635MB VRAM" conflated GGUF artifact size with total runtime GPU memory. The artifact size is weights-on-disk; actual VRAM also includes the KV cache (significant with 32K context), intermediate activations, and allocator overhead. Rewrote the motivation sentence to make the distinction explicit and noted that empirical runtime VRAM will be measured during PR 2's smoke test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… leak (PR 2 of N) (#318)

* refactor(belief): parameterize embedding-index dim — remove framework leak

  Removes the framework-side hardcoded 2048 from `weaver-database/src/graph/belief.rs` (the highest-priority framework leak identified in the embedder-swap audit, PR #317). The index definition's static metadata (name, field, metric, n_lists) stays as named consts; the per-agent `dimension` is supplied at construction time via a new `belief_embedding_index_for_dim(dim) -> BeliefEmbeddingIndexDef` const-fn builder.

  Per the substrate-pluggability commitment in `memory-substrate-Spec.md` §3.5 and the per-profile mapping in `project_embedder_per_profile`, different agents use different embedders (Jina V4 → 2048, Qwen3-Embedding-0.6B → 1024); each agent's index must match the embedder it's bound to. A single framework const can't capture that.

  Changes:
  - `weaver-database/src/graph/belief.rs`:
    - New consts `BELIEF_EMBEDDING_INDEX_NAME`, `_METRIC`, `_N_LISTS` carry the invariant metadata.
    - New const-fn `belief_embedding_index_for_dim(dim) -> def`.
    - Removed the `BELIEF_EMBEDDING_INDEX` const (the actual framework leak — that const baked dim 2048 into the framework).
    - Doc comments updated to reflect per-agent dim provenance.
    - Test refactored to exercise the builder with multiple dims, proving the builder propagates dim verbatim and that metadata stays invariant across calls.
  - `weaver-demo/src/herobench/belief.rs`:
    - New `HEROBENCH_DEFAULT_EMBEDDING_DIM = 2048` named const, marked transitional in its doc comment. The demo side carries the bridge default during the embedder-swap migration; the framework side stays clean.
    - `validate_belief_embedding_schema()` (which returned metric + dim) split into `validate_belief_embedding_metric()` (metric only). Dimension is now threaded explicitly from the named const, not buried in a framework return value.
    - `create_belief_embedding_index` uses the new `BELIEF_EMBEDDING_INDEX_N_LISTS` const directly.
  - `weaver-demo/tests/herobench_integration.rs`:
    - Drive-by: fix a pre-existing clippy nit (`manual_is_multiple_of`) that the upgraded clippy version started flagging. Unrelated to the embedder swap; it happened to surface in the same crate's test build.

  12 unit tests in `weaver-database::graph::belief` pass; clippy + fmt clean across weaver-database + weaver-demo.

  Next commits in this PR:
  - New LlamaCppEmbedderClient in `weaver-spu/src/encoder/llama_cpp_client.rs`
  - Backend dispatch in `server.rs::load_per_agent_embedder`
  - Default agent YAML template referencing Qwen3-Embedding-0.6B-Q8_0
  - GGUF file move from the HF cache to /opt/weaver/models/
  - Decision log entries in specs

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* review: rename validate_ → parse_belief_embedding_metric

  Per PR #318 CR review: the function parses a string into the VectorMetric enum (it's the only consumer that matters; an unknown metric string is the only failure path). "validate" was a misnomer inherited from the pre-split `validate_belief_embedding_schema`; "parse_" is what it actually does. Body and return type unchanged; call site updated.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* review: drop redundant BELIEF_NODES_COLLECTION import in herobench belief.rs

  Per PR #318 CR review: the function-local `use` in `create_belief_embedding_index` was pulling `BELIEF_NODES_COLLECTION` from `weaver_database::graph::belief`, shadowing the file-local `pub const BELIEF_NODES_COLLECTION` at line ~906. Both happen to have the same value ("belief_nodes"), so there's no runtime divergence, but the shadow is sloppy. Removed `BELIEF_NODES_COLLECTION` from the import list, with an inline comment explaining why so the next reader doesn't reintroduce it. The file-local const is now the single source for this function.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: r3d91ll <r3d91ll@bucy-medrano.me>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Planning artifact for the embedder swap discussed last night and this morning. No code changes — just the audit + decisions so PR 2 (implementation) has a clear scope and a single review surface.
Decisions locked
Framework leaks identified (fix in PR 2)
The `weaver-spu/src/encoder/` references (77 hits) are legitimate — that's the embedder implementation layer where Jina V4 lives as one valid backend. They stay.
Verification status
llama.cpp + Qwen3-Embedding-0.6B-Q8 GGUF compatibility confirmed via `llama-cpp-python` smoke test this morning:
What ships in PR 2 (next)
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit