
audit: embedder swap — Jina V4 → Qwen3-Embedding-0.6B-Q8 GGUF (PR 1 of 2)#317

Merged
toddwbucy merged 3 commits into main from audit/embedder-swap-qwen3 on May 12, 2026

Conversation


toddwbucy (Owner) commented May 12, 2026

Summary

Planning artifact for the embedder swap discussed last night and this morning. No code changes — just the audit + decisions so PR 2 (implementation) has a clear scope and a single review surface.

Decisions locked

| Decision | Choice |
| --- | --- |
| Carrier | Qwen3-Embedding-0.6B-Q8_0 GGUF (~635 MB, 1024 dim, 32K context) |
| Serving stack | llama.cpp via existing `llama-cpp-sys-2` (same engine as the Qwen3-Coder decoder) |
| Candle fork | Unchanged; the Jina V4 backend remains valid under the per-profile pluggability commit |
| Default context size | 32K native; don't truncate (preserves late-chunking for later workloads) |
| GGUF install path | `/opt/weaver/models/qwen3-embedding-0.6b/` |
| GPU placement | Same-GPU per agent by default; configurable via `spu.encoder.gpu` / `spu.decoder.gpu` |
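Taken together, these decisions imply a per-agent config shaped roughly like the sketch below. Only the `spu.encoder.gpu` / `spu.decoder.gpu` keys and the install path come from the table above; the remaining key names and nesting are illustrative assumptions, not the real schema.

```yaml
# Hypothetical per-agent spu config reflecting the locked decisions.
spu:
  encoder:
    model_id: qwen3-embedding-0.6b-gguf
    model_path: /opt/weaver/models/qwen3-embedding-0.6b/Qwen3-Embedding-0.6B-Q8_0.gguf
    dimension: 1024
    context_size: 32768   # native 32K; don't truncate (preserves late-chunking)
    gpu: 0                # spu.encoder.gpu
  decoder:
    gpu: 0                # spu.decoder.gpu — same-GPU per-agent default
```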

Framework leaks identified (fix in PR 2)

| Site | Issue |
| --- | --- |
| `belief.rs:114-120` | `BELIEF_EMBEDDING_INDEX` const has `dimension: 2048` hardcoded; must become builder-constructed from the agent's spu config |
| `harness.rs:207-250` | Test fixtures assert `model == "jina-v4"` and `dim == 2048`; must assert against the configured embedder |
| `toml_edit.rs`, `serve.rs` | Config writers emit Jina V4 defaults |
| `runtime_schema.rs`, `episode.rs` | Schema-side 2048 references |
| Several test fixtures across crates | Pin Jina specifics |

The `weaver-spu/src/encoder/` references (77 hits) are legitimate — that's the embedder implementation layer where Jina V4 lives as one valid backend. They stay.
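The `harness.rs` fix amounts to replacing pinned literals with values read from the agent's configured embedder. A minimal sketch of that shape, where `EmbedderConfig` and its fields are assumptions rather than the real harness types:

```rust
// Hypothetical fixture check: compare against the configured embedder,
// not the literals "jina-v4" / 2048 (the framework leak).
struct EmbedderConfig {
    model_id: String,
    dimension: usize,
}

fn fixture_matches(cfg: &EmbedderConfig, model: &str, dim: usize) -> bool {
    // Before: model == "jina-v4" && dim == 2048 (hardcoded).
    // After: whatever the agent's spu config declares.
    model == cfg.model_id && dim == cfg.dimension
}

fn main() {
    let cfg = EmbedderConfig {
        model_id: "qwen3-embedding-0.6b-gguf".into(),
        dimension: 1024,
    };
    assert!(fixture_matches(&cfg, "qwen3-embedding-0.6b-gguf", 1024));
    assert!(!fixture_matches(&cfg, "jina-v4", 2048)); // old literals no longer pass
    println!("fixture ok");
}
```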

Verification status

llama.cpp + Qwen3-Embedding-0.6B-Q8 GGUF compatibility confirmed via `llama-cpp-python` smoke test this morning:

  • ✓ 1024-dim embeddings
  • ✓ Last-token pooling (GGUF default is correct for Qwen3-Embedding)
  • ✓ Retrieval ranking correct on a HeroBench fixture query (cosine 0.66 for relevant vs 0.18 for irrelevant)
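The ranking check reduces to comparing cosine similarities between the query vector and each passage vector. A standalone sketch of that computation (not the smoke-test script itself, which used `llama-cpp-python`):

```rust
// Cosine similarity: dot(a, b) / (|a| * |b|).
// The smoke test passes when the relevant passage scores well above the
// irrelevant one (0.66 vs 0.18 on the HeroBench fixture query).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Toy 3-dim vectors for illustration; real embeddings are 1024-dim.
    let query = [1.0, 0.0, 0.0];
    let relevant = [0.9, 0.1, 0.0];
    let irrelevant = [0.0, 0.2, 1.0];
    assert!(cosine(&query, &relevant) > cosine(&query, &irrelevant));
    println!("ranking ok");
}
```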

What ships in PR 2 (next)

  1. `LlamaCppEmbedderClient` in `weaver-spu/src/encoder/llama_cpp_client.rs` implementing `Embedder` trait
  2. Backend dispatch in daemon load handler (Jina V4 → candle; Qwen3-Embedding → llama.cpp)
  3. `BELIEF_EMBEDDING_INDEX` const → builder function parameterized on dim
  4. Test fixture updates
  5. Default agent YAML template referencing Qwen3-Embedding-0.6B-Q8_0
  6. Decision log entries in `belief-nodes-embedding-Spec.md` + `memory-substrate-Spec.md`
  7. GGUF file move `/opt/weaver/huggingface/hub/gguf-models/qwen3-embedding-0.6B/` → `/opt/weaver/models/qwen3-embedding-0.6b/`
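Item 2's dispatch can be sketched as a match on the configured model id. The enum name, accepted strings, and function shape are assumptions for illustration; the real wiring lands in the daemon load handler in PR 2:

```rust
// Hypothetical backend dispatch: Jina V4 stays on the candle fork,
// Qwen3-Embedding GGUF goes through llama.cpp (llama-cpp-sys-2).
#[derive(Debug)]
enum EmbedderBackend {
    Candle,
    LlamaCpp,
}

fn backend_for(model_id: &str) -> Option<EmbedderBackend> {
    match model_id {
        "jina-v4" => Some(EmbedderBackend::Candle),
        "qwen3-embedding-0.6b-gguf" => Some(EmbedderBackend::LlamaCpp),
        _ => None, // unknown model_id: refuse to load rather than guess
    }
}

fn main() {
    assert!(matches!(backend_for("jina-v4"), Some(EmbedderBackend::Candle)));
    assert!(matches!(
        backend_for("qwen3-embedding-0.6b-gguf"),
        Some(EmbedderBackend::LlamaCpp)
    ));
    assert!(backend_for("unknown-model").is_none());
    println!("dispatch ok");
}
```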

Test plan

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive audit documenting the planned embedder backend swap to Qwen3 embeddings, migration scope, constraints, and an ordered implementation checklist.
    • Includes architecture choices, migration steps for embeddings and indexes, agent template defaults and test updates, documentation touchpoints, prioritized items to remove hardcoded assumptions, locked decisions, and success criteria.

Review Change Stack

…f 2)

Planning artifact for the embedder swap discussed 2026-05-11/12. No code
changes; ships the audit + decisions so PR 2 (implementation) has a
clear scope and a single review surface.

Decisions locked:
- Carrier: Qwen3-Embedding-0.6B-Q8_0 GGUF (~635MB, 1024 dim, 32K context)
- Serving: llama.cpp via llama-cpp-sys-2 (same engine as the decoder;
  no Python in runtime; no new dependency)
- Candle fork: unchanged (Jina V4 backend remains one valid profile
  option per per-profile pluggability commit)
- Default context size: 32K native (don't truncate; preserves late-
  chunking for future workloads)
- GGUF install path: /opt/weaver/models/qwen3-embedding-0.6b/
- GPU placement: same-GPU per agent default, configurable per spu yaml

Framework leaks identified (will be fixed in PR 2):
- belief.rs:114-120 BELIEF_EMBEDDING_INDEX const with dimension: 2048
  hardcoded — must become builder constructed from agent spu config
- harness.rs:207-250 test fixtures asserting model=="jina-v4" and
  dim==2048 — must assert against configured embedder, not literals
- toml_edit.rs / serve.rs config writers emitting Jina V4 defaults
- runtime_schema.rs / episode.rs schema-side 2048 references
- A handful of test fixtures across crates pinning Jina specifics

Verification status: llama.cpp + Qwen3-Embedding-0.6B-Q8 GGUF
compatibility confirmed via llama-cpp-python smoke test 2026-05-12
morning — 1024-dim embeddings, last-token pooling (GGUF default is
correct), retrieval ranking correct on a HeroBench fixture query
(cosine 0.66 for the relevant passage vs 0.18 for an irrelevant one).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 06a96ec3-7d9f-468f-b9ef-85af4b815ca9

📥 Commits

Reviewing files that changed from the base of the PR and between 2ddf52a and 81185ce.

📒 Files selected for processing (1)
  • docs/audit/embedder-swap-2026-05-12.md

📝 Walkthrough


Adds an audit document planning the embedder backend swap from Jina V4 to Qwen3-Embedding-0.6B (GGUF): records architecture choices, inventories Jina/2048 references, lists framework leaks to fix, describes PR 2 implementation steps (client, YAML, docs), locks decisions, and defines success criteria.

Changes

Embedder-swap audit plan (Jina V4 to Qwen3-GGUF)

| Layer / File(s) | Summary |
| --- | --- |
| Audit overview and architectural decisions (`docs/audit/embedder-swap-2026-05-12.md`) | Header identifying the audit, date, PR sequencing, and core architectural choices, including the llama.cpp/GGUF stack selection, profile-driven embedding-dimension/index construction, the Candle fork stance, and the dev-DB rebuild requirement. |
| Reference inventory and constraint boundaries (same file) | Reference-count summary tracking Jina and 2048-dim occurrences; defines "legitimate references" permitted within the embedder implementation layer; lists prioritized "framework leaks" requiring parameterization or updates. |
| PR 2 implementation plan and roadmap (same file) | Describes embedder client construction (adding `LlamaCppEmbedderClient` alongside the existing candle-based client), enumerates dispatcher call sites and agent YAML template updates (new Qwen3 default and test agent), and documents related docs and deferred items. |
| Locked decisions and success criteria (same file) | Records locked choices (native 32K context, canonical GGUF install path, GPU placement honoring agent YAML), the ordered PR 2 implementation sequence, and explicit success criteria (tests, provisioning/retrieval behavior, removal of hardcoded Jina/2048 outside the embedder layer, docs alignment, decision-log entry). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • toddwbucy/WeaverTools#117: Adds a gguf-based llama.cpp embedding module (text and vision) that implements the gguf embedder functionality referenced by this audit.
  • toddwbucy/WeaverTools#298: Trait-typed, opt-in in-process embedder work that overlaps backend wiring and construction call sites inventoried by the audit.
  • toddwbucy/WeaverTools#293: Adds a JinaV4-based embedder producing 2048-d vectors, directly intersecting the audit's inventory of Jina/2048 usages.

Poem

🐰 In quiet docs I nibble a chart,
Jina to Qwen3 — a tidy new start,
I count each leak, pin every hard code,
Pack GGUF paths down the embedder road,
Hop, test, and log — a tidy migration art.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: an audit document planning an embedder swap from Jina V4 to Qwen3-Embedding-0.6B-Q8 GGUF; it is specific, concise, and directly related to the PR's primary objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files; skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped; no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped; no linked issues were found for this pull request. |





coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/audit/embedder-swap-2026-05-12.md`:
- Line 117: There is a conflict between the two default context_size values:
change the document so the `context_size` is consistent for the new default
template (`Qwen3-Embedding-0.6B-Q8_0.gguf`, `model_id:
"qwen3-embedding-0.6b-gguf"`) and the locked default YAML; either set both
occurrences to the same numeric value (e.g., 8192 or 32768) or explicitly label
one as “operational default: context_size: 8192” and the other as “native
capability: context_size: 32768” with clear wording so readers know which is the
runtime default used by PR 2 and which is the model’s native max.
- Around line 27-33: The fenced code block in
docs/audit/embedder-swap-2026-05-12.md (the block showing the JinaV4 and metrics
lines) is missing a language identifier; update that triple-backtick fence to
include a language such as "text" (e.g., ```text) so markdownlint rule MD040 is
satisfied, leaving the block contents unchanged and preserving alignment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2fc9f999-56ac-4d36-a7c1-32dd2b1360f0

📥 Commits

Reviewing files that changed from the base of the PR and between 150ec39 and cdcdc31.

📒 Files selected for processing (1)
  • docs/audit/embedder-swap-2026-05-12.md

Two real findings:

1. §7 context_size was stale at 8192 — I forgot to back-propagate the
   §10.1 decision (32K native, don't truncate) when locking the
   decisions section. Updated §7 to reflect 32768 and added a cross-
   reference to §10.1 so the reason isn't lost on future readers.

2. The reference-count fence at §3 lacked a language identifier;
   markdownlint MD040 wants one. Changed to ```text.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/audit/embedder-swap-2026-05-12.md`:
- Line 12: The sentence conflates GGUF artifact file size with runtime GPU
memory; update the phrasing in the paragraph mentioning "Jina V4" and
"Qwen3-Embedding-0.6B-Q8_0" so it explicitly separates the GGUF weights
footprint (~635MB) from expected runtime VRAM (which is workload- and
backend-dependent), e.g., state that "~635MB refers to the GGUF artifact size,
not total runtime GPU memory, which varies with batch size, sequence length, and
allocator/engine," and keep the note about preserving 32K context and alignment
with the Qwen3-Coder decoder and the substrate-pluggability reference
(`memory-substrate-Spec.md` §3.5).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 94f3fb4f-a7f7-4d1e-b95a-aa002b1062b8

📥 Commits

Reviewing files that changed from the base of the PR and between cdcdc31 and 2ddf52a.

📒 Files selected for processing (1)
  • docs/audit/embedder-swap-2026-05-12.md

CR flagged that "~635MB VRAM" conflated GGUF artifact size with total
runtime GPU memory. The artifact size is weights-on-disk; actual VRAM
also includes KV cache (significant with 32K context), intermediate
activations, and allocator overhead. Rewrote the motivation sentence
to make the distinction explicit and noted that empirical runtime
VRAM will be measured during PR 2's smoke test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
toddwbucy merged commit f7c4689 into main on May 12, 2026
1 check passed
toddwbucy deleted the audit/embedder-swap-qwen3 branch on May 12, 2026 at 14:10
toddwbucy added a commit that referenced this pull request May 12, 2026
… leak (PR 2 of N) (#318)

* refactor(belief): parameterize embedding-index dim — remove framework leak

Removes the framework-side hardcoded 2048 from
weaver-database/src/graph/belief.rs (the highest-priority framework
leak identified in the embedder-swap audit, PR #317). The index
definition's static metadata (name, field, metric, n_lists) stays as
named consts; the per-agent `dimension` is supplied at construction
time via a new `belief_embedding_index_for_dim(dim) -> BeliefEmbeddingIndexDef`
const-fn builder.

Per the substrate-pluggability commitment in `memory-substrate-Spec.md`
§3.5 and the per-profile mapping in `project_embedder_per_profile`,
different agents use different embedders (Jina V4 → 2048,
Qwen3-Embedding-0.6B → 1024); each agent's index must match the
embedder it's bound to. A single framework const can't capture that.

Changes:

- `weaver-database/src/graph/belief.rs`:
  - New consts `BELIEF_EMBEDDING_INDEX_NAME`, `_METRIC`, `_N_LISTS`
    carry the invariant metadata.
  - New const-fn `belief_embedding_index_for_dim(dim) -> def`.
  - Removed the `BELIEF_EMBEDDING_INDEX` const (the actual framework
    leak — that const baked dim 2048 into the framework).
  - Doc comments updated to reflect per-agent dim provenance.
  - Test refactored to exercise the builder with multiple dims,
    proving the builder propagates dim verbatim and metadata stays
    invariant across calls.

- `weaver-demo/src/herobench/belief.rs`:
  - New `HEROBENCH_DEFAULT_EMBEDDING_DIM = 2048` named const, marked
    transitional in its doc comment. Demo-side carries the bridge
    default during the embedder-swap migration; framework side stays
    clean.
  - `validate_belief_embedding_schema()` (returned metric+dim) split
    into `validate_belief_embedding_metric()` (metric only). Dimension
    is now threaded explicitly from the named const, not buried in
    a framework return value.
  - `create_belief_embedding_index` uses the new
    `BELIEF_EMBEDDING_INDEX_N_LISTS` const directly.

- `weaver-demo/tests/herobench_integration.rs`:
  - Drive-by: fix pre-existing clippy nit
    (`manual_is_multiple_of`) that the upgraded clippy version
    started flagging. Unrelated to the embedder swap; happened to
    surface in the same crate's test build.

12 unit tests in `weaver-database::graph::belief` pass; clippy + fmt
clean across weaver-database + weaver-demo.

Next commits in this PR:
- New LlamaCppEmbedderClient in weaver-spu/src/encoder/llama_cpp_client.rs
- Backend dispatch in server.rs::load_per_agent_embedder
- Default agent YAML template referencing Qwen3-Embedding-0.6B-Q8_0
- GGUF file move from HF cache to /opt/weaver/models/
- Decision log entries in specs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
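The `belief.rs` refactor described in the commit message above can be sketched as follows. The struct fields and the placeholder values for the name/metric/n_lists consts are assumptions based on the metadata the commit names; only the overall shape (invariant consts plus a const-fn builder taking `dim`) is taken from the commit:

```rust
// Invariant index metadata stays as named consts; only `dimension` is
// per-agent, supplied at construction time from the agent's spu config.
// (Const values here are placeholders, not the real ones.)
pub const BELIEF_EMBEDDING_INDEX_NAME: &str = "belief_embedding_idx";
pub const BELIEF_EMBEDDING_INDEX_METRIC: &str = "cosine";
pub const BELIEF_EMBEDDING_INDEX_N_LISTS: u32 = 100;

pub struct BeliefEmbeddingIndexDef {
    pub name: &'static str,
    pub metric: &'static str,
    pub n_lists: u32,
    pub dimension: usize,
}

// Const-fn builder replacing the old BELIEF_EMBEDDING_INDEX const that
// baked dimension 2048 into the framework.
pub const fn belief_embedding_index_for_dim(dim: usize) -> BeliefEmbeddingIndexDef {
    BeliefEmbeddingIndexDef {
        name: BELIEF_EMBEDDING_INDEX_NAME,
        metric: BELIEF_EMBEDDING_INDEX_METRIC,
        n_lists: BELIEF_EMBEDDING_INDEX_N_LISTS,
        dimension: dim,
    }
}

fn main() {
    // Jina V4 agents get 2048; Qwen3-Embedding agents get 1024.
    let jina = belief_embedding_index_for_dim(2048);
    let qwen = belief_embedding_index_for_dim(1024);
    assert_eq!(jina.dimension, 2048);
    assert_eq!(qwen.dimension, 1024);
    // Metadata stays invariant across calls; dim propagates verbatim.
    assert_eq!(jina.name, qwen.name);
    println!("builder ok");
}
```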

* review: rename validate_ → parse_belief_embedding_metric

Per PR #318 CR review: the function parses a string into the
VectorMetric enum (it's the only consumer that matters; an unknown
metric string is the only failure path). "validate" was a misnomer
inherited from the pre-split `validate_belief_embedding_schema`;
"parse_" is what it actually does.

Body and return type unchanged; call site updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
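The rename's rationale can be illustrated with a sketch: the function's whole job is to parse a metric string into an enum, with an unknown string as the only failure path. The `VectorMetric` variants and accepted strings below are assumptions, not the real definitions:

```rust
// "parse_", not "validate_": the function converts a string to an enum
// value; an unrecognized string is the only way it can fail.
#[derive(Debug, PartialEq)]
enum VectorMetric {
    Cosine,
    L2,
    Dot,
}

fn parse_belief_embedding_metric(s: &str) -> Result<VectorMetric, String> {
    match s {
        "cosine" => Ok(VectorMetric::Cosine),
        "l2" => Ok(VectorMetric::L2),
        "dot" => Ok(VectorMetric::Dot),
        other => Err(format!("unknown vector metric: {other}")),
    }
}

fn main() {
    assert_eq!(parse_belief_embedding_metric("cosine"), Ok(VectorMetric::Cosine));
    assert!(parse_belief_embedding_metric("hamming").is_err());
    println!("parse ok");
}
```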

* review: drop redundant BELIEF_NODES_COLLECTION import in herobench belief.rs

Per PR #318 CR review: the function-local `use` in
create_belief_embedding_index was pulling
`BELIEF_NODES_COLLECTION` from `weaver_database::graph::belief`,
shadowing the file-local `pub const BELIEF_NODES_COLLECTION` at line
~906. Both happen to have the same value ("belief_nodes"), so no
runtime divergence, but the shadow is sloppy.

Removed `BELIEF_NODES_COLLECTION` from the import list. Comment
inline explains why so the next reader doesn't reintroduce it. The
file-local const is now the single source for this function.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: r3d91ll <r3d91ll@bucy-medrano.me>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>