Skip to content

refactor(spu): delete weaver-inference + weaver-embedding shell crates (PR-0.5.E)#278

Merged
toddwbucy merged 2 commits into
mainfrom
refactor/spu-cleanup-old-crates
May 5, 2026
Merged

refactor(spu): delete weaver-inference + weaver-embedding shell crates (PR-0.5.E)#278
toddwbucy merged 2 commits into
mainfrom
refactor/spu-cleanup-old-crates

Conversation

@toddwbucy
Copy link
Copy Markdown
Owner

@toddwbucy toddwbucy commented May 5, 2026

Summary

Per docs/specs/weaver-spu-Spec.md §10 PR-0.5.E. Final Phase 0.5 PR. Removes the deprecated re-export shells and consolidates the workspace from 9 → 7 crates per spec §6.

What's deleted

  • crates/weaver-inference/ (entire directory; was a 17-export deprecated shell after PR-0.5.C)
  • crates/weaver-embedding/ (entire directory; was a deprecated shell after PR-0.5.B + PR-0.5.D)

Workspace Cargo.toml

Members list contracted from 9 → 7. Workspace deps removed (replaced with a comment pointing at the consolidation history).

Consumer-side import rewrites (bulk sed)

From To
weaver_inference::backend weaver_spu::decoder::llama_cpp_backend
weaver_inference::{engine,family,gguf,multi_model,model,model_profile,config,download,runtime_config,startup,suggest,server} weaver_spu::decoder::*
weaver_inference::{vram,nvlink,probe,gpu_orchestrator,gpu} weaver_spu::core::*
weaver_embedding::{Embedder,EmbedderInfo} weaver_core::embedder::*
weaver_embedding::{late_chunk,LateChunkConfig,LateChunkResult,late_chunk_embeddings} weaver_spu::encoder::*
weaver_embedding::pin weaver_spu::core::pin
weaver_embedding::grpc_client weaver_spu::encoder::grpc_client_legacy
weaver_embedding::gguf_backend weaver_spu::encoder::gguf_backend
weaver_embedding::proto weaver_spu::proto

Cargo.toml updates

  • weaver-interface: dropped optional weaver-inference + direct weaver-embedding deps; switched to weaver-spu dep with inference feature now forwarding weaver-spu/cuda + weaver-spu/gguf.
  • weaver-demo: dropped weaver-embedding dep, added weaver-spu.

Out-of-scope clippy suppression

weaver-spu/src/decoder/server/prompt.rs + weaver-demo/src/bin/embedder_latency.rs gain #![allow(elided_lifetimes_in_paths)] at file scope. The hidden-lifetime patterns are pre-existing in moved code; fixing them is a future cleanup PR. One bounded clippy fix in embedder_latency.rs (.iter().nth(2).get(2)) was also made inline.

Test plan

  • cargo build --workspace clean
  • cargo test --workspace clean — all tests pass (588 weaver-core, 110 weaver-database, plus moved weaver-spu integration tests grpc_client_legacy and mutex_resilience)
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • Workspace member count = 7 (verified via cargo metadata): weaver-analysis, weaver-core, weaver-database, weaver-demo, weaver-interface, weaver-spu, weaver-trace

Sequencing

  • Dependencies: PR-0.5.A, PR-0.5.B, PR-0.5.C, PR-0.5.D (all merged).
  • Phase 0.5 (crate consolidation) is complete. Phase 1 (encoder candle port) commits next per spec §10 sequencing.
  • Companion upstream PR: none.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an embedder latency benchmark tool reporting per-size/concurrency P50/P95/P99, max and mean metrics.
  • Refactor

    • Consolidated embedding/inference functionality into the SPU/core stack, removed legacy compatibility layers, and simplified workspace membership and feature wiring.
  • Documentation

    • Updated docs and comments to reference the consolidated crate/module layout and new import paths.

…s (PR-0.5.E)

Per docs/specs/weaver-spu-Spec.md §10 PR-0.5.E. Final Phase 0.5 PR.
Removes the deprecated re-export shells and consolidates the
workspace from 9 → 7 crates per spec §6.

What's deleted:
- crates/weaver-inference/ (entire directory; was a 17-export
  deprecated shell after PR-0.5.C)
- crates/weaver-embedding/ (entire directory; was a deprecated
  shell after PR-0.5.B + PR-0.5.D)

Workspace Cargo.toml:
- Members list contracted from 9 to 7.
- weaver-embedding + weaver-inference workspace deps removed
  (replaced by a comment pointing at the consolidation history).

Consumer-side import rewrites (bulk sed across non-shell crates):
- weaver_inference::backend           → weaver_spu::decoder::llama_cpp_backend
- weaver_inference::{engine,family,gguf,multi_model,model,
                     model_profile,config,download,runtime_config,
                     startup,suggest,server}
                                      → weaver_spu::decoder::*
- weaver_inference::{vram,nvlink,probe,gpu_orchestrator,gpu}
                                      → weaver_spu::core::*
- weaver_embedding::{Embedder,EmbedderInfo}
                                      → weaver_core::embedder::*
- weaver_embedding::{late_chunk,LateChunkConfig,LateChunkResult,
                     late_chunk_embeddings}
                                      → weaver_spu::encoder::*
- weaver_embedding::pin               → weaver_spu::core::pin
- weaver_embedding::grpc_client       → weaver_spu::encoder::grpc_client_legacy
- weaver_embedding::gguf_backend      → weaver_spu::encoder::gguf_backend
- weaver_embedding::proto             → weaver_spu::proto

Cargo.toml updates:
- weaver-interface: dropped optional `weaver-inference` + direct
  `weaver-embedding` deps; switched to `weaver-spu` dep with
  `inference` feature now forwarding `weaver-spu/cuda` +
  `weaver-spu/gguf`.
- weaver-demo: dropped `weaver-embedding` dep, added `weaver-spu`.

Out-of-scope clippy suppression (pre-existing in moved code):
- crates/weaver-spu/src/decoder/server/prompt.rs +
  crates/weaver-demo/src/bin/embedder_latency.rs gain
  `#![allow(elided_lifetimes_in_paths)]` at file scope. The
  hidden-lifetime patterns date back to weaver-inference's pre-deny
  state; fixing them is a future cleanup PR. One bounded clippy fix
  in embedder_latency.rs (.iter().nth(2) → .get(2)) was also made
  inline since it was a one-line change.

Acceptance criteria from spec §10 PR-0.5.E:
- cargo build --workspace clean ✓
- cargo test --workspace clean (all tests pass — 588 weaver-core,
  110 weaver-database, plus the moved weaver-spu integration tests
  for grpc_client_legacy and mutex_resilience) ✓
- cargo clippy --workspace --all-targets -- -D warnings clean ✓
- Workspace member count = 7 (verified via cargo metadata):
  weaver-analysis, weaver-core, weaver-database, weaver-demo,
  weaver-interface, weaver-spu, weaver-trace ✓

Phase 0.5 (crate consolidation) is complete. Phase 1 (encoder
candle port) commits next per spec §10 sequencing.

Dependencies: PR-0.5.A, PR-0.5.B, PR-0.5.C, PR-0.5.D (all merged).
Companion upstream PR: none.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5fc4da32-6e45-4a1c-89a2-2bba75eed1fc

📥 Commits

Reviewing files that changed from the base of the PR and between 92477c1 and e500054.

📒 Files selected for processing (3)
  • crates/weaver-core/src/embedder.rs
  • crates/weaver-interface/src/lib.rs
  • crates/weaver-spu/src/encoder/gguf_backend.rs

📝 Walkthrough

Walkthrough

This PR removes the deprecated weaver-embedding and weaver-inference crate shells, folds consumers to weaver_spu, updates workspace members/dependencies, migrates import paths across crates, and adds a new embedder latency benchmark binary.

Changes

Workspace & Crate Consolidation

Layer / File(s) Summary
Workspace Configuration
Cargo.toml
Removed crates/weaver-inference and crates/weaver-embedding from [workspace] members; deleted workspace dependency entries for those crates; added weaver-spu = { path = "crates/weaver-spu", default-features = false } and explanatory comments.
Remove Deprecated Shells
crates/weaver-embedding/Cargo.toml, crates/weaver-embedding/src/lib.rs, crates/weaver-inference/Cargo.toml, crates/weaver-inference/src/lib.rs, crates/weaver-inference/kernels/transformer.cu, crates/weaver-inference/examples/inference.toml
Deleted manifest contents and deprecated re-export shells; removed crate-level re-exports (embedder/inference facades), features that forwarded to weaver-spu, example config, and CUDA kernel source and extern FFI launchers.
Core Import/Type Migration
crates/weaver-core/src/embedder.rs, crates/weaver-database/src/{lib.rs,chunking/mod.rs}, crates/weaver-interface/src/{harness.rs,lib.rs,model.rs,serve.rs,toml_edit.rs}, crates/weaver-demo/{Cargo.toml,src/herobench/*,tests/*}, crates/weaver-spu/src/encoder/gguf_backend.rs
Replaced references/imports from weaver_embedding::* and weaver_inference::* with weaver_spu::encoder::grpc_client_legacy, weaver_spu::core::{pin,probe}, weaver_spu::decoder::*, and weaver_spu::proto::embedding; updated feature wiring in Cargo manifests and a few function signatures/types to the new weaver_spu types.
New Benchmark Binary
crates/weaver-demo/src/bin/embedder_latency.rs
Added a Tokio-based latency characterization binary that connects to an EmbeddingClient over a Unix socket, warms up the embedder, sweeps payload sizes and concurrency (1/2/4), collects round-trip and server-compute samples, computes P50/P95/P99, transport overhead, and prints a Markdown results table plus an OOM-speed reading.
Misc Docs / Lints
crates/weaver-spu/src/decoder/server/prompt.rs, various doc comments
Added #![allow(elided_lifetimes_in_paths)] to prompt.rs and updated module-level doc comments across multiple files to point at the relocated weaver_spu paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped through crates and tidy threads,

shells tucked away where old code led.
Paths now point to spu with cheer,
A latency tool hops proudly near.
Small rabbit grin — the tree looks clear.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main refactoring: deletion of two deprecated shell crates (weaver-inference and weaver-embedding) from the workspace.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/weaver-spu/src/encoder/gguf_backend.rs (1)

237-247: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale test-module comment still describes removed crates

Lines 237-245 still document weaver_inference/weaver_embedding dependency flow, but Line 247 now imports from weaver_spu::decoder::gguf. Please update or remove that block to avoid misleading future edits.

Suggested comment update
-    // Tests load a real GGUF via `weaver_inference`'s decoder
-    // model-loader because that's where `init_backend` / `load_model`
-    // / `GgufModelParams` live (per the post-Phase-0 ontology:
-    // model-runtime primitives stay in `weaver_inference`;
-    // embedding-pipeline composition lives here in
-    // `weaver_embedding`). `weaver-inference` is therefore a
-    // dev-dependency of `weaver-embedding`. A later PR (Phase 1
-    // — the `EmbedderClient` impl) makes it a regular dependency
-    // when the production embedder code calls `load_model` itself.
+    // Tests load a real GGUF via `weaver_spu::decoder::gguf`,
+    // which exposes `init_backend`, `load_model`, and `GgufModelParams`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/weaver-spu/src/encoder/gguf_backend.rs` around lines 237 - 247, The
block comment describing `weaver_inference`/`weaver_embedding` dependency flow
is stale now that the code imports directly from weaver_spu (see the `use
std::sync::Mutex;` and `use weaver_spu::decoder::gguf::{GgufModelParams,
init_backend, load_model};` lines); remove or replace the paragraph with a
brief, accurate note explaining current dependency/usage (or delete it entirely)
so it no longer references removed crates
(`weaver_inference`/`weaver_embedding`) and reflects that `GgufModelParams`,
`init_backend`, and `load_model` are imported from `weaver_spu::decoder::gguf`.
crates/weaver-interface/src/lib.rs (1)

9-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unify the pin-path migration comment block.

Line 13 points to weaver_spu::core::pin, but Lines 9-12 still reference weaver-embedding::pin. Keep one canonical path to avoid confusion.

💡 Suggested comment cleanup
-// `embedder_pin` moved to `weaver-embedding::pin` per
-// `embedder-oxidization-Spec.md` §5.1 (issue `#166` / sprint
-// Block A.4). Cohort-pin is an embedding-substrate concern, not
-// a CLI concern; it lives alongside the loader paths it guards.
-// External consumers import from `weaver_spu::core::pin` now.
+// `embedder_pin` moved to `weaver_spu::core::pin` per
+// `embedder-oxidization-Spec.md` §5.1 (issue `#166` / sprint
+// Block A.4). Cohort-pin is an embedding-substrate concern, not
+// a CLI concern; it lives alongside the loader paths it guards.
+// External consumers import from `weaver_spu::core::pin`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/weaver-interface/src/lib.rs` around lines 9 - 13, Update the migration
comment to use one canonical import path: replace the reference to
weaver-embedding::pin with the canonical weaver_spu::core::pin mentioned on line
13 and remove the duplicate/contradictory phrasing around embedder_pin; keep the
note that embedder_pin moved per the Spec and that cohort-pin is an
embedding-substrate concern, but reference only weaver_spu::core::pin
(mentioning embedder_pin, weaver-embedding::pin, and weaver_spu::core::pin to
locate the text).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-core/src/embedder.rs`:
- Around line 154-156: Update the doc comments in this file to remove references
to the removed weaver-embedding crate and point them to the new symbol
weaver_spu::encoder::grpc_client_legacy; specifically search for any lingering
"weaver-embedding" paths (noted around the earlier comments at the previous Line
10, Line 42, Line 110) and replace them with
weaver_spu::encoder::grpc_client_legacy so the two bulleted items (and any other
mentions) consistently refer to the new gRPC client symbol rather than the
deleted crate.

---

Outside diff comments:
In `@crates/weaver-interface/src/lib.rs`:
- Around line 9-13: Update the migration comment to use one canonical import
path: replace the reference to weaver-embedding::pin with the canonical
weaver_spu::core::pin mentioned on line 13 and remove the
duplicate/contradictory phrasing around embedder_pin; keep the note that
embedder_pin moved per the Spec and that cohort-pin is an embedding-substrate
concern, but reference only weaver_spu::core::pin (mentioning embedder_pin,
weaver-embedding::pin, and weaver_spu::core::pin to locate the text).

In `@crates/weaver-spu/src/encoder/gguf_backend.rs`:
- Around line 237-247: The block comment describing
`weaver_inference`/`weaver_embedding` dependency flow is stale now that the code
imports directly from weaver_spu (see the `use std::sync::Mutex;` and `use
weaver_spu::decoder::gguf::{GgufModelParams, init_backend, load_model};` lines);
remove or replace the paragraph with a brief, accurate note explaining current
dependency/usage (or delete it entirely) so it no longer references removed
crates (`weaver_inference`/`weaver_embedding`) and reflects that
`GgufModelParams`, `init_backend`, and `load_model` are imported from
`weaver_spu::decoder::gguf`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 97ed7667-a93a-4ddf-88af-3c1eb5ae6466

📥 Commits

Reviewing files that changed from the base of the PR and between 78e382b and 92477c1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • Cargo.toml
  • crates/weaver-core/src/embedder.rs
  • crates/weaver-database/src/chunking/mod.rs
  • crates/weaver-database/src/lib.rs
  • crates/weaver-demo/Cargo.toml
  • crates/weaver-demo/src/bin/embedder_latency.rs
  • crates/weaver-demo/src/herobench/belief.rs
  • crates/weaver-demo/src/herobench/benchmark.rs
  • crates/weaver-demo/tests/herobench_integration.rs
  • crates/weaver-demo/tests/similarity_calibration.rs
  • crates/weaver-embedding/Cargo.toml
  • crates/weaver-embedding/src/lib.rs
  • crates/weaver-inference/Cargo.toml
  • crates/weaver-inference/examples/inference.toml
  • crates/weaver-inference/kernels/transformer.cu
  • crates/weaver-inference/src/lib.rs
  • crates/weaver-interface/Cargo.toml
  • crates/weaver-interface/src/harness.rs
  • crates/weaver-interface/src/lib.rs
  • crates/weaver-interface/src/model.rs
  • crates/weaver-interface/src/serve.rs
  • crates/weaver-interface/src/toml_edit.rs
  • crates/weaver-spu/src/decoder/server/prompt.rs
  • crates/weaver-spu/src/encoder/gguf_backend.rs
💤 Files with no reviewable changes (6)
  • crates/weaver-embedding/Cargo.toml
  • crates/weaver-inference/Cargo.toml
  • crates/weaver-embedding/src/lib.rs
  • crates/weaver-inference/examples/inference.toml
  • crates/weaver-inference/kernels/transformer.cu
  • crates/weaver-inference/src/lib.rs

Comment thread crates/weaver-core/src/embedder.rs
…g doc refs

Three CR findings on PR #278 commit 1d36b3c, all valid (doc-only):

1. weaver-core/src/embedder.rs (lines 10, 42, 110, 133, 138):
   five mentions of `weaver-embedding::grpc_client` /
   `weaver-embedding::grpc_client::EmbeddingClient` /
   "gRPC client in weaver-embedding" rewritten to point at
   `weaver_spu::encoder::grpc_client_legacy` (the post-PR-0.5.D
   home). The "tonic dep removed when last consumer goes away"
   note now correctly names PR-3.A.

2. weaver-interface/src/lib.rs (lines 9-13): the embedder_pin
   migration comment had two stale destinations
   (weaver-embedding::pin and weaver_spu::core::pin). Reduced to
   the canonical weaver_spu::core::pin path with a one-line
   reference to PR-0.5.D as the relocation event.

3. weaver-spu/src/encoder/gguf_backend.rs tests block comment:
   the paragraph described the test cross-crate dev-dep wiring
   to weaver_inference. Now that both halves live in
   weaver_spu, the import is a sibling-module use; comment
   replaced with a brief accurate note pointing at
   weaver_spu::decoder::gguf.

Validated: cargo check --workspace clean.
@toddwbucy toddwbucy merged commit 79bb649 into main May 5, 2026
1 check passed
@toddwbucy toddwbucy deleted the refactor/spu-cleanup-old-crates branch May 5, 2026 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants