feat(spu): backend selection in serve.rs (PR-1.G)#296
Conversation
Phase 1 PR-1.G — wire the daemon to pick between the Python embedder service (legacy, default) and the in-process Rust EmbedderClient (post-Phase-1) at boot time, branching on `[embedder].backend` in `server.toml`. Default stays "python" so existing deployments keep their current path until the operator explicitly flips. Stacked on top of PR-1.F (#294 — in-process EmbedderClient). Won't push until that merges to main; rebased then. ## What changes ### multi_model.rs - New `EmbedderBackend` enum: `Python` (default) | `Rust`. Serde `rename_all = "lowercase"` so `server.toml` reads `backend = "python"` / `backend = "rust"`. - New `EmbedderConfig.backend` field (default = `Python`). - New `EmbedderConfig.snapshot: Option<PathBuf>` field — required when `backend = "rust"`, ignored under `python`. Required-but-missing under `rust` is a hard boot failure with a clear error pointing at the config block. ### serve.rs - `verify_embedder_pin` now takes `&weaver_core::embedder::EmbedderInfo` (the backend-agnostic trait type) instead of the proto `InfoResponse`. The Python-path probe converts proto → trait type at the boundary so the verifier is shape-independent. - `probe_embedder` renamed to `probe_embedder_python`. - New `probe_embedder_rust(embedder_config) -> Result<Option<EmbedderInfo>>`: constructs `EmbedderClient::from_snapshot(...)` on a blocking thread (load is sync + GPU-bound, would starve the runtime inline), reads its `info()`, returns. No retry loop — in-process construction either works or doesn't, no transient-recovery semantics. - Boot path branches on `config.embedder.backend`. When the daemon isn't built with the `embedder-rust` cargo feature, selecting `backend = "rust"` is a hard fail rather than a silent fallback to Python. ### Cargo.toml (weaver-interface) - New `embedder-rust` feature. Pulls in `weaver-spu/flash-attn` (transitively enables candle + candle-cuda) and `dep:candle-core` (needed in serve.rs to construct `Device::new_cuda(N)` and `DType`). - Default builds keep the Python path; flip via `--features embedder-rust` for the Rust backend. - The `inference` feature is unchanged and remains independent — combine `--features inference,embedder-rust` for the full daemon. ### build.rs (weaver-spu) — pre-existing main-branch bug fix `crates/weaver-spu/build.rs` references `kernels/transformer.cu` for the cudarc-backed legacy decoder path, but PR-0.5.E (#278) deleted that file when it consolidated `weaver-inference` into `weaver-spu`. Result: any build with `--features cuda` (and transitively `--features inference`) fails on main with `fatal error: kernels/transformer.cu: No such file or directory`. This blocked validation of PR-1.G locally — `cargo check -p weaver-interface --features inference,embedder-rust` couldn't get past the build step. Hardened the build.rs to skip the cuda compile when the kernel file is absent (with a `cargo:warning` so the operator sees what's happening). Out-of-scope to actually restore the kernel here; that's its own follow-up. For now, the cudarc decoder path simply doesn't link — which is fine for default builds and for the embedder-rust path that doesn't touch decoder kernels. ## Test plan - [x] `cargo check -p weaver-spu` (default features) — clean - [x] `cargo check -p weaver-interface` (no features) — clean - [x] `cargo check -p weaver-interface --features inference,embedder-rust` — clean (with the build.rs hardening fix) - [x] `cargo check -p weaver-interface --features embedder-rust` — clean - [ ] CodeRabbit review clean - [ ] End-to-end smoke (deferred to follow-up): bring up daemon with `[embedder].backend = "rust"` and a real Jina V4 snapshot path; confirm cohort-pin probe writes/verifies against a snapshot identity ## Known issues outside this PR's scope 1. `cargo clippy -p weaver-interface --features inference,embedder-rust -- -D warnings` fails with two pre-existing `explicit_auto_deref` lints in `weaver-spu/src/decoder/gguf.rs:751-752` (legacy GGUF decoder path). Independent of PR-1.G; fix as a separate hygiene PR. 2. The cudarc decoder kernel restore (proper fix for the transformer.cu deletion) is its own follow-up. ## Follow-ups (existing tasks) - #117 Phase 1: smoke-test in-process backend end-to-end - #118 Phase 1: agent-side runtime cutover to in-process embedder 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds an embedder backend selector (python|rust), a feature-flagged Rust in-process embedder path with optional dependency, backend-agnostic embedder probing and pin verification before model load, validation requiring Rust snapshot, conditional CUDA kernel compilation, and related test updates. ChangesMulti-backend Embedder Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Comment |
|
@coderabbitai please review |
|
Tip For best results, initiate chat on the files or code changes.
[review] |
|
@coderabbitai please review |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-interface/src/serve.rs`:
- Around line 111-130: The probe of the embedder backend (the match on backend
and calls to probe_embedder_python and probe_embedder_rust) must run before
build_server_state() and any model-loading logic so failures short-circuit
startup; move the backend detection/probe (including the embedder_socket
handling and the probe_embedder_* calls) out of the post-build_server_state path
and execute it immediately after reading config (or before calling
build_server_state()), returning the probe error early when probe_embedder_rust
returns Err or probe_embedder_python fails so model loading never proceeds on a
known embedder failure.
In `@crates/weaver-spu/src/decoder/multi_model.rs`:
- Around line 143-160: MultiModelConfig::validate() currently doesn't enforce
that snapshot is present when backend is set to the Rust embedder, so add a
validation check in MultiModelConfig::validate() (or in
MultiModelConfig::parse() after deserialization) that returns an error if
self.backend == EmbedderBackend::Rust and self.snapshot.is_none(); reference the
backend field and snapshot Option<std::path::PathBuf> on the MultiModelConfig
struct and ensure the returned error message clearly says "snapshot is required
for backend = rust"; this prevents parse() from succeeding and avoids deferring
failure to probe_embedder_rust().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dedce8d9-504a-4276-886f-68b8ad7222fa
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
crates/weaver-interface/Cargo.tomlcrates/weaver-interface/src/serve.rscrates/weaver-spu/build.rscrates/weaver-spu/src/decoder/multi_model.rs
CR round 2 on PR #296 — two valid findings, both addressed: ## 1. Validate snapshot requirement at config-parse time `MultiModelConfig::validate()` now bails when `backend = "rust"` is set without a `snapshot` path. Previously the config parsed successfully and the failure surfaced inside `probe_embedder_rust()` after `weaver serve` had already begun startup work — the operator-facing error came late and far from the source of the problem. The new check fires at `MultiModelConfig::parse()` → `validate()`, which is what `MultiModelConfig::from_file()` calls before returning. So a malformed `server.toml` errors out before any model loading, device construction, or pin verification gets a chance. ## 2. Move embedder probe ahead of build_server_state() The boot order was: 1. parse + validate config 2. apply CLI overrides 3. build_server_state() ← loads any [[models]] declared in config 4. embedder probe + pin verify 5. server.serve() If the embedder probe fails (snapshot missing, CUDA init, daemon not built with embedder-rust feature, Python service wedged) we'd still pay the model-loading cost in step 3 first. Reordered to: 1. parse + validate config 2. apply CLI overrides 3. embedder probe + pin verify ← moved here 4. build_server_state() 5. server.serve() So a known embedder failure aborts boot before any model touches GPU. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-spu/src/decoder/multi_model.rs`:
- Around line 598-610: Add unit tests exercising the new backend-specific
validation on MultiModelConfig: implement tests like
test_validate_rust_backend_requires_snapshot (call MultiModelConfig::parse with
an embedder backend = "rust" and no snapshot and assert the error contains
"requires [embedder].snapshot"), test_validate_rust_backend_with_snapshot_ok
(parse a toml with backend = "rust" plus snapshot and a [[gpus]] entry with
policy = "locked", assert parse succeeds and embedder.snapshot.is_some() and
emb.backend == EmbedderBackend::Rust), and
test_validate_python_backend_without_snapshot_ok (parse backend = "python"
without snapshot, assert success and emb.backend == EmbedderBackend::Python and
snapshot.is_none()); place these in the same test module exercising
MultiModelConfig::parse so future changes to the validation in multi_model.rs
are covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c96cbdda-2402-4976-87f8-4161375c65b2
📒 Files selected for processing (2)
crates/weaver-interface/src/serve.rscrates/weaver-spu/src/decoder/multi_model.rs
CR round 3 — finding was valid: the new EmbedderConfig.snapshot validation introduced in PR #296 had no direct test coverage. Three tests added in the existing multi_model::tests module covering: - test_validate_rust_backend_requires_snapshot — backend = "rust" without `snapshot` errors at parse, message contains "requires [embedder].snapshot" - test_validate_rust_backend_with_snapshot_ok — backend = "rust" with `snapshot` parses, EmbedderBackend::Rust + snapshot.is_some() - test_validate_python_backend_without_snapshot_ok — default backend (Python) parses, snapshot.is_none() — confirms the new check doesn't reject existing Python configs All three pass: `cargo test -p weaver-spu --lib -- \ test_validate_rust_backend test_validate_python_backend`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-spu/src/decoder/multi_model.rs`:
- Around line 599-610: The config validation currently checks only for
emb.snapshot being None when emb.backend == EmbedderBackend::Rust, but an empty
string becomes Some(PathBuf::new()) and bypasses the check; update the
validation in multi_model.rs where emb.backend and emb.snapshot are inspected to
also treat empty/zero-length paths as invalid (e.g., check
PathBuf::as_os_str().is_empty() or Path::to_str().map(|s| s.is_empty()) and bail
with the same error message when snapshot is empty), ensuring the same
descriptive anyhow::bail(...) runs for both None and empty snapshot cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e8fc5eb1-3a1c-4a9d-af1e-2ea0fb6d4ee8
📒 Files selected for processing (1)
crates/weaver-spu/src/decoder/multi_model.rs
CR caught a hole in the snapshot validation: `snapshot = ""` deserializes to `Some(PathBuf::new())`, which slips past a bare `is_none()` check. The `match` now treats both `None` and `Some(empty path)` as "snapshot missing" and bails with the same descriptive error. Added `test_validate_rust_backend_rejects_empty_snapshot` to lock in the empty-string case alongside the existing None-case test. All four backend-validation tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
What changes
`crates/weaver-spu/src/decoder/multi_model.rs`
`crates/weaver-interface/src/serve.rs`
`crates/weaver-interface/Cargo.toml`
`crates/weaver-spu/build.rs` — pre-existing main-branch bug fix
`crates/weaver-spu/build.rs` references `kernels/transformer.cu` for the cudarc-backed legacy decoder path, but PR-0.5.E (#278) deleted that file when it consolidated `weaver-inference` into `weaver-spu`. Result: any build with `--features cuda` (and transitively `--features inference`) failed on main with `fatal error: kernels/transformer.cu: No such file or directory`.
This blocked validation of PR-1.G locally — `cargo check -p weaver-interface --features inference,embedder-rust` couldn't get past the build step. Hardened the build.rs to skip the cuda compile when the kernel file is absent (with a `cargo:warning` so the operator sees what's happening). Out-of-scope to actually restore the kernel here; that's its own follow-up. For now, the cudarc decoder path simply doesn't link — fine for default builds and for the embedder-rust path that doesn't touch decoder kernels.
Test plan
Known issues outside this PR's scope
Follow-ups (existing tasks)
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests