Skip to content

feat(spu): backend selection in serve.rs (PR-1.G)#296

Merged
toddwbucy merged 4 commits into
mainfrom
feat/spu-backend-selection-serve
May 8, 2026
Merged

feat(spu): backend selection in serve.rs (PR-1.G)#296
toddwbucy merged 4 commits into
mainfrom
feat/spu-backend-selection-serve

Conversation

@toddwbucy
Copy link
Copy Markdown
Owner

@toddwbucy toddwbucy commented May 8, 2026

Summary

  • Phase 1 PR-1.G — wire the daemon to pick between the Python embedder service (legacy, default) and the in-process Rust `EmbedderClient` (post-PR feat(spu): in-process EmbedderClient backend (PR-1.F) #294) at boot time, branching on `[embedder].backend` in `server.toml`. Default stays `"python"` so existing deployments are untouched until the operator explicitly flips.
  • Also includes a small `build.rs` hardening fix for a pre-existing main-branch regression that blocked any `--features inference` build.

What changes

`crates/weaver-spu/src/decoder/multi_model.rs`

  • New `EmbedderBackend` enum: `Python` (default) | `Rust`. Serde `rename_all = "lowercase"` so `server.toml` reads `backend = "python"` / `backend = "rust"`.
  • New `EmbedderConfig.backend` field (default = `Python`).
  • New `EmbedderConfig.snapshot: Option` field — required when `backend = "rust"`, ignored under `python`. Required-but-missing under `rust` is a hard boot failure with an error pointing at the config block.

`crates/weaver-interface/src/serve.rs`

  • `verify_embedder_pin` now takes `&weaver_core::embedder::EmbedderInfo` (the backend-agnostic trait type) instead of the proto `InfoResponse`. The Python-path probe converts proto → trait type at the boundary so the verifier is shape-independent.
  • `probe_embedder` renamed to `probe_embedder_python`.
  • New `probe_embedder_rust(embedder_config) -> Result<Option>`: constructs `EmbedderClient::from_snapshot(...)` on a blocking thread (load is sync + GPU-bound, would starve the runtime inline), reads its `info()`, returns. No retry loop — in-process construction either works or doesn't, no transient-recovery semantics.
  • Boot path branches on `config.embedder.backend`. When the daemon isn't built with the `embedder-rust` cargo feature, selecting `backend = "rust"` is a hard fail rather than a silent fallback to Python.

`crates/weaver-interface/Cargo.toml`

  • New `embedder-rust` feature: `["weaver-spu/flash-attn", "dep:candle-core"]`. Pulls in candle + candle-cuda transitively (needed to construct `Device::new_cuda(N)` and `DType` in serve.rs).
  • Default builds keep the Python path; flip via `--features embedder-rust` for the Rust backend. `inference` and `embedder-rust` are independent and can combine.

`crates/weaver-spu/build.rs` — pre-existing main-branch bug fix

`crates/weaver-spu/build.rs` references `kernels/transformer.cu` for the cudarc-backed legacy decoder path, but PR-0.5.E (#278) deleted that file when it consolidated `weaver-inference` into `weaver-spu`. Result: any build with `--features cuda` (and transitively `--features inference`) failed on main with `fatal error: kernels/transformer.cu: No such file or directory`.

This blocked validation of PR-1.G locally — `cargo check -p weaver-interface --features inference,embedder-rust` couldn't get past the build step. Hardened the build.rs to skip the cuda compile when the kernel file is absent (with a `cargo:warning` so the operator sees what's happening). Out-of-scope to actually restore the kernel here; that's its own follow-up. For now, the cudarc decoder path simply doesn't link — fine for default builds and for the embedder-rust path that doesn't touch decoder kernels.

Test plan

  • `cargo check -p weaver-spu` (default features) — clean
  • `cargo check -p weaver-interface` (no features) — clean
  • `cargo check -p weaver-interface --features inference,embedder-rust` — clean (with the build.rs hardening fix)
  • `cargo check -p weaver-interface --features embedder-rust` — clean
  • CodeRabbit review clean
  • End-to-end smoke (deferred to follow-up feat(inference): Jina V4 vision path via llama.cpp mtmd #117): bring up daemon with `[embedder].backend = "rust"` and a real Jina V4 snapshot path; confirm cohort-pin probe writes/verifies against a snapshot identity

Known issues outside this PR's scope

  1. `cargo clippy -p weaver-interface --features inference,embedder-rust -- -D warnings` fails with two pre-existing `explicit_auto_deref` lints in `weaver-spu/src/decoder/gguf.rs:751-752` (legacy GGUF decoder path). Independent of PR-1.G; fix as a separate hygiene PR.
  2. The cudarc decoder kernel restore (proper fix for the transformer.cu deletion) is its own follow-up.

Follow-ups (existing tasks)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Optional Rust in-process embedder can be enabled as an alternative to the default Python embedder.
  • Bug Fixes

    • Startup now probes the selected embedder and verifies embedder pin before loading state/models, with clearer handling of missing pin vs other failures.
    • Configuration enforces that a required snapshot is provided when the Rust embedder is selected.
  • Chores

    • CUDA kernel compilation runs only when kernel sources are present.
  • Tests

    • Tests updated for backend-agnostic embedder behavior.

Phase 1 PR-1.G — wire the daemon to pick between the Python embedder
service (legacy, default) and the in-process Rust EmbedderClient
(post-Phase-1) at boot time, branching on `[embedder].backend` in
`server.toml`. Default stays "python" so existing deployments keep
their current path until the operator explicitly flips.

Stacked on top of PR-1.F (#294 — in-process EmbedderClient). Won't
push until that merges to main; rebased then.

## What changes

### multi_model.rs

- New `EmbedderBackend` enum: `Python` (default) | `Rust`. Serde
  `rename_all = "lowercase"` so `server.toml` reads
  `backend = "python"` / `backend = "rust"`.
- New `EmbedderConfig.backend` field (default = `Python`).
- New `EmbedderConfig.snapshot: Option<PathBuf>` field — required when
  `backend = "rust"`, ignored under `python`. Required-but-missing
  under `rust` is a hard boot failure with a clear error pointing at
  the config block.

### serve.rs

- `verify_embedder_pin` now takes `&weaver_core::embedder::EmbedderInfo`
  (the backend-agnostic trait type) instead of the proto
  `InfoResponse`. The Python-path probe converts proto → trait type at
  the boundary so the verifier is shape-independent.
- `probe_embedder` renamed to `probe_embedder_python`.
- New `probe_embedder_rust(embedder_config) -> Result<Option<EmbedderInfo>>`:
  constructs `EmbedderClient::from_snapshot(...)` on a blocking thread
  (load is sync + GPU-bound, would starve the runtime inline), reads
  its `info()`, returns. No retry loop — in-process construction
  either works or doesn't, no transient-recovery semantics.
- Boot path branches on `config.embedder.backend`. When the daemon
  isn't built with the `embedder-rust` cargo feature, selecting
  `backend = "rust"` is a hard fail rather than a silent fallback to
  Python.

### Cargo.toml (weaver-interface)

- New `embedder-rust` feature. Pulls in `weaver-spu/flash-attn`
  (transitively enables candle + candle-cuda) and `dep:candle-core`
  (needed in serve.rs to construct `Device::new_cuda(N)` and `DType`).
- Default builds keep the Python path; flip via
  `--features embedder-rust` for the Rust backend.
- The `inference` feature is unchanged and remains independent —
  combine `--features inference,embedder-rust` for the full daemon.

### build.rs (weaver-spu) — pre-existing main-branch bug fix

`crates/weaver-spu/build.rs` references `kernels/transformer.cu` for
the cudarc-backed legacy decoder path, but PR-0.5.E (#278) deleted
that file when it consolidated `weaver-inference` into `weaver-spu`.
Result: any build with `--features cuda` (and transitively
`--features inference`) fails on main with
`fatal error: kernels/transformer.cu: No such file or directory`.

This blocked validation of PR-1.G locally — `cargo check
-p weaver-interface --features inference,embedder-rust` couldn't get
past the build step. Hardened the build.rs to skip the cuda compile
when the kernel file is absent (with a `cargo:warning` so the
operator sees what's happening). Out-of-scope to actually restore
the kernel here; that's its own follow-up. For now, the cudarc
decoder path simply doesn't link — which is fine for default builds
and for the embedder-rust path that doesn't touch decoder kernels.

## Test plan

- [x] `cargo check -p weaver-spu` (default features) — clean
- [x] `cargo check -p weaver-interface` (no features) — clean
- [x] `cargo check -p weaver-interface --features inference,embedder-rust` — clean (with the build.rs hardening fix)
- [x] `cargo check -p weaver-interface --features embedder-rust` — clean
- [ ] CodeRabbit review clean
- [ ] End-to-end smoke (deferred to follow-up): bring up daemon with `[embedder].backend = "rust"` and a real Jina V4 snapshot path; confirm cohort-pin probe writes/verifies against a snapshot identity

## Known issues outside this PR's scope

1. `cargo clippy -p weaver-interface --features inference,embedder-rust -- -D warnings` fails with two pre-existing `explicit_auto_deref` lints in `weaver-spu/src/decoder/gguf.rs:751-752` (legacy GGUF decoder path). Independent of PR-1.G; fix as a separate hygiene PR.
2. The cudarc decoder kernel restore (proper fix for the
   transformer.cu deletion) is its own follow-up.

## Follow-ups (existing tasks)

- #117 Phase 1: smoke-test in-process backend end-to-end
- #118 Phase 1: agent-side runtime cutover to in-process embedder

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0a8771d2-030f-44dc-a9c4-c1c6a68a824c

📥 Commits

Reviewing files that changed from the base of the PR and between 1a03ee2 and 65fa59b.

📒 Files selected for processing (1)
  • crates/weaver-spu/src/decoder/multi_model.rs

📝 Walkthrough

Walkthrough

Adds an embedder backend selector (python|rust), a feature-flagged Rust in-process embedder path with optional dependency, backend-agnostic embedder probing and pin verification before model load, validation requiring Rust snapshot, conditional CUDA kernel compilation, and related test updates.

Changes

Multi-backend Embedder Support

Layer / File(s) Summary
Configuration Schema
crates/weaver-spu/src/decoder/multi_model.rs
Adds EmbedderBackend enum (python, rust), EmbedderConfig.backend and snapshot: Option<PathBuf]; validates that snapshot is required when backend = "rust"; unit tests added.
Feature Flags & Dependencies
crates/weaver-interface/Cargo.toml
New embedder-rust Cargo feature enabling weaver-spu/flash-attn and optional workspace candle-core dependency; comments note default Python behavior.
Backend Probing & Dispatch
crates/weaver-interface/src/serve.rs
handle_serve probes configured backend before server state/model load: probe_embedder_python uses socket+retry and maps to EmbedderInfo; probe_embedder_rust (feature-gated) constructs in-process client from snapshot/GPU and returns EmbedderInfo; verify_embedder_pin now accepts &EmbedderInfo.
Build System
crates/weaver-spu/build.rs
When CUDA feature enabled, compile CUDA kernels only if kernels/transformer.cu exists; otherwise emit a warning and add rerun-if-changed; proto compilation remains unconditional.
Tests
crates/weaver-interface/src/serve.rs
Test helpers updated to construct EmbedderInfo directly and exercise updated pin verification behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hopped through code at break of dawn,

Two embedders now—one near, one yon;
Python listens, Rust wakes in-core,
Kernels check paths, candles wait by the door.
I nibble features, then I bound once more.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: adding backend selection (Python vs. Rust embedder) controlled via [embedder].backend configuration in serve.rs and supporting infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

@toddwbucy
Copy link
Copy Markdown
Owner Author

@coderabbitai please review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Tip

For best results, initiate chat on the files or code changes.

@toddwbucy: I'll start reviewing the PR right away!

[review]

@toddwbucy
Copy link
Copy Markdown
Owner Author

@coderabbitai please review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

@toddwbucy: I'll review the PR changes now!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-interface/src/serve.rs`:
- Around line 111-130: The probe of the embedder backend (the match on backend
and calls to probe_embedder_python and probe_embedder_rust) must run before
build_server_state() and any model-loading logic so failures short-circuit
startup; move the backend detection/probe (including the embedder_socket
handling and the probe_embedder_* calls) out of the post-build_server_state path
and execute it immediately after reading config (or before calling
build_server_state()), returning the probe error early when probe_embedder_rust
returns Err or probe_embedder_python fails so model loading never proceeds on a
known embedder failure.

In `@crates/weaver-spu/src/decoder/multi_model.rs`:
- Around line 143-160: MultiModelConfig::validate() currently doesn't enforce
that snapshot is present when backend is set to the Rust embedder, so add a
validation check in MultiModelConfig::validate() (or in
MultiModelConfig::parse() after deserialization) that returns an error if
self.backend == EmbedderBackend::Rust and self.snapshot.is_none(); reference the
backend field and snapshot Option<std::path::PathBuf> on the MultiModelConfig
struct and ensure the returned error message clearly says "snapshot is required
for backend = rust"; this prevents parse() from succeeding and avoids deferring
failure to probe_embedder_rust().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dedce8d9-504a-4276-886f-68b8ad7222fa

📥 Commits

Reviewing files that changed from the base of the PR and between d6ca4e8 and 1bd31cd.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/weaver-interface/Cargo.toml
  • crates/weaver-interface/src/serve.rs
  • crates/weaver-spu/build.rs
  • crates/weaver-spu/src/decoder/multi_model.rs

Comment thread crates/weaver-interface/src/serve.rs
Comment thread crates/weaver-spu/src/decoder/multi_model.rs
CR round 2 on PR #296 — two valid findings, both addressed:

## 1. Validate snapshot requirement at config-parse time

`MultiModelConfig::validate()` now bails when
`backend = "rust"` is set without a `snapshot` path. Previously the
config parsed successfully and the failure surfaced inside
`probe_embedder_rust()` after `weaver serve` had already begun
startup work — the operator-facing error came late and far from
the source of the problem.

The new check fires at `MultiModelConfig::parse()` → `validate()`,
which is what `MultiModelConfig::from_file()` calls before returning.
So a malformed `server.toml` errors out before any model loading,
device construction, or pin verification gets a chance.

## 2. Move embedder probe ahead of build_server_state()

The boot order was:

  1. parse + validate config
  2. apply CLI overrides
  3. build_server_state()    ← loads any [[models]] declared in config
  4. embedder probe + pin verify
  5. server.serve()

If the embedder probe fails (snapshot missing, CUDA init, daemon
not built with embedder-rust feature, Python service wedged) we'd
still pay the model-loading cost in step 3 first. Reordered to:

  1. parse + validate config
  2. apply CLI overrides
  3. embedder probe + pin verify   ← moved here
  4. build_server_state()
  5. server.serve()

So a known embedder failure aborts boot before any model touches GPU.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-spu/src/decoder/multi_model.rs`:
- Around line 598-610: Add unit tests exercising the new backend-specific
validation on MultiModelConfig: implement tests like
test_validate_rust_backend_requires_snapshot (call MultiModelConfig::parse with
an embedder backend = "rust" and no snapshot and assert the error contains
"requires [embedder].snapshot"), test_validate_rust_backend_with_snapshot_ok
(parse a toml with backend = "rust" plus snapshot and a [[gpus]] entry with
policy = "locked", assert parse succeeds and embedder.snapshot.is_some() and
emb.backend == EmbedderBackend::Rust), and
test_validate_python_backend_without_snapshot_ok (parse backend = "python"
without snapshot, assert success and emb.backend == EmbedderBackend::Python and
snapshot.is_none()); place these in the same test module exercising
MultiModelConfig::parse so future changes to the validation in multi_model.rs
are covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c96cbdda-2402-4976-87f8-4161375c65b2

📥 Commits

Reviewing files that changed from the base of the PR and between 1bd31cd and 912e28b.

📒 Files selected for processing (2)
  • crates/weaver-interface/src/serve.rs
  • crates/weaver-spu/src/decoder/multi_model.rs

Comment thread crates/weaver-spu/src/decoder/multi_model.rs
CR round 3 — finding was valid: the new
EmbedderConfig.snapshot validation introduced in PR #296 had no
direct test coverage. Three tests added in the existing
multi_model::tests module covering:

  - test_validate_rust_backend_requires_snapshot — backend = "rust"
    without `snapshot` errors at parse, message contains
    "requires [embedder].snapshot"
  - test_validate_rust_backend_with_snapshot_ok — backend = "rust"
    with `snapshot` parses, EmbedderBackend::Rust + snapshot.is_some()
  - test_validate_python_backend_without_snapshot_ok — default
    backend (Python) parses, snapshot.is_none() — confirms the new
    check doesn't reject existing Python configs

All three pass: `cargo test -p weaver-spu --lib -- \
test_validate_rust_backend test_validate_python_backend`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-spu/src/decoder/multi_model.rs`:
- Around line 599-610: The config validation currently checks only for
emb.snapshot being None when emb.backend == EmbedderBackend::Rust, but an empty
string becomes Some(PathBuf::new()) and bypasses the check; update the
validation in multi_model.rs where emb.backend and emb.snapshot are inspected to
also treat empty/zero-length paths as invalid (e.g., check
PathBuf::as_os_str().is_empty() or Path::to_str().map(|s| s.is_empty()) and bail
with the same error message when snapshot is empty), ensuring the same
descriptive anyhow::bail(...) runs for both None and empty snapshot cases.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e8fc5eb1-3a1c-4a9d-af1e-2ea0fb6d4ee8

📥 Commits

Reviewing files that changed from the base of the PR and between 912e28b and 1a03ee2.

📒 Files selected for processing (1)
  • crates/weaver-spu/src/decoder/multi_model.rs

Comment thread crates/weaver-spu/src/decoder/multi_model.rs
CR caught a hole in the snapshot validation: `snapshot = ""`
deserializes to `Some(PathBuf::new())`, which slips past a bare
`is_none()` check. The `match` now treats both `None` and
`Some(empty path)` as "snapshot missing" and bails with the same
descriptive error.

Added `test_validate_rust_backend_rejects_empty_snapshot` to lock
in the empty-string case alongside the existing None-case test.
All four backend-validation tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@toddwbucy toddwbucy merged commit 6b57422 into main May 8, 2026
1 check passed
@toddwbucy toddwbucy deleted the feat/spu-backend-selection-serve branch May 8, 2026 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants