feat: retire Python embedder + gRPC backend (PR-1.J)#300
Conversation
Phase 1 cleanup. The in-process candle `EmbedderClient` (PR #294) plus the daemon backend selection (PR #296) plus the herobench trait refactor (PR #298) plus the candle peer-access fix (#299) together made the Python `weaver-embedder.service` and its gRPC client redundant. This PR removes them entirely. ## Removed - `services/embedder/` — Python service tree (3.1 GB on disk including the .venv with PyTorch/transformers). - `services/embedder/weaver-embedder.service` — systemd unit. - `crates/weaver-spu/src/encoder/grpc_client_legacy.rs` (~600 LOC) and the matching test in `crates/weaver-spu/tests/`. - `proto/persephone/{common,embedding,extraction,training}/*.proto` — Persephone proto definitions, no production callers. - `crates/weaver-database/{build.rs,src/proto.rs,tests/proto_types.rs}` — proto codegen + the only test that referenced the proto types (proto-validation, no production caller). - `crates/weaver-demo/src/bin/embedder_latency.rs` — gRPC-stack latency benchmark binary. - `crates/weaver-demo/tests/similarity_calibration.rs` — gRPC similarity calibration test. - Tonic stack from `weaver-spu/Cargo.toml` (`tonic`, `prost`, `prost-types`, `hyper-util`, `tower`, `tonic-build`). - Tonic stack from `weaver-database/Cargo.toml` (same deps + the `tonic-build` build-dep). ## Simplified - `weaver-spu/build.rs` — drops the Persephone proto compile step; CUDA-kernel compile path unchanged. - `weaver-spu/src/encoder/mod.rs` — drops `grpc_client_legacy` module reference. - `weaver-spu/src/lib.rs` — drops the `proto` module. - `weaver-spu/src/decoder/multi_model.rs` — drops `EmbedderBackend` enum (only one backend now), drops `backend` field from `EmbedderConfig`. Validate() requires `[embedder].snapshot` unconditionally; the old Python-only checks (locked-GPU, non-empty socket, batch_size > 0, model_name non-empty) retired alongside the rules they enforced. Tests updated. - `weaver-core/src/embedder.rs` — drops `From<tonic::transport::Error>` and `From<tonic::Status>` impls (no consumers now). - `weaver-interface/src/serve.rs` — drops `probe_embedder_python`, `EmbedderBackend` match, and all gRPC client imports. Boot path is now: parse config → embedder probe (rust only) → pin verify → build server state → listen. - `weaver-demo/src/herobench/benchmark.rs` — `try_construct_embedder` collapses to "in-process or None"; the gRPC fallback helper `try_construct_embedder_grpc` is gone. - `weaver-demo/tests/herobench_integration.rs` — `try_connect_embedder` rewritten to construct an in-process `EmbedderClient`. The three `integration_dedup_*` tests + their helpers cfg-gated under `feature = "embedder-rust"` (they require a Jina V4 snapshot). - `services/weaver-infer.service` — drops `Requires=`/`BindsTo=`/ `After=weaver-embedder.service`; embedder is in-process now. - `services/weaver.target` — drops `Wants=weaver-embedder.service` and `After=`. - `scripts/install.sh` — drops the `weaver-embedder.service` install step; explicitly removes a stale unit if a previous install left it behind. Updates the operator-facing "next steps" message to reflect the two-unit chain. ## EmbedderConfig back-compat The `socket`, `model_name`, `batch_size`, `use_fp16`, `idle_timeout_seconds`, and `gpu` fields stay on `EmbedderConfig` (deprecated but preserved) so existing `server.toml` files still parse. They're no longer load-bearing — `snapshot` is the only field the daemon actually reads. ## Validation - `cargo check --workspace` — clean - `cargo check -p weaver-interface --features inference,embedder-rust` — clean - `cargo test -p weaver-spu --lib -- test_validate_embedder` — 5 passed (3 new + 2 sibling) The 14 pre-existing `weaver-core::engine::runtime::tests::spawn_*` failures are unchanged from main; they require a live inference server with a "test-model" loaded and aren't introduced by this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR removes the legacy external Python embedder service and Persephone gRPC infrastructure, consolidating all embedding operations onto an in-process Rust-backed client. The change eliminates tonic/protobuf dependencies across multiple crates, deletes the services/embedder directory entirely, removes the gRPC client implementation, and rewires all consumers to use snapshot-based embedder construction. ChangesRetire Legacy gRPC Embedder, Migrate to In-Process Rust Backend
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/weaver-interface/src/serve.rs (2)
77-82:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStale error message references retired service.
The error message at line 80-81 tells operators to "Bring weaver-embedder.service up first" but this PR retires that service. Since the embedder is now in-process, this guidance is no longer applicable.
Suggested fix
Ok(_) => bail!( - "Refuse to start: embedder probe failed and a pin exists at {}. \ - The cohort identity cannot be verified without the embedder. \ - Bring weaver-embedder.service up first, then retry.", + "Refuse to start: embedder probe failed and a pin exists at {}. \ + The cohort identity cannot be verified without the embedder. \ + Check that [embedder].snapshot points to a valid Jina V4 HF \ + snapshot directory and that the GPU is available.", pin_path.display() ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/weaver-interface/src/serve.rs` around lines 77 - 82, The error message in the bail! call in serve.rs still tells operators to "Bring weaver-embedder.service up first" even though the embedder is now in-process; update the bail! message (the bail!(...) invocation) to reference the new reality—e.g., instruct to restart or start the current Weaver process with the embedder enabled or remove the pin—so operators know to restart the main Weaver service (or clear the pin) rather than trying to start a retired weaver-embedder.service. Ensure you edit the string inside the bail! invocation in serve.rs accordingly.
158-167:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStale error message suggests retired Python backend.
The error message at lines 162-166 suggests using
backend = \"python\"as an alternative, but the Python embedder backend was retired in this PR. The message should be updated to reflect that theembedder-rustfeature is now required.Suggested fix
#[cfg(not(feature = "embedder-rust"))] { let _ = embedder_config; bail!( - "[embedder].backend = \"rust\" requires the daemon to be built with \ + "[embedder] requires the daemon to be built with \ the `embedder-rust` feature; rebuild with \ - `cargo build -p weaver-interface --features inference,embedder-rust` \ - (or omit `embedder-rust` and use `backend = \"python\"`)" + `cargo build -p weaver-interface --features inference,embedder-rust`" ); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/weaver-interface/src/serve.rs` around lines 158 - 167, Update the stale bail! error message in serve.rs (the block gated by #[cfg(not(feature = "embedder-rust"))] that references embedder_config) to remove the suggestion to use backend = "python" and clearly state that the daemon must be built with the embedder-rust feature; keep the rebuild command (cargo build -p weaver-interface --features inference,embedder-rust) and ensure the message only instructs to rebuild with embedder-rust rather than offering the retired Python backend as an alternative.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-demo/tests/herobench_integration.rs`:
- Around line 2297-2337: The helper try_connect_embedder currently swallows all
init errors and returns None; change it to return
Result<Option<weaver_spu::encoder::client::EmbedderClient>, anyhow::Error> so
that it only returns Ok(None) when WEAVER_SPU_JINA_V4_SNAPSHOT is unset
(preserve the early-return skip), but on any other failure (invalid
WEAVER_SPU_CUDA_DEVICE, CUDA init, EmbedderClient::from_snapshot, task panic)
return Err(...) with the underlying error instead of printing and returning
None; update callers to call try_connect_embedder().await and assert/expect on
Err to fail the test, only early-return on Ok(None). Ensure references to
WEAVER_SPU_JINA_V4_SNAPSHOT, WEAVER_SPU_CUDA_DEVICE,
candle_core::Device::new_cuda, and EmbedderClient::from_snapshot are used to
locate the code to change.
---
Outside diff comments:
In `@crates/weaver-interface/src/serve.rs`:
- Around line 77-82: The error message in the bail! call in serve.rs still tells
operators to "Bring weaver-embedder.service up first" even though the embedder
is now in-process; update the bail! message (the bail!(...) invocation) to
reference the new reality—e.g., instruct to restart or start the current Weaver
process with the embedder enabled or remove the pin—so operators know to restart
the main Weaver service (or clear the pin) rather than trying to start a retired
weaver-embedder.service. Ensure you edit the string inside the bail! invocation
in serve.rs accordingly.
- Around line 158-167: Update the stale bail! error message in serve.rs (the
block gated by #[cfg(not(feature = "embedder-rust"))] that references
embedder_config) to remove the suggestion to use backend = "python" and clearly
state that the daemon must be built with the embedder-rust feature; keep the
rebuild command (cargo build -p weaver-interface --features
inference,embedder-rust) and ensure the message only instructs to rebuild with
embedder-rust rather than offering the retired Python backend as an alternative.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 631daef0-e441-4fa0-b262-e4e2d96cead0
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (46)
crates/weaver-core/src/embedder.rscrates/weaver-database/Cargo.tomlcrates/weaver-database/build.rscrates/weaver-database/src/lib.rscrates/weaver-database/src/proto.rscrates/weaver-database/tests/proto_types.rscrates/weaver-demo/src/bin/embedder_latency.rscrates/weaver-demo/src/herobench/benchmark.rscrates/weaver-demo/tests/herobench_integration.rscrates/weaver-demo/tests/similarity_calibration.rscrates/weaver-interface/src/serve.rscrates/weaver-spu/Cargo.tomlcrates/weaver-spu/build.rscrates/weaver-spu/src/decoder/multi_model.rscrates/weaver-spu/src/encoder/grpc_client_legacy.rscrates/weaver-spu/src/encoder/mod.rscrates/weaver-spu/src/lib.rscrates/weaver-spu/tests/grpc_client_legacy.rsproto/persephone/common/common.protoproto/persephone/embedding/embedding.protoproto/persephone/extraction/extraction.protoproto/persephone/training/training.protoscripts/install.shservices/embedder/build_proto.shservices/embedder/core/__init__.pyservices/embedder/core/cli/__init__.pyservices/embedder/core/cli/config.pyservices/embedder/core/config/weaver.yamlservices/embedder/core/embedders/__init__.pyservices/embedder/core/embedders/embedders_base.pyservices/embedder/core/embedders/embedders_jina.pyservices/embedder/core/proto/__init__.pyservices/embedder/core/proto/persephone/__init__.pyservices/embedder/core/proto/persephone/embedding/__init__.pyservices/embedder/core/proto/persephone/embedding/embedding_pb2.pyservices/embedder/core/proto/persephone/embedding/embedding_pb2_grpc.pyservices/embedder/core/services/__init__.pyservices/embedder/core/services/embedder_grpc.pyservices/embedder/core/services/embedder_service.pyservices/embedder/requirements.txtservices/embedder/tests/__init__.pyservices/embedder/tests/conftest.pyservices/embedder/tests/test_embedder_grpc.pyservices/embedder/weaver-embedder.serviceservices/weaver-infer.serviceservices/weaver.target
💤 Files with no reviewable changes (29)
- services/embedder/core/config/weaver.yaml
- crates/weaver-database/src/proto.rs
- services/embedder/core/init.py
- services/embedder/requirements.txt
- services/embedder/weaver-embedder.service
- proto/persephone/training/training.proto
- services/embedder/core/proto/persephone/embedding/embedding_pb2_grpc.py
- services/embedder/tests/conftest.py
- crates/weaver-database/build.rs
- crates/weaver-database/src/lib.rs
- crates/weaver-spu/tests/grpc_client_legacy.rs
- services/embedder/core/services/embedder_service.py
- proto/persephone/common/common.proto
- crates/weaver-spu/src/lib.rs
- services/embedder/core/services/embedder_grpc.py
- proto/persephone/extraction/extraction.proto
- services/embedder/core/embedders/embedders_base.py
- services/embedder/build_proto.sh
- services/embedder/tests/test_embedder_grpc.py
- crates/weaver-demo/src/bin/embedder_latency.rs
- crates/weaver-demo/tests/similarity_calibration.rs
- crates/weaver-core/src/embedder.rs
- crates/weaver-database/Cargo.toml
- services/embedder/core/cli/config.py
- services/embedder/core/embedders/embedders_jina.py
- proto/persephone/embedding/embedding.proto
- crates/weaver-database/tests/proto_types.rs
- services/embedder/core/proto/persephone/embedding/embedding_pb2.py
- crates/weaver-spu/src/encoder/grpc_client_legacy.rs
…300 review) CR round 1 on PR #300 — three valid stale-text findings, all fixed. ## 1. try_connect_embedder swallowed errors (silent test skip) Returned `Option<EmbedderClient>` and printed-and-returned-None on any failure (env parse, CUDA init, snapshot construct, task panic). Net effect: a broken snapshot or busted GPU would make the integration tests pass quietly with their work skipped. Bad signal hygiene. Now returns `Result<Option<EmbedderClient>>`: - `Ok(None)` only when `WEAVER_SPU_JINA_V4_SNAPSHOT` is unset — the legitimate "no test infrastructure available" skip. - `Err(...)` on every other failure path (env parse error, CUDA init failure, `EmbedderClient::from_snapshot` failure, task panic). Callers `.expect()` on the Result so the test fails loudly instead of skipping silently. The three `integration_dedup_*` callers were updated; if the env var is set but anything downstream is busted, the test now panics with the specific underlying error. ## 2. Stale "Bring weaver-embedder.service up" message in serve.rs The pin-mismatch refusal in `handle_serve` told operators to start a service that no longer exists. Updated to point at the actual recovery actions: check snapshot path, GPU availability, daemon build features. Added the canonical pin-clear command for the case where the pin is intentionally stale (corpus re-embed planned). ## 3. Stale "use backend = python" message in probe_embedder_rust The `cfg(not(feature = "embedder-rust"))` bail offered the Python backend as a fallback. Python is gone post-PR-1.J. Updated to just say "rebuild with the feature enabled" — no false alternative. Validates: `cargo check -p weaver-interface --features inference,embedder-rust` and `cargo check -p weaver-demo --features embedder-rust --tests` both clean (no new warnings). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/weaver-interface/src/serve.rs (1)
53-56:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStale comment references retired Python service.
Line 55 mentions "a wedged Python service" but this PR retires the Python embedder entirely. The comment should reflect that the only failure modes are now snapshot/CUDA/Rust embedder issues.
Suggested fix
// Embedder probe + pin verify runs BEFORE model loading so the daemon // fails fast on a missing snapshot, busted CUDA init, or a wedged - // Python service rather than burning seconds on `build_server_state()` + // embedder rather than burning seconds on `build_server_state()` // first. (PR `#296` review.)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/weaver-interface/src/serve.rs` around lines 53 - 56, Update the stale comment that begins "Embedder probe + pin verify..." to remove the reference to "a wedged Python service" and instead state that failures can come from missing snapshot, busted CUDA init, or the Rust embedder; keep the rest of the sentence and the reference to build_server_state() intact so readers know why the probe runs before build_server_state().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/weaver-interface/src/serve.rs`:
- Around line 53-56: Update the stale comment that begins "Embedder probe + pin
verify..." to remove the reference to "a wedged Python service" and instead
state that failures can come from missing snapshot, busted CUDA init, or the
Rust embedder; keep the rest of the sentence and the reference to
build_server_state() intact so readers know why the probe runs before
build_server_state().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 76a07135-b8bc-4544-bede-330d70491f2e
📒 Files selected for processing (2)
crates/weaver-demo/tests/herobench_integration.rscrates/weaver-interface/src/serve.rs
…#300 review) The comment justifying why the embedder probe runs before `build_server_state()` still listed "a wedged Python service" as a failure mode the early probe protects against. Python service retired in PR-1.J; the failure modes are now: missing snapshot, busted CUDA init, or Rust EmbedderClient construction failure. Comment text updated, structure preserved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/weaver-interface/src/serve.rs (1)
178-190:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winError messages reference non-existent
[embedder].backendconfig field.The
EmbedderConfigstruct inweaver-spu/src/decoder/multi_model.rs(lines 120–168) does not have abackendfield. However, these error messages reference[embedder].backend = "rust", which will confuse operators looking for a configuration option that doesn't exist.In the
embedder-rustfeature context, Rust is the only embedder backend. The error messages should be updated to reflect this and remove the reference to the nonexistent field:Proposed fix
let embedder_config = embedder_config.ok_or_else(|| { anyhow::anyhow!( - "[embedder].backend = \"rust\" but no [embedder] block in server.toml; \ + "No [embedder] block in server.toml; \ add the block with `snapshot = \"/path/to/jina-v4/snapshot/<sha>/\"`" ) })?; let snapshot = embedder_config.snapshot.as_ref().ok_or_else(|| { anyhow::anyhow!( - "[embedder].backend = \"rust\" requires [embedder].snapshot to point at \ + "[embedder].snapshot is required and must point at \ the Jina V4 HF snapshot directory (e.g. \ `snapshot = \"/opt/weaver/huggingface/.../snapshots/<sha>/\"`)" ) })?;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/weaver-interface/src/serve.rs` around lines 178 - 190, Update the error messages that currently reference a non-existent `[embedder].backend` field: change the first anyhow!() (constructed for embedder_config) to refer only to a missing `[embedder]` block and instruct operators to add `snapshot = "/path/to/jina-v4/snapshot/<sha>/"`; change the second anyhow!() (for snapshot) to drop the `backend = "rust"` mention and state that `[embedder].snapshot` must point to the Jina V4 HF snapshot directory (e.g. `snapshot = "/opt/weaver/huggingface/.../snapshots/<sha>/"`); modify the messages where embedder_config and snapshot are created to remove any reference to `[embedder].backend`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/weaver-interface/src/serve.rs`:
- Around line 178-190: Update the error messages that currently reference a
non-existent `[embedder].backend` field: change the first anyhow!() (constructed
for embedder_config) to refer only to a missing `[embedder]` block and instruct
operators to add `snapshot = "/path/to/jina-v4/snapshot/<sha>/"`; change the
second anyhow!() (for snapshot) to drop the `backend = "rust"` mention and state
that `[embedder].snapshot` must point to the Jina V4 HF snapshot directory (e.g.
`snapshot = "/opt/weaver/huggingface/.../snapshots/<sha>/"`); modify the
messages where embedder_config and snapshot are created to remove any reference
to `[embedder].backend`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: deff92d2-0187-4212-9ddf-f8d5a62ff6b3
📒 Files selected for processing (1)
crates/weaver-interface/src/serve.rs
…essages (PR #300 review) Two anyhow! sites in `probe_embedder_rust` still referenced the `[embedder].backend = "rust"` field that retired in PR-1.J along with the `EmbedderBackend` enum. Updated to point at the actual configuration the operator needs: presence of the `[embedder]` block + a snapshot path. Same recovery instructions, no false references to removed fields. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 cleanup. Now that the in-process candle `EmbedderClient` (PR #294), daemon backend selection (PR #296), herobench trait refactor (PR #298), and candle peer-access fix (#299) are all in main, the Python `weaver-embedder.service` and its gRPC client are redundant infrastructure. This PR removes them entirely.
Summary
Simplified
EmbedderConfig back-compat
The `socket`, `model_name`, `batch_size`, `use_fp16`, `idle_timeout_seconds`, and `gpu` fields stay on `EmbedderConfig` (deprecated but preserved) so existing `server.toml` files still parse without erroring. They're no longer load-bearing — only `snapshot` is read.
Test plan
Known pre-existing failures (not introduced by this PR)
The 14 `weaver-core::engine::runtime::tests::spawn_*` failures (`ModelLoadFailed("Model 'test-model' is not loaded...")`) are unchanged from main. They require a live inference server with a `test-model` loaded — infra not provided by the test harness.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Removed Features