feat(herobench): trait-typed embedder + opt-in in-process backend (PR-1.I)#298
Conversation
…-1.I) Phase 1 task #118 — agent-side runtime cutover for the herobench dedup gate. Two layered changes: ## 1. Trait refactor (no behavior change) `belief::dedup_and_upsert_attempt_hypothesis` previously took the concrete `&weaver_spu::encoder::grpc_client_legacy::EmbeddingClient`. Now takes `&dyn weaver_core::embedder::Embedder`. The benchmark.rs construction site mirrors this — variable type changed to `Option<Arc<dyn Embedder>>`. Existing gRPC `EmbeddingClient` already implements the trait so the swap is type-system-only. This is the load-bearing plumbing change: future swaps of the embedder backend never need to touch the dedup logic again. ## 2. Opt-in in-process backend via env var (feature-gated) New helper `try_construct_embedder()` in benchmark.rs tries paths in this order: 1. **In-process** `EmbedderClient::from_snapshot(...)` if the `embedder-rust` feature is enabled AND `WEAVER_SPU_JINA_V4_SNAPSHOT` is set. GPU ordinal from `WEAVER_SPU_CUDA_DEVICE` (default 0), mirrors the `jina_embed` smoke binary's env contract. Construction runs under `tokio::task::spawn_blocking` since model load is sync + GPU-bound (seconds-long). 2. **Legacy gRPC** `EmbeddingClient::connect_default()` against the Python `weaver-embedder.service` — migration-window fallback. Retires alongside the gRPC client in Phase 3. 3. **`None`** — dedup degrades to unconditional writes (existing behavior when embedder unavailable). In-process construction failures fall through to gRPC (e.g., operator set the env var but snapshot is broken / GPU busy / OOM). ## 3. Feature wiring - New `embedder-rust = ["weaver-spu/flash-attn", "dep:candle-core"]` on `weaver-demo`. Mirrors the same-named feature on `weaver-interface`. - `weaver-interface/embedder-rust` now forwards `weaver-demo/embedder-rust` so when the daemon is built with `--features inference,embedder-rust`, the herobench dispatcher inside it gets the in-process path live. - Default builds keep the gRPC path. Operators opt in via `--features embedder-rust` + setting the env var. ## What this PR does NOT do - Drop the gRPC client. That's Phase 3 cleanup (#118 follow-up once the agent.yaml SPU schema lands and is the single source of truth for embedder placement). - Wire the agent.yaml SPU schema into the embedder construction. Block B′ work; the env-var path is a stepping stone. ## Test plan - [x] `cargo check -p weaver-demo` (default features) — clean - [x] `cargo check -p weaver-interface --features inference,embedder-rust` — clean (in-process path activated transitively) - [ ] CodeRabbit review clean - [ ] (manual, post-merge) Run herobench with `WEAVER_SPU_JINA_V4_SNAPSHOT=...`; confirm log shows "in-process EmbedderClient ready; dedup gate live" ## Follow-ups - Block B′ — agent.yaml SPU schema integration (replaces env var path) - Phase 3 — retire gRPC `EmbeddingClient` + Python service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds an optional in-process Rust embedder feature, changes the dedup API to accept a dyn Embedder, implements try_construct_embedder that prefers a Rust embedder and falls back to gRPC, and wires the trait-object embedder into the benchmark dedup pipeline with graceful degradation. ChangesIn-Process Rust Embedder with Trait-Based Abstraction
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-demo/src/herobench/benchmark.rs`:
- Around line 868-871: The code silently defaults gpu_ordinal to 0 on parse
failure of WEAVER_SPU_CUDA_DEVICE; change this to fail fast: read
std::env::var("WEAVER_SPU_CUDA_DEVICE") and if it is present attempt
s.parse::<usize>() and on Err emit a clear error (e.g., panic/expect or
processLogger.error + process::exit(1)) that includes the invalid string and the
env var name, only falling back to legacy 0 when the env var is completely
absent; update the logic around the gpu_ordinal binding in benchmark.rs to
perform this explicit presence check and parse-with-error handling instead of
using .unwrap_or(0).
- Around line 919-932: The gRPC connect/ready path using
weaver_spu::encoder::grpc_client_legacy::EmbeddingClient::connect_default() and
client.ensure_ready() must be bounded by a timeout so a wedged embedder or
stalled network doesn't hang startup; wrap the await calls with
tokio::time::timeout (choose a sensible Duration, e.g. a few seconds), handle
tokio::time::error::Elapsed by logging a warning and returning None, and keep
the existing error handling for other errors—apply this around both the
connect_default() future (or at least the ensure_ready() future) so failure to
complete within the timeout falls back to None instead of blocking forever.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5da06284-fb5d-4853-8054-a7405e54e592
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
crates/weaver-demo/Cargo.tomlcrates/weaver-demo/src/herobench/belief.rscrates/weaver-demo/src/herobench/benchmark.rscrates/weaver-interface/Cargo.toml
…review)
Two valid CR findings, both addressed:
## 1. Strict parse for WEAVER_SPU_CUDA_DEVICE
Previous: `.ok().and_then(|s| s.parse().ok()).unwrap_or(0)` swallowed
both "env unset" and "env set to garbage" into the default 0. A typo
(e.g. `WEAVER_SPU_CUDA_DEVICE=o`) would silently publish dedup work
on the wrong card.
Now uses an explicit match that distinguishes:
- `Err(NotPresent)` → default 0 (fine, that's the documented default)
- `Err(NotUnicode)` → log warn, fall back to gRPC path (no point
constructing in-process with an unparseable env var)
- `Ok(s)` + parse fails → log warn with the offending value, fall
back to gRPC
## 2. Bounded timeouts on the gRPC fallback path
Previous: bare `.await` on `EmbeddingClient::connect_default()` and
`client.ensure_ready()`. The gRPC client's `EmbeddingClientConfig`
defaults have a 10s connect timeout but a 300s request timeout —
plenty of room for `ensure_ready` to hang the whole benchmark startup
when the Python service is wedged.
Extracted the fallback into `try_construct_embedder_grpc()` so the
in-process and gRPC paths share it, then wrapped both
`connect_default()` and `ensure_ready()` futures in
`tokio::time::timeout(PROBE_TIMEOUT)` (10s — bounded enough to absorb
a cold-start Python embedder, short enough that a wedged service
doesn't stall benchmark start).
Timeout outcome → log warn → return `None` → dedup degrades to
unconditional writes (existing behavior on any other gRPC failure).
Validates: `cargo check -p weaver-demo` and
`cargo check -p weaver-interface --features inference,embedder-rust`
both clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Phase 1 task #118 — agent-side runtime cutover for the herobench dedup gate. Two layered changes:
Trait refactor (no behavior change). `belief::dedup_and_upsert_attempt_hypothesis` now takes `&dyn weaver_core::embedder::Embedder` instead of the concrete `&EmbeddingClient`. The benchmark.rs construction site mirrors with `Option<Arc>`. Future backend swaps don't touch the dedup logic.
Opt-in in-process backend behind `embedder-rust` feature + `WEAVER_SPU_JINA_V4_SNAPSHOT` env var. New `try_construct_embedder()` helper tries: (a) in-process `EmbedderClient::from_snapshot` → (b) legacy gRPC `EmbeddingClient::connect_default()` → (c) `None` (degrade to unconditional writes). In-process failures fall through to gRPC.
Why now, why this scope
Yesterday's survey (#117 evidence doc) confirmed the herobench embedder usage is dedup-only — graceful-degrades when down — so the swap is localized. The trait plumbing is the load-bearing change; the env-var path is a stepping stone toward the agent.yaml SPU schema (Block B′) once that lands. Drops the gRPC dep entirely in Phase 3.
Files
Behavior matrix
`WEAVER_SPU_CUDA_DEVICE` (default 0) overrides GPU ordinal — mirrors the `jina_embed` smoke binary's env contract.
Test plan
What this PR does NOT do
Follow-ups
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Refactor