
feat(herobench): trait-typed embedder + opt-in in-process backend (PR-1.I) #298

Merged: toddwbucy merged 2 commits into `main` from `feat/herobench-embedder-trait-cutover` on May 8, 2026


@toddwbucy
Owner

@toddwbucy toddwbucy commented May 8, 2026

Summary

Phase 1 task #118 — agent-side runtime cutover for the herobench dedup gate. Two layered changes:

  1. Trait refactor (no behavior change). `belief::dedup_and_upsert_attempt_hypothesis` now takes `&dyn weaver_core::embedder::Embedder` instead of the concrete `&EmbeddingClient`. The benchmark.rs construction site mirrors this with `Option<Arc<dyn Embedder>>`. Future backend swaps don't touch the dedup logic.

  2. Opt-in in-process backend behind `embedder-rust` feature + `WEAVER_SPU_JINA_V4_SNAPSHOT` env var. New `try_construct_embedder()` helper tries: (a) in-process `EmbedderClient::from_snapshot` → (b) legacy gRPC `EmbeddingClient::connect_default()` → (c) `None` (degrade to unconditional writes). In-process failures fall through to gRPC.

Why now, why this scope

Yesterday's survey (#117 evidence doc) confirmed that herobench uses the embedder only for dedup, and that dedup degrades gracefully when the embedder is down, so the swap is localized. The trait plumbing is the load-bearing change; the env-var path is a stepping stone toward the agent.yaml SPU schema (Block B′) once that lands. The gRPC dependency is dropped entirely in Phase 3.

Files

| File | Change |
|---|---|
| `crates/weaver-demo/src/herobench/belief.rs` | `dedup_and_upsert_attempt_hypothesis` signature: `&dyn Embedder` |
| `crates/weaver-demo/src/herobench/benchmark.rs` | `Option<Arc<dyn Embedder>>` variable, new `try_construct_embedder()` helper |
| `crates/weaver-demo/Cargo.toml` | New `embedder-rust = ["weaver-spu/flash-attn", "dep:candle-core"]` feature + optional candle-core dep |
| `crates/weaver-interface/Cargo.toml` | `embedder-rust` now forwards `weaver-demo/embedder-rust` |

Behavior matrix

| Build | `WEAVER_SPU_JINA_V4_SNAPSHOT` | Path used |
|---|---|---|
| Default (no features) | any | gRPC `EmbeddingClient` (or no-dedup) |
| `--features embedder-rust` | unset / empty | gRPC `EmbeddingClient` (or no-dedup) |
| `--features embedder-rust` | set to a valid snapshot | In-process `EmbedderClient` (fallback to gRPC on construction failure) |

`WEAVER_SPU_CUDA_DEVICE` (default 0) overrides GPU ordinal — mirrors the `jina_embed` smoke binary's env contract.
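
The matrix above can be read as a small decision function. The following is a minimal sketch, not the code in benchmark.rs: `FirstAttempt` and `first_attempt` are hypothetical names, `feature_enabled` stands in for whether the build has `embedder-rust` on, and `snapshot` stands in for the value of `WEAVER_SPU_JINA_V4_SNAPSHOT`.

```rust
/// Which backend the construction helper would try first.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, PartialEq, Eq)]
pub enum FirstAttempt {
    InProcess,
    Grpc,
}

/// `feature_enabled`: assumed to mirror the `embedder-rust` build feature.
/// `snapshot`: assumed to mirror the env var; `None` means unset.
pub fn first_attempt(feature_enabled: bool, snapshot: Option<&str>) -> FirstAttempt {
    match snapshot {
        // Only a non-empty value on a feature-enabled build selects in-process.
        Some(path) if feature_enabled && !path.is_empty() => FirstAttempt::InProcess,
        // Default builds, and unset or empty values, stay on the gRPC path.
        _ => FirstAttempt::Grpc,
    }
}
```

This only models the first attempt; the fall-through to gRPC on construction failure and the final no-dedup case are separate steps.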

Test plan

  • `cargo check -p weaver-demo` (default features) — clean
  • `cargo check -p weaver-interface --features inference,embedder-rust` — clean (in-process path activated via feature unification)
  • CodeRabbit review clean
  • (manual, post-merge) Run herobench with `WEAVER_SPU_JINA_V4_SNAPSHOT=...`; confirm log shows "in-process EmbedderClient ready; dedup gate live"

What this PR does NOT do

  • Drop the gRPC `EmbeddingClient`. That's Phase 3 cleanup once Block B′ lands and the agent.yaml SPU schema becomes the single source of truth for embedder placement.
  • Wire the agent.yaml SPU schema directly. The env var is a stepping stone.

Follow-ups

  • Block B′ — agent.yaml SPU schema integration (replaces the env-var path)
  • Phase 3 — retire gRPC `EmbeddingClient` + Python service entirely

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an optional in-process Rust embedder feature for demo builds.
    • Feature now propagates so enabling embedder support across the interface and demo is simpler.
    • Continued support for CUDA device selection via environment configuration.
  • Refactor

    • Embedder backend abstracted to a common trait for pluggable implementations.
    • Benchmark runner prefers a local embedder, falls back to the legacy gRPC connector if local construction fails, and degrades gracefully if neither is available.

…-1.I)

Phase 1 task #118 — agent-side runtime cutover for the herobench
dedup gate. Two layered changes:

## 1. Trait refactor (no behavior change)

`belief::dedup_and_upsert_attempt_hypothesis` previously took the
concrete `&weaver_spu::encoder::grpc_client_legacy::EmbeddingClient`.
Now takes `&dyn weaver_core::embedder::Embedder`. The benchmark.rs
construction site mirrors this — variable type changed to
`Option<Arc<dyn Embedder>>`. Existing gRPC `EmbeddingClient` already
implements the trait so the swap is type-system-only.

This is the load-bearing plumbing change: future swaps of the
embedder backend never need to touch the dedup logic again.
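
A minimal sketch of why the trait-typed signature isolates the dedup logic from the backend. Everything here is illustrative: the `Embedder` trait below is a stand-in (the real `weaver_core::embedder::Embedder` methods are not shown in this PR), and `ToyEmbedder`, `dedup_and_upsert`, `write_attempt`, and the 0.99 cosine threshold are assumptions, not the project's code.

```rust
use std::sync::Arc;

/// Stand-in trait; the real trait's method set is an assumption here.
pub trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy backend: two crude features (vowel count, everything else).
pub struct ToyEmbedder;

impl Embedder for ToyEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        let vowels = text.chars().filter(|c| "aeiou".contains(*c)).count() as f32;
        vec![vowels, text.chars().count() as f32 - vowels]
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b)).max(f32::EPSILON)
}

/// Dedup gate: depends only on the trait, never on a concrete client.
/// Returns true if the text was written, false if it was a near-duplicate.
pub fn dedup_and_upsert(embedder: &dyn Embedder, store: &mut Vec<String>, text: &str) -> bool {
    let v = embedder.embed(text);
    if store.iter().any(|old| cosine(&embedder.embed(old), &v) > 0.99) {
        return false;
    }
    store.push(text.to_string());
    true
}

/// Call site: holds `Option<Arc<dyn Embedder>>` and degrades to an
/// unconditional write when no backend could be constructed.
pub fn write_attempt(emb: &Option<Arc<dyn Embedder>>, store: &mut Vec<String>, text: &str) -> bool {
    match emb {
        Some(e) => dedup_and_upsert(e.as_ref(), store, text),
        None => {
            store.push(text.to_string());
            true
        }
    }
}
```

Swapping `ToyEmbedder` for any other type that implements the trait leaves `dedup_and_upsert` untouched, which is the property the refactor is after.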

## 2. Opt-in in-process backend via env var (feature-gated)

New helper `try_construct_embedder()` in benchmark.rs tries paths in
this order:

  1. **In-process** `EmbedderClient::from_snapshot(...)` if the
     `embedder-rust` feature is enabled AND `WEAVER_SPU_JINA_V4_SNAPSHOT`
     is set. GPU ordinal from `WEAVER_SPU_CUDA_DEVICE` (default 0),
     mirrors the `jina_embed` smoke binary's env contract. Construction
     runs under `tokio::task::spawn_blocking` since model load is
     sync + GPU-bound (seconds-long).
  2. **Legacy gRPC** `EmbeddingClient::connect_default()` against the
     Python `weaver-embedder.service` — migration-window fallback.
     Retires alongside the gRPC client in Phase 3.
  3. **`None`** — dedup degrades to unconditional writes (existing
     behavior when embedder unavailable).

In-process construction failures fall through to gRPC (e.g.,
operator set the env var but snapshot is broken / GPU busy / OOM).
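
The ordering above can be sketched as a plain fallback chain. This is a simplified, synchronous illustration under stated assumptions: the real helper is async and returns `Option<Arc<dyn Embedder>>`, whereas `Backend` and `construct_with_fallback` here are hypothetical names and the constructors are modeled as closures that only report success or failure.

```rust
/// Which backend ended up live. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    InProcess,
    Grpc,
}

/// `in_process` is `None` when the feature is off or the snapshot env var
/// is unset; otherwise it is the (assumed) in-process constructor.
pub fn construct_with_fallback(
    in_process: Option<&dyn Fn() -> Result<(), String>>,
    grpc: &dyn Fn() -> Result<(), String>,
) -> Option<Backend> {
    // Step 1: try the in-process backend if it was requested.
    if let Some(load) = in_process {
        match load() {
            Ok(()) => return Some(Backend::InProcess),
            // A failure here is not fatal; fall through to gRPC.
            Err(e) => eprintln!("in-process embedder failed ({e}); trying gRPC"),
        }
    }
    // Step 2: legacy gRPC. Step 3: `None`, meaning unconditional writes.
    match grpc() {
        Ok(()) => Some(Backend::Grpc),
        Err(e) => {
            eprintln!("gRPC embedder unavailable ({e}); dedup disabled");
            None
        }
    }
}
```

Returning `Option` rather than `Result` matches the description: callers treat a missing embedder as a degraded mode, not as an error.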

## 3. Feature wiring

  - New `embedder-rust = ["weaver-spu/flash-attn", "dep:candle-core"]`
    on `weaver-demo`. Mirrors the same-named feature on `weaver-interface`.
  - `weaver-interface/embedder-rust` now forwards
    `weaver-demo/embedder-rust` so when the daemon is built with
    `--features inference,embedder-rust`, the herobench dispatcher
    inside it gets the in-process path live.
  - Default builds keep the gRPC path. Operators opt in via
    `--features embedder-rust` + setting the env var.

## What this PR does NOT do

  - Drop the gRPC client. That's Phase 3 cleanup (#118 follow-up
    once the agent.yaml SPU schema lands and is the single source
    of truth for embedder placement).
  - Wire the agent.yaml SPU schema into the embedder construction.
    Block B′ work; the env-var path is a stepping stone.

## Test plan

- [x] `cargo check -p weaver-demo` (default features) — clean
- [x] `cargo check -p weaver-interface --features inference,embedder-rust` — clean (in-process path activated transitively)
- [ ] CodeRabbit review clean
- [ ] (manual, post-merge) Run herobench with `WEAVER_SPU_JINA_V4_SNAPSHOT=...`; confirm log shows "in-process EmbedderClient ready; dedup gate live"

## Follow-ups

- Block B′ — agent.yaml SPU schema integration (replaces env var path)
- Phase 3 — retire gRPC `EmbeddingClient` + Python service

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1688c69c-18e3-4959-a290-655b7b304b25

📥 Commits

Reviewing files that changed from the base of the PR and between ed6cf40 and 5a57e81.

📒 Files selected for processing (1)
  • crates/weaver-demo/src/herobench/benchmark.rs

📝 Walkthrough


Adds an optional in-process Rust embedder feature, changes the dedup API to accept a dyn Embedder, implements try_construct_embedder that prefers a Rust embedder and falls back to gRPC, and wires the trait-object embedder into the benchmark dedup pipeline with graceful degradation.

Changes

In-Process Rust Embedder with Trait-Based Abstraction

| Layer / File(s) | Summary |
|---|---|
| Feature and Dependency Configuration: `crates/weaver-demo/Cargo.toml`, `crates/weaver-interface/Cargo.toml` | Adds `embedder-rust` feature and documents it; adds optional candle-core dependency; weaver-interface forwards `weaver-demo/embedder-rust`. |
| Embedder Trait Abstraction: `crates/weaver-demo/src/herobench/belief.rs` | `dedup_and_upsert_attempt_hypothesis` signature changed to accept `&dyn weaver_core::embedder::Embedder` instead of the concrete gRPC client. |
| Embedder Construction with Fallback: `crates/weaver-demo/src/herobench/benchmark.rs` | Adds `try_construct_embedder()` that prefers an in-process Rust embedder (feature+env gated, `spawn_blocking` init, optional CUDA), falls back to gRPC connect+`ensure_ready`, and returns `None` on timeout/failure. |
| Benchmark Integration: `crates/weaver-demo/src/herobench/benchmark.rs` | `run_benchmark` uses `try_construct_embedder()` to get `Option<Arc<dyn Embedder>>` and passes `emb.as_ref()` into the dedup call so dedup runs against the trait object or degrades when absent. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • toddwbucy/WeaverTools#135: Modifies the same dedup function and benchmark integration; this PR extends it with trait-based abstraction and in-process embedder support.
  • toddwbucy/WeaverTools#294: Wires an in-process EmbeddingClient implementation used by the Rust embedder path; closely related to demo construction logic.
  • toddwbucy/WeaverTools#270: Related workspace/candle wiring that adds candle support and feature gating used by the new embedder-rust flag.

Poem

🐰 I hop through crates with cheerful zest,

trait-objects now put embedders to the test;
Candle's glow may warm the Rusty way,
but gRPC waits if snapshots stray.
Benchmarks hum — dedup keeps the rest!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective: implementing trait-typed embedder abstraction and adding opt-in in-process backend support for the herobench dedup gate. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-demo/src/herobench/benchmark.rs`:
- Around line 868-871: The code silently defaults gpu_ordinal to 0 on parse
failure of WEAVER_SPU_CUDA_DEVICE; change this to fail fast: read
std::env::var("WEAVER_SPU_CUDA_DEVICE") and if it is present attempt
s.parse::<usize>() and on Err emit a clear error (e.g., panic/expect or
processLogger.error + process::exit(1)) that includes the invalid string and the
env var name, only falling back to legacy 0 when the env var is completely
absent; update the logic around the gpu_ordinal binding in benchmark.rs to
perform this explicit presence check and parse-with-error handling instead of
using .unwrap_or(0).
- Around line 919-932: The gRPC connect/ready path using
weaver_spu::encoder::grpc_client_legacy::EmbeddingClient::connect_default() and
client.ensure_ready() must be bounded by a timeout so a wedged embedder or
stalled network doesn't hang startup; wrap the await calls with
tokio::time::timeout (choose a sensible Duration, e.g. a few seconds), handle
tokio::time::error::Elapsed by logging a warning and returning None, and keep
the existing error handling for other errors—apply this around both the
connect_default() future (or at least the ensure_ready() future) so failure to
complete within the timeout falls back to None instead of blocking forever.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5da06284-fb5d-4853-8054-a7405e54e592

📥 Commits

Reviewing files that changed from the base of the PR and between 9527a70 and ed6cf40.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/weaver-demo/Cargo.toml
  • crates/weaver-demo/src/herobench/belief.rs
  • crates/weaver-demo/src/herobench/benchmark.rs
  • crates/weaver-interface/Cargo.toml

…review)

Two valid CR findings, both addressed:

## 1. Strict parse for WEAVER_SPU_CUDA_DEVICE

Previous: `.ok().and_then(|s| s.parse().ok()).unwrap_or(0)` collapsed
both "env unset" and "env set to garbage" into the default 0. A typo
(e.g. `WEAVER_SPU_CUDA_DEVICE=o`) would silently run dedup work
on the wrong card.

Now uses an explicit match that distinguishes:
  - `Err(NotPresent)` → default 0 (fine, that's the documented default)
  - `Err(NotUnicode)` → log warn, fall back to gRPC path (no point
    constructing in-process with an unparseable env var)
  - `Ok(s)` + parse fails → log warn with the offending value, fall
    back to gRPC
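
The three cases above can be sketched with a match on the `std::env::var` result. This is an illustration under assumptions, not the code in benchmark.rs: `parse_cuda_device` is a hypothetical name, the `usize` ordinal type follows the review suggestion, and the `Err(String)` return stands in for "log a warning and fall back to gRPC".

```rust
use std::env::VarError;

/// `Ok(ordinal)`: proceed with the in-process backend on that GPU.
/// `Err(reason)`: the caller is assumed to log `reason` and use gRPC instead.
pub fn parse_cuda_device(raw: Result<String, VarError>) -> Result<usize, String> {
    match raw {
        // Unset is the documented default, not an error.
        Err(VarError::NotPresent) => Ok(0),
        // Set, but not valid Unicode: refuse rather than guess.
        Err(VarError::NotUnicode(bytes)) => Err(format!(
            "WEAVER_SPU_CUDA_DEVICE is not valid Unicode: {bytes:?}"
        )),
        // Set and readable: a parse failure reports the offending value.
        Ok(s) => s
            .parse::<usize>()
            .map_err(|e| format!("WEAVER_SPU_CUDA_DEVICE={s:?} is not a GPU ordinal: {e}")),
    }
}
```

A caller would pass `std::env::var("WEAVER_SPU_CUDA_DEVICE")` directly. Taking the `Result` as a parameter keeps the function testable without mutating the process environment.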

## 2. Bounded timeouts on the gRPC fallback path

Previous: bare `.await` on `EmbeddingClient::connect_default()` and
`client.ensure_ready()`. The gRPC client's `EmbeddingClientConfig`
defaults have a 10s connect timeout but a 300s request timeout —
plenty of room for `ensure_ready` to hang the whole benchmark startup
when the Python service is wedged.

Extracted the fallback into `try_construct_embedder_grpc()` so the
in-process and gRPC paths share it, then wrapped both
`connect_default()` and `ensure_ready()` futures in
`tokio::time::timeout(PROBE_TIMEOUT)` (10s — bounded enough to absorb
a cold-start Python embedder, short enough that a wedged service
doesn't stall benchmark start).

Timeout outcome → log warn → return `None` → dedup degrades to
unconditional writes (existing behavior on any other gRPC failure).
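
To illustrate the bounded-probe idea without pulling in an async runtime, the sketch below uses a std thread and `mpsc::Receiver::recv_timeout` as a stand-in for `tokio::time::timeout`. It is not the PR's implementation: `probe_with_timeout` is a hypothetical name, and unlike a dropped tokio future, the worker thread here keeps running after the deadline until the probe returns.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `probe` on a worker thread and waits at most `limit` for its result.
/// Any failure (probe error, deadline exceeded, worker panic) maps to `None`,
/// mirroring the "log warn, return None" outcome described above.
pub fn probe_with_timeout<T, F>(probe: F, limit: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the send error.
        let _ = tx.send(probe());
    });
    match rx.recv_timeout(limit) {
        Ok(Ok(value)) => Some(value),
        Ok(Err(e)) => {
            eprintln!("embedder probe failed: {e}");
            None
        }
        // Covers both the deadline and a disconnected (panicked) worker.
        Err(_) => {
            eprintln!("embedder probe did not complete within {limit:?}");
            None
        }
    }
}
```

In the async version described in the commit, each of `connect_default()` and `ensure_ready()` gets its own `PROBE_TIMEOUT` bound, so a slow connect cannot consume the readiness check's budget.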

Validates: `cargo check -p weaver-demo` and
`cargo check -p weaver-interface --features inference,embedder-rust`
both clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>