fix(spu): kernel restore + Jina V4 trained-context cap (PR-1.H) #297
Conversation
PR-0.5.E (#278) deleted `crates/weaver-inference/kernels/transformer.cu` when it consolidated `weaver-inference` into `weaver-spu` but never ported the file across. Result: any `--features cuda` build (and transitively `--features inference`) fails to link with `undefined symbol: launch_allreduce_2gpu` (and ~5 other host-side launchers declared in `crates/weaver-spu/src/core/gpu/forward.rs`). This was masked on main because no CI runs `--features inference` and the production daemon binary at `/opt/weaver/bin/weaver-infer` was built before PR-0.5.E.

PR #296 added a `build.rs` fallback that *skipped* the cuda compile when the file was missing — that kept the build script from panicking, but didn't help the linker.

Real fix: restore the kernel file from `git show 79bb649^:crates/weaver-inference/kernels/transformer.cu` into `crates/weaver-spu/kernels/transformer.cu` (1266 lines, unchanged). The cudarc decoder path links cleanly again. Also removed the now-dead `build.rs` fallback so the script reverts to its pre-PR-0.5.E shape.

The cudarc path is still slated for full retirement in Phase 3 cleanup per Cargo.toml's legacy-backends comment. Until then, keeping it functional avoids cascading breakage in `serve.rs` etc.

Validates: `cargo build -p weaver-interface --features inference,embedder-rust --release` — clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The smoke test for the in-process embedder backend (task #117) caught a real identity-drift bug: `JinaV4Embedder::max_seq_len` was reading `config.json::max_position_embeddings` directly, which on the canonical 853c867 snapshot is **128_000** (the Qwen2.5-VL architecture's RoPE-extended max). But Jina V4 was *trained* on 32K-token sequences; using more is OOD and produces degraded embeddings. The Python embedder service correctly reports 32768.

First smoke run failed exactly here:

    Refuse to start: embedder identity drifted from pin at /opt/weaver/state/embedder.pin.json.
      pinned: jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
      live:   jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
      delta:  max_seq_length: pin=32768 live=128000
    Error: embedder identity mismatch

Fix:
- New `pub const JINA_V4_TRAINED_CONTEXT: usize = 32_768` in `encoder/jina_v4.rs`, documenting why this cap exists.
- `from_snapshot()` now sets `max_seq_len = JINA_V4_TRAINED_CONTEXT.min(raw.max_position_embeddings)`. The `min()` keeps the cap honest if a future snapshot ships `max_position_embeddings < 32768` (unlikely but cheap insurance).

Outcomes:
- `EmbedderInfo::max_seq_length` matches the Python service report. Cohort-pin guard stays consistent across backends — no operator pin reset required at the Phase 1 cutover.
- `encode_text` boundary check still fires correctly, now at the operator-meaningful limit (32768) rather than the architecture max (128000). Catches OOD inputs before encoding instead of after.

Validated end-to-end via the smoke test in `docs/infrastructure/embedder-rust-cutover-evidence/smoke-2026-05-08.md` (separate commit).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
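For readers following along without the diff, a minimal sketch of the cap described in this commit. Only `JINA_V4_TRAINED_CONTEXT` and the `min()` expression come from the commit message; the `RawConfig` and `JinaV4Embedder` shapes here are simplified stand-ins, not the actual structs in `encoder/jina_v4.rs`.

```rust
/// Jina V4 was trained on 32K-token sequences. Longer inputs are
/// out-of-distribution even though the Qwen2.5-VL backbone's RoPE-extended
/// `max_position_embeddings` in config.json is 128_000.
pub const JINA_V4_TRAINED_CONTEXT: usize = 32_768;

/// Illustrative stand-in for the parsed config.json fields.
pub struct RawConfig {
    pub max_position_embeddings: usize,
}

/// Illustrative stand-in for the embedder struct.
pub struct JinaV4Embedder {
    pub max_seq_len: usize,
}

impl JinaV4Embedder {
    pub fn from_snapshot(raw: &RawConfig) -> Self {
        // min() keeps the cap honest if a future snapshot ever ships a smaller
        // architectural limit than the trained context.
        let max_seq_len = JINA_V4_TRAINED_CONTEXT.min(raw.max_position_embeddings);
        Self { max_seq_len }
    }
}
```

With the 853c867 snapshot this yields `min(32_768, 128_000) = 32_768`, matching the value the Python service already reports.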
Captures the first end-to-end verification that `weaver serve`
constructs an in-process `EmbedderClient` against a real Jina V4
snapshot, the cohort-pin guard fires correctly through the new path,
and the daemon listens.
Documents:
- The pre-flight fixes the run surfaced (kernel restore +
`JINA_V4_TRAINED_CONTEXT` cap, both committed separately on this
branch).
- What the smoke test confirms (boot-time wiring through the new
backend) vs. what it doesn't (actual embed traffic, long-running
stability — those are task #118 territory).
- Operator-facing migration note: with the `JINA_V4_TRAINED_CONTEXT`
cap, the existing pin file's `max_seq_length=32768` keeps matching
after the cutover. No pin reset needed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
📝 Walkthrough
This PR adds a comprehensive transformer CUDA kernel compilation unit and extern "C" launchers, makes build.rs always compile the CUDA kernels when the `cuda` feature is enabled, and caps Jina V4 at its trained context length.
Changes: Transformer CUDA Kernels and Jina V4 Context Limit
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-spu/kernels/transformer.cu`:
- Around line 511-512: The FlashAttention fast path in forward.rs must be
guarded for head sizes that fit the kernel's static limits: update both call
sites of launch_flash_attention in forward.rs (the branches around
use_flash_attention at lines where launch_flash_attention is invoked) to only
select the FlashAttention path when head_dim <= 256 (so FA2_MAX_HALF_DIM=128
covers O_acc) and when computed smem_bytes does not exceed the device shared
memory (avoid head_dim > 128 causing smem >48KiB); mirror the check used in
forward_gemma4.rs (if layer.is_sliding && head_dim <= 256) by adding an explicit
head_dim bound (and optional smem_bytes sanity) before invoking
launch_flash_attention to prevent out-of-bounds accumulator indexing and failed
kernel launches.
In `@crates/weaver-spu/src/encoder/jina_v4.rs`:
- Around line 264-266: The overflow error message still refers to the raw config
max_position_embeddings even though self.max_seq_len is now capped using
JINA_V4_TRAINED_CONTEXT; update the rejection error text emitted where the code
checks sequence length (the path that compares input length against
self.max_seq_len) to reference "max_seq_length" or "trained context" (or similar
wording) instead of "max_position_embeddings" so it matches the exposed value in
EmbedderInfo and the capped limit computed from JINA_V4_TRAINED_CONTEXT and
raw.max_position_embeddings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 937096ed-5918-40ef-a702-edc188b2bddc
📒 Files selected for processing (4)
- crates/weaver-spu/build.rs
- crates/weaver-spu/kernels/transformer.cu
- crates/weaver-spu/src/encoder/jina_v4.rs
- docs/infrastructure/embedder-rust-cutover-evidence/smoke-2026-05-08.md
`#define FA2_MAX_HALF_DIM 128  // head_dim / 2, supports up to head_dim=256`
🧩 Analysis chain
🏁 Scripts executed: a series of `rg` and `sed` probes over `crates/weaver-spu` (repository toddwbucy/WeaverTools) locating the `launch_flash_attention` call sites and `use_flash_attention` branches in `src/core/gpu/forward.rs`, the `head_dim <= 256` guard in `forward_gemma4.rs`, the `flash_attention` flag plumbing through `main.rs`, `decoder/startup.rs`, and the engine, and the `ModelConfig` definition in the decoder.
Guard the FlashAttention fast path against larger head sizes in forward.rs.
launch_flash_attention in forward.rs (lines 609, 1548) accepts any head_dim without bounds checking. While forward_gemma4.rs properly guards with if layer.is_sliding && head_dim <= 256, the forward.rs path has no such protection. The kernel's O_acc[FA2_MAX_HALF_DIM] accumulator (line 511, FA2_MAX_HALF_DIM=128) only covers head_dim ≤ 256, and smem_bytes exceeds 48 KiB for head_dim > 128. For the unguarded forward.rs paths, this can fail to launch or index past accumulator bounds.
Verification note
`forward_gemma4.rs` at line 327 includes the necessary guard (`head_dim <= 256`), but `forward.rs` at lines 606 and 1546 only checks the `use_flash_attention` flag without validating `head_dim`. The upstream engine initialization passes `use_flash_attention` as a boolean profile flag with no head_dim validation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/weaver-spu/kernels/transformer.cu` around lines 511 - 512, The
FlashAttention fast path in forward.rs must be guarded for head sizes that fit
the kernel's static limits: update both call sites of launch_flash_attention in
forward.rs (the branches around use_flash_attention at lines where
launch_flash_attention is invoked) to only select the FlashAttention path when
head_dim <= 256 (so FA2_MAX_HALF_DIM=128 covers O_acc) and when computed
smem_bytes does not exceed the device shared memory (avoid head_dim > 128
causing smem >48KiB); mirror the check used in forward_gemma4.rs (if
layer.is_sliding && head_dim <= 256) by adding an explicit head_dim bound (and
optional smem_bytes sanity) before invoking launch_flash_attention to prevent
out-of-bounds accumulator indexing and failed kernel launches.
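For reference, a hedged sketch of the guard this finding asks for (the author defers the change in the response below). The helper name and the 48 KiB constant are assumptions made for illustration; only the `head_dim <= 256` bound and the shared-memory check come from the review text.

```rust
// Sketch of a call-site guard for the FlashAttention fast path in forward.rs.

const FA2_MAX_HEAD_DIM: usize = 256; // FA2_MAX_HALF_DIM = 128 covers head_dim / 2
const DEFAULT_SMEM_LIMIT_BYTES: usize = 48 * 1024; // default limit without cudaFuncSetAttribute

fn take_flash_attention_path(
    use_flash_attention: bool,
    head_dim: usize,
    smem_bytes: usize,
) -> bool {
    // Only select FlashAttention when the kernel's static limits hold;
    // otherwise fall back to the regular attention kernels.
    use_flash_attention
        && head_dim <= FA2_MAX_HEAD_DIM
        && smem_bytes <= DEFAULT_SMEM_LIMIT_BYTES
}
```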
… review)

After capping `max_seq_len = JINA_V4_TRAINED_CONTEXT.min(...)`, the "input exceeds Jina V4's max_position_embeddings" error message no longer matches the value being checked — `self.max_seq_len` is the trained-context cap (32768), not the architecture max (128000). Reworded to "trained context (max_seq_length = N)" so operators see the same number `EmbedderInfo::max_seq_length` exposes.

Skipped the FlashAttention head-dim bounds-check finding (forward.rs call sites of launch_flash_attention) — pre-existing in main, not introduced by PR #297. The cudarc decoder path is slated for Phase 3 retirement; adding multi-site head_dim bounds checks belongs in its own focused PR with proper test coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
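A sketch of roughly how the reworded boundary check reads. The helper name `check_input_len`, the `String` error type, and the struct shape are assumptions for illustration; the wording change from `max_position_embeddings` to the trained-context limit is the part taken from the commit.

```rust
/// Illustrative stand-in; only max_seq_len matters for this check.
struct JinaV4Embedder {
    max_seq_len: usize, // min(JINA_V4_TRAINED_CONTEXT, raw.max_position_embeddings)
}

impl JinaV4Embedder {
    fn check_input_len(&self, token_len: usize) -> Result<(), String> {
        if token_len > self.max_seq_len {
            // Report the capped trained-context limit, not max_position_embeddings,
            // so operators see the same number EmbedderInfo::max_seq_length exposes.
            return Err(format!(
                "input of {token_len} tokens exceeds Jina V4's trained context (max_seq_length = {})",
                self.max_seq_len
            ));
        }
        Ok(())
    }
}
```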
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/weaver-spu/src/encoder/jina_v4.rs`:
- Around line 264-266: Update the rustdoc for the function max_seq_len() to
state that it returns the effective capped sequence length (the minimum of
JINA_V4_TRAINED_CONTEXT and raw.max_position_embeddings) rather than the raw
architectural ceiling; mention both JINA_V4_TRAINED_CONTEXT and
raw.max_position_embeddings in the doc so readers know the method applies the
cap and returns the capped value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 140817ed-d147-4a81-9ffe-b981189ae883
📒 Files selected for processing (1)
crates/weaver-spu/src/encoder/jina_v4.rs
…capped value (PR #297 review)

The getter's rustdoc said it returned the architectural ceiling read from `config.json::max_position_embeddings`. That was correct when the getter was added in PR #294 but became stale after the cap landed earlier in this PR. Now states explicitly that it returns `min(JINA_V4_TRAINED_CONTEXT, raw.max_position_embeddings)` and references both terms so a reader can see the relationship between the cap, the raw config field, and the value `EmbedderInfo` exposes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
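A sketch of what the updated rustdoc might look like on the getter. The `impl` block and the trivial body are assumptions; the documented relationship between the cap, the raw config field, and `EmbedderInfo` follows the commit message.

```rust
impl JinaV4Embedder {
    /// Effective maximum sequence length for this embedder.
    ///
    /// Returns `min(JINA_V4_TRAINED_CONTEXT, raw.max_position_embeddings)`, not the
    /// raw architectural ceiling from `config.json::max_position_embeddings`; this
    /// is the value `EmbedderInfo::max_seq_length` exposes and the cohort pin records.
    pub fn max_seq_len(&self) -> usize {
        self.max_seq_len
    }
}
```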
Summary
Three small fixes that together unblock the in-process Rust embedder backend end-to-end. All discovered while completing task #117 — the first real `weaver serve --features inference,embedder-rust` smoke test against a Jina V4 snapshot.
Why each fix is needed
1. Kernel restore
PR-0.5.E (#278) consolidated `weaver-inference` into `weaver-spu` but deleted `crates/weaver-inference/kernels/transformer.cu` without porting it across. `weaver-spu/build.rs` still expected it at `kernels/transformer.cu`, and the Rust FFI declarations at `crates/weaver-spu/src/core/gpu/forward.rs` still reference its symbols (`launch_allreduce_2gpu` etc.).
Result: any `cargo build --features cuda` (or transitively `--features inference`) fails at link with:
```
rust-lld: error: undefined symbol: launch_allreduce_2gpu
```
Masked on main because no CI builds with `--features inference` and the production daemon binary at `/opt/weaver/bin/weaver-infer` was built before PR-0.5.E.
PR #296 added a `build.rs` fallback that skipped the cuda compile when the file was missing — kept the build script from panicking, but didn't help the linker (the Rust FFI declarations still need defined symbols).
This PR restores the file from `git show 79bb649^:crates/weaver-inference/kernels/transformer.cu` and reverts the now-dead fallback.
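For orientation, a hypothetical sketch of the "always compile when the `cuda` feature is on" shape that the fallback removal returns to, assuming the `cc` crate's nvcc support. The arch flags, output name, and exact feature plumbing are illustrative, not the project's actual build script.

```rust
// build.rs (sketch)
fn main() {
    // Cargo exposes enabled features to build scripts as CARGO_FEATURE_* env vars.
    if std::env::var_os("CARGO_FEATURE_CUDA").is_some() {
        println!("cargo:rerun-if-changed=kernels/transformer.cu");
        cc::Build::new()
            .cuda(true)                          // drive nvcc rather than the host C compiler
            .file("kernels/transformer.cu")      // the file restored by this PR
            .flag("-gencode")
            .flag("arch=compute_80,code=sm_80")  // illustrative target architecture
            .compile("weaver_transformer_kernels"); // gives the linker launch_allreduce_2gpu etc.
    }
}
```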
2. `max_seq_len` cap
Smoke test caught a real identity drift between the Rust and Python backends:
```
Refuse to start: embedder identity drifted from pin at /opt/weaver/state/embedder.pin.json.
pinned: jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
live: jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
delta: max_seq_length: pin=32768 live=128000
```
The pin file (written by the Python service on 2026-04-20) records 32768. The Rust client was reading the architecture max (128000). Capping to `JINA_V4_TRAINED_CONTEXT = 32_768` makes the two backends report the same identity, so the Phase 1 cutover doesn't require an operator pin reset.
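For illustration, a hypothetical sketch of the shape of the pin comparison that produces the delta above; the struct, field names, and error text here are invented stand-ins, not the actual cohort-pin guard.

```rust
#[derive(PartialEq)]
struct EmbedderIdentity {
    model: String,
    revision: String,
    dim: usize,
    max_seq_length: usize,
}

fn check_pin(pinned: &EmbedderIdentity, live: &EmbedderIdentity) -> Result<(), String> {
    if pinned == live {
        return Ok(());
    }
    // Collect only the fields that drifted so the operator sees a focused delta.
    let mut delta = Vec::new();
    if pinned.max_seq_length != live.max_seq_length {
        delta.push(format!(
            "max_seq_length: pin={} live={}",
            pinned.max_seq_length, live.max_seq_length
        ));
    }
    // ...same pattern for model, revision, and dim...
    Err(format!(
        "embedder identity drifted from pin. delta: {}",
        delta.join(", ")
    ))
}
```

With both backends reporting `max_seq_length = 32768`, this comparison passes unchanged, which is why no operator pin reset is needed at the cutover.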
Bonus: `encode_text` boundary check now fires at the operator-meaningful limit instead of letting OOD inputs through.
3. Smoke evidence
Documents what was actually verified end-to-end, and notes what this run does not cover (actual embed traffic, long-running stability — task #118 territory).
Test plan
Follow-ups
🤖 Generated with Claude Code