
fix(spu): kernel restore + Jina V4 trained-context cap (PR-1.H) #297

Merged
toddwbucy merged 5 commits into main from feat/spu-embedder-rust-smoke-fixes on May 8, 2026

Conversation

@toddwbucy (Owner) commented May 8, 2026

Summary

Three small fixes that together unblock the in-process Rust embedder backend end-to-end. All discovered while completing task #117 — the first real `weaver serve --features inference,embedder-rust` smoke test against a Jina V4 snapshot.

Commits

  • 76c47ce: Restore `crates/weaver-spu/kernels/transformer.cu` (1266 lines, deleted in PR-0.5.E #278). Without it, `--features cuda` builds fail to link. Also reverts the `build.rs` absent-file fallback added in PR #296, which is now dead code.
  • 46d65d6: Cap `JinaV4Embedder::max_seq_len` to a new `JINA_V4_TRAINED_CONTEXT = 32_768` constant. The snapshot's `config.json::max_position_embeddings` is 128000 (the Qwen2.5-VL architecture max), but Jina V4 was trained on 32K; longer inputs are out-of-distribution (OOD) and produce degraded embeddings.
  • 59660d2: Smoke-test evidence doc at `docs/infrastructure/embedder-rust-cutover-evidence/smoke-2026-05-08.md`.

Why each fix is needed

1. Kernel restore

PR-0.5.E (#278) consolidated `weaver-inference` into `weaver-spu` but deleted `crates/weaver-inference/kernels/transformer.cu` without porting it across. `weaver-spu/build.rs` still expected it at `kernels/transformer.cu`, and the Rust FFI declarations at `crates/weaver-spu/src/core/gpu/forward.rs` still reference its symbols (`launch_allreduce_2gpu` etc.).

Result: any `cargo build --features cuda` (or transitively `--features inference`) fails at link with:

```
rust-lld: error: undefined symbol: launch_allreduce_2gpu
```

This was masked on main because no CI job builds with `--features inference`, and the production daemon binary at `/opt/weaver/bin/weaver-infer` was built before PR-0.5.E.

PR #296 added a `build.rs` fallback that skipped the cuda compile when the file was missing — kept the build script from panicking, but didn't help the linker (the Rust FFI declarations still need defined symbols).

This PR restores the file from `git show 79bb649^:crates/weaver-inference/kernels/transformer.cu` and reverts the now-dead fallback.
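
For intuition on why the fallback couldn't help: a Rust `extern` block only declares a symbol, and any call site forces the linker to resolve it. A minimal sketch (the signature is illustrative; the real declarations live in `forward.rs`):

```rust
// Declarations only promise these symbols exist; definitions must come from
// the compiled transformer.cu object that build.rs feeds to the linker.
extern "C" {
    // Illustrative signature; the real launcher in forward.rs takes arguments.
    fn launch_allreduce_2gpu();
}

fn main() {
    // With the .cu file missing, PR #296's fallback let build.rs succeed,
    // but this call still fails the link step:
    //   rust-lld: error: undefined symbol: launch_allreduce_2gpu
    unsafe { launch_allreduce_2gpu() };
}
```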

2. `max_seq_len` cap

Smoke test caught a real identity drift between the Rust and Python backends:

```
Refuse to start: embedder identity drifted from pin at /opt/weaver/state/embedder.pin.json.
pinned: jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
live: jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
delta: max_seq_length: pin=32768 live=128000
```

The pin file (written by the Python service on 2026-04-20) records 32768. The Rust client was reading the architecture max (128000). Capping to `JINA_V4_TRAINED_CONTEXT = 32_768` makes the two backends report the same identity, so the Phase 1 cutover doesn't require an operator pin reset.

Bonus: `encode_text` boundary check now fires at the operator-meaningful limit instead of letting OOD inputs through.
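
A minimal sketch of the cap's shape (the constant name and `min()` behavior are from the commit message; the standalone helper and test values are illustrative):

```rust
/// Jina V4 was trained on 32K-token sequences; longer inputs are OOD even
/// though the Qwen2.5-VL architecture's RoPE-extended ceiling is 128_000.
pub const JINA_V4_TRAINED_CONTEXT: usize = 32_768;

/// What `from_snapshot()` does with the raw config value, per the commit:
/// the `min()` keeps the cap honest if a snapshot ships a smaller ceiling.
fn effective_max_seq_len(max_position_embeddings: usize) -> usize {
    JINA_V4_TRAINED_CONTEXT.min(max_position_embeddings)
}

fn main() {
    assert_eq!(effective_max_seq_len(128_000), 32_768); // canonical 853c867 snapshot
    assert_eq!(effective_max_seq_len(8_192), 8_192);    // hypothetical smaller snapshot
}
```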

3. Smoke evidence

Documents what was actually verified end-to-end, and notes what this run does not cover (actual embed traffic and long-running stability; task #118 territory).

Test plan

  • `cargo build -p weaver-interface --features inference,embedder-rust --release` — clean
  • End-to-end smoke per the evidence doc:
    • `EmbedderClient::from_snapshot` constructs in 1.2 s
    • `info()` returns `(model_name=jinaai/jina-embeddings-v4, dim=2048, max_seq=32768, rev=853c867...)`
    • Cohort-pin probe matches `/opt/weaver/state/embedder.pin.json`
    • Daemon listens on socket + TCP
    • Daemon stays up until SIGTERM
  • CodeRabbit review clean

Follow-ups

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • GPU-accelerated transformer inference kernels added covering core LLM ops (norms, positional embeddings, attention, MLP, routing, and helpers).
    • Jina V4 embedder enforces trained-context window limit of 32,768 tokens.
  • Bug Fixes

    • Oversized-input error now references Jina V4’s trained-context limit.
  • Documentation

    • Added smoke test evidence documenting embedder validation, run details, and follow-up notes.

r3d91ll and others added 3 commits May 8, 2026 10:38
PR-0.5.E (#278) deleted `crates/weaver-inference/kernels/transformer.cu`
when it consolidated `weaver-inference` into `weaver-spu` but never
ported the file across. Result: any `--features cuda` build (and
transitively `--features inference`) fails to link with
`undefined symbol: launch_allreduce_2gpu` (and ~5 other host-side
launchers from `crates/weaver-spu/src/core/gpu/forward.rs`).

This was masked on main because no CI runs `--features inference`
and the production daemon binary at `/opt/weaver/bin/weaver-infer`
was built before PR-0.5.E. PR #296 added a `build.rs` fallback that
*skipped* the cuda compile when the file was missing — kept the
build script from panicking, but didn't help the linker.

Real fix: restore the kernel file from `git show 79bb649^:crates/weaver-inference/kernels/transformer.cu` into `crates/weaver-spu/kernels/transformer.cu` (1266 lines, unchanged). The cudarc decoder path links cleanly again. Also removed the now-dead `build.rs` fallback so the script reverts to its pre-PR-0.5.E shape.

The cudarc path is still slated for full retirement in Phase 3
cleanup per Cargo.toml's legacy-backends comment. Until then,
keeping it functional avoids cascading breakage in `serve.rs` etc.

Validates: `cargo build -p weaver-interface --features inference,embedder-rust --release` — clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The smoke test for the in-process embedder backend (task #117) caught
a real identity-drift bug: `JinaV4Embedder::max_seq_len` was reading
`config.json::max_position_embeddings` directly, which on the
canonical 853c867 snapshot is **128_000** (the Qwen2.5-VL
architecture's RoPE-extended max). But Jina V4 was *trained* on
32K-token sequences; using more is OOD and produces degraded
embeddings. The Python embedder service correctly reports 32768.

First smoke run failed exactly here:

    Refuse to start: embedder identity drifted from pin at /opt/weaver/state/embedder.pin.json.
      pinned:  jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
      live:    jinaai/jina-embeddings-v4 (dim=2048, rev=853c867...)
      delta:   max_seq_length: pin=32768 live=128000
    Error: embedder identity mismatch

Fix:

  - New `pub const JINA_V4_TRAINED_CONTEXT: usize = 32_768` in
    `encoder/jina_v4.rs`, documenting why this cap exists.
  - `from_snapshot()` now sets:
      `max_seq_len = JINA_V4_TRAINED_CONTEXT.min(raw.max_position_embeddings)`
    The `min()` keeps the cap honest if a future snapshot ships
    `max_position_embeddings < 32768` (unlikely but cheap insurance).

Outcomes:

  - `EmbedderInfo::max_seq_length` matches the Python service report.
    Cohort-pin guard stays consistent across backends — no operator
    pin reset required at the Phase 1 cutover.
  - `encode_text` boundary check still fires correctly, now at the
    operator-meaningful limit (32768) rather than the architecture
    max (128000). Catches OOD inputs before encoding instead of after.

Validated end-to-end via the smoke test in
`docs/infrastructure/embedder-rust-cutover-evidence/smoke-2026-05-08.md`
(separate commit).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the first end-to-end verification that `weaver serve`
constructs an in-process `EmbedderClient` against a real Jina V4
snapshot, the cohort-pin guard fires correctly through the new path,
and the daemon listens.

Documents:
  - The pre-flight fixes the run surfaced (kernel restore +
    `JINA_V4_TRAINED_CONTEXT` cap, both committed separately on this
    branch).
  - What the smoke test confirms (boot-time wiring through the new
    backend) vs. what it doesn't (actual embed traffic, long-running
    stability — those are task #118 territory).
  - Operator-facing migration note: with the `JINA_V4_TRAINED_CONTEXT`
    cap, the existing pin file's `max_seq_length=32768` keeps matching
    after the cutover. No pin reset needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 70520b90-c459-4530-b688-7c5d4c6f3724

📥 Commits

Reviewing files that changed from the base of the PR and between 9941b33 and 71a64b7.

📒 Files selected for processing (1)
  • crates/weaver-spu/src/encoder/jina_v4.rs

📝 Walkthrough

This PR adds a comprehensive transformer CUDA kernel compilation unit and extern "C" launchers, makes build.rs always compile CUDA when the cuda feature is enabled, clamps Jina V4 embedder max sequence length to 32,768, and adds smoke-test documentation of the Rust embedder cutover.

Changes

Transformer CUDA Kernels and Jina V4 Context Limit

  • Build Configuration (crates/weaver-spu/build.rs): when the cuda feature is enabled, unconditionally invokes compile_cuda_kernels() instead of checking for kernel-file existence and emitting fallback warnings.
  • Transformer Inference Kernels (crates/weaver-spu/kernels/transformer.cu): implements many GPU kernels and extern "C" launchers for transformer blocks: normalization (RMSNorm with/without scale), positional encoding (RoPE full/partial), gated activations (SwiGLU, GeGLU), embedding/bias/residual ops, naive causal attention, fused FlashAttention-v2 tiled attention, decode-time (seq_len=1) attention, MoE utilities (gather, weighted scatter, zero, scale-add), scale/softcap kernels, and a 2-GPU allreduce.
  • Embedder Trained Context Limit (crates/weaver-spu/src/encoder/jina_v4.rs): adds the JINA_V4_TRAINED_CONTEXT=32768 constant, caps JinaV4Embedder::from_snapshot to min(raw.max_position_embeddings, JINA_V4_TRAINED_CONTEXT), and updates overflow messaging and docs to reference the trained-context limit.
  • Smoke Test Documentation (docs/infrastructure/embedder-rust-cutover-evidence/smoke-2026-05-08.md): documents the 2026-05-08 end-to-end smoke test of the in-process Rust embedder: environment, pre-flight fixes (kernel restore, build.rs cleanup, context cap), execution command and observed output, validated assertions, test gaps, artifacts, and follow-ups for agent/operator migration regarding the 32K cap.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • toddwbucy/WeaverTools#296: Both PRs modify crates/weaver-spu/build.rs CUDA-kernel compilation logic; #296 introduced a conditional/check that this PR removes in favor of unconditional compilation.
  • toddwbucy/WeaverTools#294: Also touches JinaV4 max-sequence handling and the embedder API surface; related to the trained-context and validation changes here.
  • toddwbucy/WeaverTools#283: Related earlier work on the Jina V4 encoder; this PR further modifies the embedder by adding the trained-context constant and capping max_seq_len.

Poem

🐇 Kernels wake where copper winds hum low,

RoPE and attention in tiled rows grow,
Jina learns her limit — thirty-two K,
Logs blink green; the Rust embedder finds its way,
Small rabbit cheers: "Build, test, and off we go!"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed (skipped; CodeRabbit's high-level summary is enabled).
  • Title check: ✅ Passed. The title directly matches the PR's main objectives: kernel restoration and the Jina V4 trained-context cap, which align with the substantial changes in the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.
  • Linked Issues check: ✅ Passed (skipped; no linked issues were found for this pull request).
  • Out of Scope Changes check: ✅ Passed (skipped; no linked issues were found for this pull request).


@coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-spu/kernels/transformer.cu`:
- Around line 511-512: The FlashAttention fast path in forward.rs must be
guarded for head sizes that fit the kernel's static limits: update both call
sites of launch_flash_attention in forward.rs (the branches around
use_flash_attention at lines where launch_flash_attention is invoked) to only
select the FlashAttention path when head_dim <= 256 (so FA2_MAX_HALF_DIM=128
covers O_acc) and when computed smem_bytes does not exceed the device shared
memory (avoid head_dim > 128 causing smem >48KiB); mirror the check used in
forward_gemma4.rs (if layer.is_sliding && head_dim <= 256) by adding an explicit
head_dim bound (and optional smem_bytes sanity) before invoking
launch_flash_attention to prevent out-of-bounds accumulator indexing and failed
kernel launches.

In `@crates/weaver-spu/src/encoder/jina_v4.rs`:
- Around line 264-266: The overflow error message still refers to the raw config
max_position_embeddings even though self.max_seq_len is now capped using
JINA_V4_TRAINED_CONTEXT; update the rejection error text emitted where the code
checks sequence length (the path that compares input length against
self.max_seq_len) to reference "max_seq_length" or "trained context" (or similar
wording) instead of "max_position_embeddings" so it matches the exposed value in
EmbedderInfo and the capped limit computed from JINA_V4_TRAINED_CONTEXT and
raw.max_position_embeddings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 937096ed-5918-40ef-a702-edc188b2bddc

📥 Commits

Reviewing files that changed from the base of the PR and between 6b57422 and 59660d2.

📒 Files selected for processing (4)
  • crates/weaver-spu/build.rs
  • crates/weaver-spu/kernels/transformer.cu
  • crates/weaver-spu/src/encoder/jina_v4.rs
  • docs/infrastructure/embedder-rust-cutover-evidence/smoke-2026-05-08.md

Comment on lines +511 to +512
#define FA2_MAX_HALF_DIM 128 // head_dim / 2, supports up to head_dim=256


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "FlashAttention call sites / any guard or fallback:"
rg -n -C4 '\blaunch_flash_attention\b|\bcudaFuncSetAttribute\b|\blaunch_attention\b' crates/weaver-spu

echo
echo "Head-dimension assumptions referenced in the CUDA path:"
rg -n -C3 '\b(FA2_MAX_HALF_DIM|head_dim|Gemma 4|partial_rotary_factor)\b' crates/weaver-spu

[Some twenty further rg/sed probes over crates/weaver-spu (forward.rs, engine.rs, decoder/, main.rs) and their collapsed outputs elided.]

Guard the FlashAttention fast path against larger head sizes in forward.rs.

launch_flash_attention in forward.rs (lines 609, 1548) accepts any head_dim without bounds checking. While forward_gemma4.rs properly guards with if layer.is_sliding && head_dim <= 256, the forward.rs path has no such protection. The kernel requires O_acc[FA2_MAX_HALF_DIM] (line 511, FA2_MAX_HALF_DIM=128) to cover only head_dim ≤ 256, and smem_bytes exceeds 48 KiB for head_dim > 128. For the unguarded forward.rs paths, this can fail to launch or index past accumulator bounds.

Verification note: `forward_gemma4.rs` at line 327 includes the necessary guard (`head_dim <= 256`), but `forward.rs` at lines 606 and 1546 only checks the `use_flash_attention` flag without validating `head_dim`. The upstream engine initialization passes `use_flash_attention` as a boolean profile flag with no head_dim validation.
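
A minimal sketch of the guard the review suggests (constants and helper are illustrative; names like `use_flash_attention`, `head_dim`, and `smem_bytes` follow the review's description, not the actual `forward.rs` code):

```rust
// FA2_MAX_HALF_DIM = 128 covers head_dim / 2, so the kernel's accumulator
// only supports head_dim <= 256; shared-memory use also grows with head_dim.
const FA2_MAX_HEAD_DIM: usize = 256;
const MAX_STATIC_SMEM_BYTES: usize = 48 * 1024; // default per-block limit

/// Gate the FlashAttention fast path the way forward_gemma4.rs already does,
/// falling back to the naive attention kernel otherwise.
fn select_flash_attention(use_flash_attention: bool, head_dim: usize, smem_bytes: usize) -> bool {
    use_flash_attention && head_dim <= FA2_MAX_HEAD_DIM && smem_bytes <= MAX_STATIC_SMEM_BYTES
}
```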

… review)

After capping `max_seq_len = JINA_V4_TRAINED_CONTEXT.min(...)`, the
"input exceeds Jina V4's max_position_embeddings" error message no
longer matches the value being checked — `self.max_seq_len` is the
trained-context cap (32768), not the architecture max (128000).
Reworded to "trained context (max_seq_length = N)" so operators
see the same number `EmbedderInfo::max_seq_length` exposes.

Skipped the FlashAttention head-dim bounds-check finding (forward.rs
call sites of launch_flash_attention) — pre-existing in main, not
introduced by PR #297. Cudarc decoder path is slated for Phase 3
retirement; adding multi-site head_dim bounds checks belongs in its
own focused PR with proper test coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
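
The shape of the reworded check, sketched (error wording paraphrased from the commit message; the function and error type are illustrative):

```rust
fn check_input_len(token_count: usize, max_seq_len: usize) -> Result<(), String> {
    if token_count > max_seq_len {
        // Was: "...exceeds max_position_embeddings (128000)", a number the
        // operator never sees. Now it matches EmbedderInfo::max_seq_length.
        return Err(format!(
            "input of {token_count} tokens exceeds Jina V4's trained context (max_seq_length = {max_seq_len})"
        ));
    }
    Ok(())
}
```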
@coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/weaver-spu/src/encoder/jina_v4.rs`:
- Around line 264-266: Update the rustdoc for the function max_seq_len() to
state that it returns the effective capped sequence length (the minimum of
JINA_V4_TRAINED_CONTEXT and raw.max_position_embeddings) rather than the raw
architectural ceiling; mention both JINA_V4_TRAINED_CONTEXT and
raw.max_position_embeddings in the doc so readers know the method applies the
cap and returns the capped value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 140817ed-d147-4a81-9ffe-b981189ae883

📥 Commits

Reviewing files that changed from the base of the PR and between 59660d2 and 9941b33.

📒 Files selected for processing (1)
  • crates/weaver-spu/src/encoder/jina_v4.rs

Comment thread: crates/weaver-spu/src/encoder/jina_v4.rs
…capped value (PR #297 review)

The getter's rustdoc said it returned the architectural ceiling read
from `config.json::max_position_embeddings`. That was correct when
the getter was added in PR #294 but became stale after the cap landed
earlier in this PR. Now states explicitly that it returns
`min(JINA_V4_TRAINED_CONTEXT, raw.max_position_embeddings)` and
references both terms so a reader can see the relationship between
the cap, the raw config field, and the value `EmbedderInfo` exposes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
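
The resulting getter, sketched (rustdoc wording paraphrased from the commit; the struct fragment is illustrative):

```rust
struct JinaV4Embedder {
    /// Already capped at construction:
    /// min(JINA_V4_TRAINED_CONTEXT, raw.max_position_embeddings).
    max_seq_len: usize,
}

impl JinaV4Embedder {
    /// Returns the effective, capped sequence length, i.e. the value
    /// `EmbedderInfo::max_seq_length` exposes, not the raw architectural
    /// ceiling read from `config.json::max_position_embeddings`.
    fn max_seq_len(&self) -> usize {
        self.max_seq_len
    }
}
```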
@toddwbucy toddwbucy merged commit 9527a70 into main May 8, 2026
1 check passed
@toddwbucy toddwbucy deleted the feat/spu-embedder-rust-smoke-fixes branch May 8, 2026 16:38
