feat(RL): forward routed_experts_prompt_start via nvext #10562
biswapanda wants to merge 6 commits into
… trims routing engine-side
Walkthrough
This PR adds support for a `routed_experts_prompt_start` parameter.
Changes
Routed Experts Prompt Start Parameter
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
🧹 Nitpick comments (2)
lib/llm/src/preprocessor.rs (1)
316-321: ⚡ Quick win
Add a Rust passthrough assertion for `routed_experts_prompt_start`.
Lines 316-321 correctly forward the field, but there's no Rust regression assertion for this new `nvext` key in `backend_extra_args` tests. Extending the existing passthrough test would lock the contract and prevent silent drops in future refactors.
Suggested test extension
```diff
 #[test]
 fn test_backend_extra_args_preserves_nvext_and_sampling_extensions() {
     let request: NvCreateChatCompletionRequest = serde_json::from_value(serde_json::json!({
         "model": "test-model",
         "messages": [{"role": "user", "content": "hi"}],
         "detokenize": false,
         "allowed_token_ids": [10, 11],
         "bad_words_token_ids": [[12, 13]],
         "nvext": {
             "cache_salt": "step_7",
-            "extra_fields": ["completion_token_ids"]
+            "extra_fields": ["completion_token_ids"],
+            "routed_experts_prompt_start": 5
         }
     }))
     .unwrap();

     let extra_args = OpenAIPreprocessor::backend_extra_args(&request).unwrap();
     assert_eq!(extra_args["nvext"]["cache_salt"], "step_7");
+    assert_eq!(extra_args["nvext"]["routed_experts_prompt_start"], 5);
     assert_eq!(
         extra_args["nvext"]["extra_fields"],
         serde_json::json!(["completion_token_ids"])
     );
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/preprocessor.rs` around lines 316 - 321, Extend the existing backend_extra_args passthrough test to assert that nvext.routed_experts_prompt_start is forwarded: locate the test that inspects nvext_passthrough/backend_extra_args (the backend_extra_args tests), add an assertion that the "routed_experts_prompt_start" key exists in the passthrough map and that its value matches the expected value you set in the test input, ensuring the test reads nvext.routed_experts_prompt_start and verifies nvext_passthrough contains serde_json::json!(expected) (or equivalent value comparison) to prevent future regressions.
components/src/dynamo/vllm/tests/test_vllm_unit.py (1)
906-910: ⚡ Quick win
Move test imports to module scope.
This test adds in-function imports; keep imports at module top for Python unit test additions.
Suggested diff
```diff
 import pytest
+from vllm.sampling_params import SamplingParams
+from dynamo.vllm.handlers import build_sampling_params
@@
 def test_build_sampling_params_applies_nvext_routed_experts_prompt_start():
@@
-    import pytest
-    from vllm.sampling_params import SamplingParams
-
-    from dynamo.vllm.handlers import build_sampling_params
```
As per coding guidelines, “keep all imports at module top (no imports inside functions)” for Python unit test additions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/src/dynamo/vllm/tests/test_vllm_unit.py` around lines 906 - 910, The test contains in-function imports; move the imports for pytest, SamplingParams (from vllm.sampling_params) and build_sampling_params (from dynamo.vllm.handlers) to the module top so they are declared at file scope rather than inside the test function; update any references in the test to use those top-level imports and remove the redundant local import statements.
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5f780311-3af3-44df-a984-ae91e3bb4768
📒 Files selected for processing (4)
components/src/dynamo/vllm/handlers.py
components/src/dynamo/vllm/tests/test_vllm_unit.py
lib/llm/src/preprocessor.rs
lib/llm/src/protocols/openai/nvext.rs
…st imports to module scope
… vLLM chat processor path
…r both nvext shapes in tests
Description
Forward the RL routed-experts capture offset (`routed_experts_prompt_start`) to the vLLM worker via `nvext`, so the engine trims the leading prompt rows from the returned routing tensor, instead of shipping full-sequence routing across the wire and trimming client-side.
Changes
- `nvext.rs`: add `NvExt.routed_experts_prompt_start: Option<u32>` (request field, mirroring `cache_salt`).
- `preprocessor.rs`: forward it in `nvext_passthrough_args` so it reaches the worker's `extra_args.nvext`.
- `handlers.py`: in `build_sampling_params`, apply `nvext.routed_experts_prompt_start` onto `SamplingParams.routed_experts_prompt_start` (validated non-negative; vLLM clamps the upper bound) so the engine trims the prompt rows and the worker serializes the trimmed routing with the correct `start`.
- `build_sampling_params` applies + clamps the nvext offset.
Context
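The `handlers.py` step listed under Changes above (copy the offset from `extra_args.nvext` onto the sampling params after a non-negative check) could look roughly like the sketch below. It is not the PR's code: `SamplingParamsStub` and `apply_routed_experts_prompt_start` are hypothetical names used so the example runs without vLLM, and raising `ValueError` on invalid input is an assumption, since the description only says the value is "validated non-negative". The upper bound is deliberately left alone because the description says vLLM clamps it.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class SamplingParamsStub:
    """Hypothetical stand-in for vLLM's SamplingParams (only the field used here)."""

    routed_experts_prompt_start: Optional[int] = None


def apply_routed_experts_prompt_start(
    params: SamplingParamsStub, extra_args: Mapping[str, Any]
) -> SamplingParamsStub:
    """Copy nvext.routed_experts_prompt_start onto the sampling params, if present."""
    nvext = extra_args.get("nvext") or {}
    start = nvext.get("routed_experts_prompt_start")
    if start is None:
        # Field not sent: leave the engine default untouched.
        return params
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(start, bool) or not isinstance(start, int):
        raise ValueError("routed_experts_prompt_start must be an integer")
    if start < 0:
        # Assumption: invalid offsets are rejected rather than silently ignored.
        raise ValueError("routed_experts_prompt_start must be non-negative")
    # No upper-bound check here; per the description, vLLM clamps that side.
    params.routed_experts_prompt_start = start
    return params
```

Calling it with `{"nvext": {"routed_experts_prompt_start": 5}}` sets the field to 5, while a request without the key leaves the params unchanged.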
Completes the routed-experts-on-`dynamo_chat` story (companion to #10529 worker serialize and PrimeIntellect-ai/renderers#79). The renderer now sends `nvext.routed_experts_prompt_start` and demotes its client-side trim to a back-compat fallback (no-op once the worker stamps `start > 0`). Engine-side trimming avoids the full-sequence routing blob crossing the wire on MoE rollouts.
Type of Change
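To make the back-compat fallback from the Context paragraph concrete, here is a minimal sketch of a client-side trim that becomes a no-op once the worker reports `start > 0`. The function name `fallback_trim`, the row-list representation, and the argument names are all hypothetical; the renderer's actual implementation lives in PrimeIntellect-ai/renderers#79 and is not shown here.

```python
from typing import List, Sequence


def fallback_trim(
    rows: Sequence[Sequence[int]], start: int, prompt_start: int
) -> List[Sequence[int]]:
    """Trim leading prompt rows client-side only if the worker did not.

    rows:         routing rows as returned by the worker (hypothetical shape)
    start:        offset stamped by the worker; > 0 means it already trimmed
    prompt_start: offset the client asked for via nvext
    """
    if start > 0:
        # Engine-side trimming already happened: keep the rows as received.
        return list(rows)
    # Older worker returned the full sequence: drop the leading prompt rows.
    return list(rows[max(prompt_start, 0):])
```

With a full four-row response and `start == 0`, asking for `prompt_start=2` keeps the last two rows; with an already-trimmed response stamped `start == 2`, the rows pass through unchanged.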
Testing
- `build_sampling_params` apply/clamp path.
Summary by CodeRabbit
Release Notes
New Features
- `routed_experts_prompt_start` parameter to configure mixture-of-experts (MoE) expert replay capture offset behavior in requests.
Tests