Skip to content

feat(RL): forward routed_experts_prompt_start via nvext #10562

Open
biswapanda wants to merge 6 commits into
mainfrom
bis/routed-experts-prompt-start
Open

feat(RL): forward routed_experts_prompt_start via nvext #10562
biswapanda wants to merge 6 commits into
mainfrom
bis/routed-experts-prompt-start

Conversation

@biswapanda

@biswapanda biswapanda commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Description

Forward the RL routed-experts capture offset (routed_experts_prompt_start) to the vLLM worker via nvext, so the engine trims the leading prompt rows from the returned routing tensor — instead of shipping full-sequence routing across the wire and trimming client-side.

Changes

  • nvext.rs: add NvExt.routed_experts_prompt_start: Option<u32> (request field, mirroring cache_salt).
  • preprocessor.rs: forward it in nvext_passthrough_args so it reaches the worker's extra_args.nvext.
  • handlers.py: in build_sampling_params, apply nvext.routed_experts_prompt_start onto SamplingParams.routed_experts_prompt_start (validated non-negative; vLLM clamps the upper bound) so the engine trims the prompt rows and the worker serializes the trimmed routing with the correct start.
  • test: build_sampling_params applies + clamps the nvext offset.

Context

Completes the routed-experts-on-dynamo_chat story (companion to #10529 worker serialize and PrimeIntellect-ai/renderers#79). The renderer now sends nvext.routed_experts_prompt_start and demotes its client-side trim to a back-compat fallback (no-op once the worker stamps start > 0). Engine-side trimming avoids the full-sequence routing blob crossing the wire on MoE rollouts.

Type of Change

  • New feature (non-breaking; field is optional, ignored by backends without routed-experts capture)

Testing

  • Python unit test for the build_sampling_params apply/clamp path.
  • Rust frontend build verified via CI; live engine-side-trim e2e pending.

Open in Devin Review

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for routed_experts_prompt_start parameter to configure mixture-of-experts (MoE) expert replay capture offset behavior in requests.
  • Tests

    • Added unit test coverage for the new routed-experts parameter handling.

@biswapanda biswapanda requested review from a team as code owners June 11, 2026 00:05
@github-actions github-actions Bot added feat backend::vllm Relates to the vllm backend frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 11, 2026

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment thread components/src/dynamo/vllm/tests/test_vllm_unit.py Outdated
Comment thread components/src/dynamo/vllm/tests/test_vllm_unit.py Outdated
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR adds support for a routed_experts_prompt_start parameter that enables MoE expert replay capture offset configuration. The parameter is defined in the OpenAI protocol layer, forwarded through request preprocessing, and consumed by the vLLM handler with validation and test coverage.

Changes

Routed Experts Prompt Start Parameter

Layer / File(s) Summary
Protocol definition and request passthrough
lib/llm/src/protocols/openai/nvext.rs, lib/llm/src/preprocessor.rs
NvExt struct adds routed_experts_prompt_start: Option<u32> field; OpenAIPreprocessor::nvext_passthrough_args forwards the field to generated extra_args["nvext"] JSON.
vLLM handler integration and validation
components/src/dynamo/vllm/handlers.py, components/src/dynamo/vllm/tests/test_vllm_unit.py
build_sampling_params reads the parameter from nvext sources and assigns to SamplingParams.routed_experts_prompt_start; unit test verifies propagation and confirms negative values are clamped to 0.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description covers changes, context, and testing, but the Related Issues section requires either a closed issue or explicit confirmation—neither is provided in the required format. Add either a 'Closes #XXXX' or 'Relates to #XXXX' statement, or explicitly check the 'Confirmed — no related issue' checkbox as required by the template.
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The PR title accurately describes the main change: forwarding the routed_experts_prompt_start field via nvext across multiple components (Rust protocols, preprocessor, and Python handlers).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
lib/llm/src/preprocessor.rs (1)

316-321: ⚡ Quick win

Add a Rust passthrough assertion for routed_experts_prompt_start.

Line 316-321 correctly forwards the field, but there’s no Rust regression assertion for this new nvext key in backend_extra_args tests. Extending the existing passthrough test would lock the contract and prevent silent drops in future refactors.

Suggested test extension
 #[test]
 fn test_backend_extra_args_preserves_nvext_and_sampling_extensions() {
     let request: NvCreateChatCompletionRequest = serde_json::from_value(serde_json::json!({
         "model": "test-model",
         "messages": [{"role": "user", "content": "hi"}],
         "detokenize": false,
         "allowed_token_ids": [10, 11],
         "bad_words_token_ids": [[12, 13]],
         "nvext": {
             "cache_salt": "step_7",
-            "extra_fields": ["completion_token_ids"]
+            "extra_fields": ["completion_token_ids"],
+            "routed_experts_prompt_start": 5
         }
     }))
     .unwrap();

     let extra_args = OpenAIPreprocessor::backend_extra_args(&request).unwrap();

     assert_eq!(extra_args["nvext"]["cache_salt"], "step_7");
+    assert_eq!(extra_args["nvext"]["routed_experts_prompt_start"], 5);
     assert_eq!(
         extra_args["nvext"]["extra_fields"],
         serde_json::json!(["completion_token_ids"])
     );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/preprocessor.rs` around lines 316 - 321, Extend the existing
backend_extra_args passthrough test to assert that
nvext.routed_experts_prompt_start is forwarded: locate the test that inspects
nvext_passthrough/backend_extra_args (the backend_extra_args tests), add an
assertion that the "routed_experts_prompt_start" key exists in the passthrough
map and that its value matches the expected value you set in the test input,
ensuring the test reads nvext.routed_experts_prompt_start and verifies
nvext_passthrough contains serde_json::json!(expected) (or equivalent value
comparison) to prevent future regressions.
components/src/dynamo/vllm/tests/test_vllm_unit.py (1)

906-910: ⚡ Quick win

Move test imports to module scope.

This test adds in-function imports; keep imports at module top for Python unit test additions.

Suggested diff
 import pytest
+from vllm.sampling_params import SamplingParams
+from dynamo.vllm.handlers import build_sampling_params
@@
 def test_build_sampling_params_applies_nvext_routed_experts_prompt_start():
@@
-    import pytest
-    from vllm.sampling_params import SamplingParams
-
-    from dynamo.vllm.handlers import build_sampling_params

As per coding guidelines, “keep all imports at module top (no imports inside functions)” for Python unit test additions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/src/dynamo/vllm/tests/test_vllm_unit.py` around lines 906 - 910,
The test contains in-function imports; move the imports for pytest,
SamplingParams (from vllm.sampling_params) and build_sampling_params (from
dynamo.vllm.handlers) to the module top so they are declared at file scope
rather than inside the test function; update any references in the test to use
those top-level imports and remove the redundant local import statements.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@components/src/dynamo/vllm/tests/test_vllm_unit.py`:
- Around line 906-910: The test contains in-function imports; move the imports
for pytest, SamplingParams (from vllm.sampling_params) and build_sampling_params
(from dynamo.vllm.handlers) to the module top so they are declared at file scope
rather than inside the test function; update any references in the test to use
those top-level imports and remove the redundant local import statements.

In `@lib/llm/src/preprocessor.rs`:
- Around line 316-321: Extend the existing backend_extra_args passthrough test
to assert that nvext.routed_experts_prompt_start is forwarded: locate the test
that inspects nvext_passthrough/backend_extra_args (the backend_extra_args
tests), add an assertion that the "routed_experts_prompt_start" key exists in
the passthrough map and that its value matches the expected value you set in the
test input, ensuring the test reads nvext.routed_experts_prompt_start and
verifies nvext_passthrough contains serde_json::json!(expected) (or equivalent
value comparison) to prevent future regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5f780311-3af3-44df-a984-ae91e3bb4768

📥 Commits

Reviewing files that changed from the base of the PR and between 722e720 and e23aa10.

📒 Files selected for processing (4)
  • components/src/dynamo/vllm/handlers.py
  • components/src/dynamo/vllm/tests/test_vllm_unit.py
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/openai/nvext.rs

@biswapanda biswapanda enabled auto-merge (squash) June 11, 2026 02:35
@biswapanda biswapanda changed the title feat(rl): forward routed_experts_prompt_start via nvext (engine-side routing trim) feat(rl): forward routed_experts_prompt_start via nvext Jun 11, 2026
@biswapanda biswapanda changed the title feat(rl): forward routed_experts_prompt_start via nvext feat(RL): forward routed_experts_prompt_start via nvext Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::vllm Relates to the vllm backend feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant