Skip to content

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574

Open
biswapanda wants to merge 19 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4
Open

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574
biswapanda wants to merge 19 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

Conversation

@biswapanda

@biswapanda biswapanda commented Jun 9, 2026

Copy link
Copy Markdown

Description

Adds a dynamo_chat renderer transport so the verifiers TITO (tokens-in/tokens-out) client can run multi-turn against NVIDIA Dynamo, alongside the existing vLLM TITO path. Previously the TITO client only spoke vLLM's surface (POST /v1/chat/completions/tokens + /tokenize); Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO from turn 2 onward.

Changes

  • types: add RendererTransport = Literal["vllm_generate", "dynamo_chat"] and ClientConfig.renderer_transport (default vllm_generate — the new path is opt-in).
  • renderer_client / token client: thread renderer_transport through to renderers.generate() and route by transport.
    • vllm_generate (default): unchanged — POST /v1/chat/completions/tokens, bridge tokens via /tokenize.
    • dynamo_chat: POST /v1/chat/completions with placeholder messages + nvext.token_data=prompt_ids; bridge tokens computed locally via the model's HF fast tokenizer (no /tokenize round-trip). Engine token IDs + logprobs come back under nvext.engine_data.
  • chat completions client: graft nvext.engine_data (engine token IDs + per-token logprobs) onto the OpenAI-shaped response when present and the vLLM-native fields are absent, keeping the rest of the pipeline transport-agnostic.
  • routed_experts contract: RoutedExpertsPayload gains dtype: NotRequired[Literal["uint8", "uint16", "int32"]] so the routed-experts buffer is self-describing (≤256 experts → uint8, larger → uint16/int32) instead of consumers assuming a fixed width; the JSON-gate sidecar stripper is bounded to the routed_experts object and made key-order robust.
  • Fix a normalize_for_comparison asymmetry so get_prompt_ids matches vf.Message-shaped input (drops None-valued keys).

Type of Change

  • New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (head ea53210). All review threads resolved.

Notes

Default behavior is unchanged (renderer_transport defaults to vllm_generate). Companion to PrimeIntellect-ai/renderers#79 and PrimeIntellect-ai/prime-rl#2737.


Note

Medium Risk
Changes multi-turn token stitching, inference request shapes, and response parsing for Dynamo backends; misaligned local vs server tokenization could still break TITO, but default vLLM behavior is unchanged.

Overview
Adds renderer_transport ("vllm" default, "dynamo" opt-in) on ClientConfig so TITO (openai_chat_completions_token) and RendererClient can target NVIDIA Dynamo without vLLM’s /chat/completions/tokens or /tokenize routes.

For renderer_transport="dynamo", the token client posts stitched prompts via nvext.token_data on /v1/chat/completions, requests nvext.extra_fields=["engine_data"], strips vLLM-only sampling keys, and computes bridge tokens locally with a cached HuggingFace tokenizer (renderer_model_name override supported). OpenAIChatCompletionsClient grafts nvext.engine_data (prompt/completion token IDs, logprobs, routed experts) onto the OpenAI-shaped response so parse_tokens stays unchanged, including synthesizing logprobs when the choice has empty content and dropping tokens when logprob lengths mismatch.

RoutedExpertsPayload gains optional dtype; strip_routed_experts_data and the routed-experts sidecar now handle varying JSON key order and attach blobs under choice or nvext/engine_data, raising if a blob was stripped but no container exists. TITO prefix matching drops None keys in message normalization so multi-turn stitching no longer falls back to MITO every turn after the first.

Reviewed by Cursor Bugbot for commit b658883. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Add Dynamo renderer transport (TITO over Dynamo) to chat completions token client

  • Adds a renderer_transport field to ClientConfig (default "vllm") and a RendererTransport type alias in verifiers/types.py, allowing per-client selection of either "vllm" or "dynamo" transport.
  • When transport is "dynamo", OpenAIChatCompletionsTokenClient tokenizes locally via a cached HF fast tokenizer, posts to /v1/chat/completions with nvext.token_data containing prompt IDs, and strips vLLM-only sampling keys (return_token_ids, spaces_between_special_tokens, priority).
  • Adds _graft_engine_data helper to OpenAIChatCompletionsClient to read token IDs and logprobs from nvext.engine_data, synthesize missing logprobs.content, and widen routed_experts discovery to additional nvext paths.
  • Fixes strip_routed_experts_data to find routed_experts.data regardless of key order by bounding the search within the object span.
  • Fixes post_chat_completion_with_routed_experts_sidecar to reattach the routed experts memoryview for both vLLM and Dynamo response shapes, raising an error if no container is found.
  • Risk: parse_tokens now returns None when completion_logprobs length mismatches completion_token_ids, which is a new failure mode for misaligned responses.

Macroscope summarized b658883.

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated
Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated
Comment thread verifiers/types.py Outdated
…tokens

Dynamo's vLLM and SGLang backends emit engine-emitted token IDs and per-token
logprobs under `response.nvext.engine_data` when the client opts in via
`nvext.extra_fields=["engine_data"]` (PR #8119). The vLLM-native path uses
non-standard top-level fields (`choices[0].token_ids`, `response.prompt_token_ids`).

Add a small graft inside `from_native_response.parse_tokens` that copies the
engine_data fields onto the OpenAI-shaped response when present and the
top-level fields are absent. The rest of parse_tokens then reads via the
standard SDK attribute path regardless of backend.
The verifiers TITO client previously only spoke vLLM's TITO surface
(POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens
via /tokenize). Dynamo serves neither route, so multi-turn TITO against
Dynamo silently degraded to MITO every turn-2+.

This teaches OpenAIChatCompletionsTokenClient to read
ClientConfig.renderer_transport and route accordingly:

  * prime_vllm_generate (default): unchanged. POST /v1/chat/completions/tokens
    with tokens=prompt_ids; bridge tokens via /tokenize HTTP. Requires vLLM
    >= 0.20.

  * dynamo_chat_nvext: POST /v1/chat/completions with placeholder messages +
    nvext.token_data=prompt_ids. Bridge tokens are computed locally via the
    model's HF fast tokenizer (no /tokenize HTTP round-trip). Server returns
    engine-side token IDs and logprobs under nvext.engine_data (PR #8119
    channel), parsed by the OpenAIChatCompletionsClient.from_native_response
    graft so the rest of the pipeline is transport-agnostic.

Also fix the normalize_for_comparison asymmetry that caused get_prompt_ids
to never match for vf.Message-shaped input (the form MultiTurnEnv produces
after maybe_normalize_messages). Drop None-valued keys so model_dump's
exhaustive view is equivalent to to_native_prompt's slimmer view.
…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat
Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated
Comment thread verifiers/clients/openai_chat_completions_client.py
Comment thread verifiers/clients/openai_chat_completions_token_client.py
…prob length, tokenizer override, drop dead renderer field
Comment thread verifiers/clients/openai_chat_completions_token_client.py
Comment thread verifiers/clients/openai_chat_completions_token_client.py
@biswapanda biswapanda changed the title feat(clients): add dynamo_chat_nvext renderer transport for multi-turn TITO feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) Jun 10, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 17c819b. Configure here.

Comment thread verifiers/clients/openai_chat_completions_client.py
Comment thread docs/reference.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant