feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) by biswapanda · Pull Request #1574 · PrimeIntellect-ai/verifiers

biswapanda · 2026-06-09T00:16:37Z

Description

Adds a dynamo_chat renderer transport so the verifiers TITO (tokens-in/tokens-out) client can run multi-turn against NVIDIA Dynamo, alongside the existing vLLM TITO path. Previously the TITO client only spoke vLLM's surface (POST /v1/chat/completions/tokens + /tokenize); Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO from turn 2 onward.

Changes

types: add RendererTransport = Literal["vllm_generate", "dynamo_chat"] and ClientConfig.renderer_transport (default vllm_generate — the new path is opt-in).
renderer_client / token client: thread renderer_transport through to renderers.generate() and route by transport.
- vllm_generate (default): unchanged — POST /v1/chat/completions/tokens, bridge tokens via /tokenize.
- dynamo_chat: POST /v1/chat/completions with placeholder messages + nvext.token_data=prompt_ids; bridge tokens computed locally via the model's HF fast tokenizer (no /tokenize round-trip). Engine token IDs + logprobs come back under nvext.engine_data.
chat completions client: graft nvext.engine_data (engine token IDs + per-token logprobs) onto the OpenAI-shaped response when present and the vLLM-native fields are absent, keeping the rest of the pipeline transport-agnostic.
routed_experts contract: RoutedExpertsPayload gains dtype: NotRequired[Literal["uint8", "uint16", "int32"]] so the routed-experts buffer is self-describing (≤256 experts → uint8, larger → uint16/int32) instead of consumers assuming a fixed width; the JSON-gate sidecar stripper is bounded to the routed_experts object and made key-order robust.
Fix a normalize_for_comparison asymmetry so get_prompt_ids matches vf.Message-shaped input (drops None-valued keys).

Type of Change

New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (head ea53210). All review threads resolved.

Notes

Default behavior is unchanged (renderer_transport defaults to vllm_generate). Companion to PrimeIntellect-ai/renderers#79 and PrimeIntellect-ai/prime-rl#2737.

Note

Medium Risk
Changes multi-turn token stitching, inference request shapes, and response parsing for Dynamo backends; misaligned local vs server tokenization could still break TITO, but default vLLM behavior is unchanged.

Overview
Adds renderer_transport ("vllm" default, "dynamo" opt-in) on ClientConfig so TITO (openai_chat_completions_token) and RendererClient can target NVIDIA Dynamo without vLLM’s /chat/completions/tokens or /tokenize routes.

For renderer_transport="dynamo", the token client posts stitched prompts via nvext.token_data on /v1/chat/completions, requests nvext.extra_fields=["engine_data"], strips vLLM-only sampling keys, and computes bridge tokens locally with a cached HuggingFace tokenizer (renderer_model_name override supported). OpenAIChatCompletionsClient grafts nvext.engine_data (prompt/completion token IDs, logprobs, routed experts) onto the OpenAI-shaped response so parse_tokens stays unchanged, including synthesizing logprobs when the choice has empty content and dropping tokens when logprob lengths mismatch.

RoutedExpertsPayload gains optional dtype; strip_routed_experts_data and the routed-experts sidecar now handle varying JSON key order and attach blobs under choice or nvext/engine_data, raising if a blob was stripped but no container exists. TITO prefix matching drops None keys in message normalization so multi-turn stitching no longer falls back to MITO every turn after the first.

^{Reviewed by Cursor Bugbot for commit b658883. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add Dynamo renderer transport (TITO over Dynamo) to chat completions token client

Adds a renderer_transport field to ClientConfig (default "vllm") and a RendererTransport type alias in verifiers/types.py, allowing per-client selection of either "vllm" or "dynamo" transport.
When transport is "dynamo", OpenAIChatCompletionsTokenClient tokenizes locally via a cached HF fast tokenizer, posts to /v1/chat/completions with nvext.token_data containing prompt IDs, and strips vLLM-only sampling keys (return_token_ids, spaces_between_special_tokens, priority).
Adds _graft_engine_data helper to OpenAIChatCompletionsClient to read token IDs and logprobs from nvext.engine_data, synthesize missing logprobs.content, and widen routed_experts discovery to additional nvext paths.
Fixes strip_routed_experts_data to find routed_experts.data regardless of key order by bounding the search within the object span.
Fixes post_chat_completion_with_routed_experts_sidecar to reattach the routed experts memoryview for both vLLM and Dynamo response shapes, raising an error if no container is found.
Risk: parse_tokens now returns None when completion_logprobs length mismatches completion_token_ids, which is a new failure mode for misaligned responses.

^{Macroscope summarized b658883.}

…ansport

…tokens Dynamo's vLLM and SGLang backends emit engine-emitted token IDs and per-token logprobs under `response.nvext.engine_data` when the client opts in via `nvext.extra_fields=["engine_data"]` (PR #8119). The vLLM-native path uses non-standard top-level fields (`choices[0].token_ids`, `response.prompt_token_ids`). Add a small graft inside `from_native_response.parse_tokens` that copies the engine_data fields onto the OpenAI-shaped response when present and the top-level fields are absent. The rest of parse_tokens then reads via the standard SDK attribute path regardless of backend.

The verifiers TITO client previously only spoke vLLM's TITO surface (POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens via /tokenize). Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO every turn-2+. This teaches OpenAIChatCompletionsTokenClient to read ClientConfig.renderer_transport and route accordingly: * prime_vllm_generate (default): unchanged. POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens via /tokenize HTTP. Requires vLLM >= 0.20. * dynamo_chat_nvext: POST /v1/chat/completions with placeholder messages + nvext.token_data=prompt_ids. Bridge tokens are computed locally via the model's HF fast tokenizer (no /tokenize HTTP round-trip). Server returns engine-side token IDs and logprobs under nvext.engine_data (PR #8119 channel), parsed by the OpenAIChatCompletionsClient.from_native_response graft so the rest of the pipeline is transport-agnostic. Also fix the normalize_for_comparison asymmetry that caused get_prompt_ids to never match for vf.Message-shaped input (the form MultiTurnEnv produces after maybe_normalize_messages). Drop None-valued keys so model_dump's exhaustive view is equivalent to to_native_prompt's slimmer view.

…ken_ids (plan B3)

…rs.generate()

…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat

… content-less; trim test comments

…p fixed allowlist) for vLLM-path parity

…prob length, tokenizer override, drop dead renderer field

…all paths)

…route dynamo TITO through routed-experts sidecar helper

…_pretrained must not block the event loop)

…er key-order robust

…ect; document dtype field

…chat comments

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 17c819b. Configure here.}

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated

Comment thread verifiers/types.py Outdated

This was referenced Jun 9, 2026

feat(client): add dynamo_chat transport + routed_experts to renderer generate PrimeIntellect-ai/renderers#79

Open

feat: dynamo inference backend integration PrimeIntellect-ai/prime-rl#2737

Open

biswapanda added 5 commits June 8, 2026 19:11

feat(types): add RendererTransport literal + ClientConfig.renderer_tr…

230384a

…ansport

feat(clients): graft top-level nvext.completion_token_ids + prompt_to…

f12bf63

…ken_ids (plan B3)

feat(clients): thread renderer_transport from ClientConfig to rendere…

ee3482a

…rs.generate()

biswapanda force-pushed the rl-sdk-4 branch from 68d8f48 to ee3482a Compare June 9, 2026 02:13

fix(clients): address PR review R1-R5 (guard transport kwarg, import …

3b58bf9

…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated

Comment thread verifiers/clients/openai_chat_completions_client.py

biswapanda added 2 commits June 9, 2026 00:41

fix(clients): graft engine_data logprobs even when choice logprobs is…

7a85b84

… content-less; trim test comments

fix(clients): dynamo_chat forwards full normalized sampling_args (dro…

7cbb603

…p fixed allowlist) for vLLM-path parity

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py

fix(clients): centralize Dynamo denylist scrub (MITO+TITO), guard log…

6b2dfbb

…prob length, tokenizer override, drop dead renderer field

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py

biswapanda added 2 commits June 9, 2026 01:31

fix(clients): enforce logprobs/ids length invariant in parse_tokens (…

9d260d3

…all paths)

fix(clients): centralize tokenizer override in _get_local_tokenizer; …

4aa48a4

…route dynamo TITO through routed-experts sidecar helper

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py

biswapanda added 4 commits June 9, 2026 03:11

fix(clients): load HF tokenizer inside worker thread (cache-miss from…

d713edc

…_pretrained must not block the event loop)

feat(types): add dtype to RoutedExpertsPayload contract

193c549

fix(routed_experts): tighten dtype to Literal and make sidecar stripp…

c30dad2

…er key-order robust

fix(routed_experts): bound sidecar stripper to the routed_experts obj…

ea53210

…ect; document dtype field

biswapanda changed the title ~~feat(clients): add dynamo_chat_nvext renderer transport for multi-turn TITO~~ feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) Jun 10, 2026

docs(clients): drop PR-number and branch/plan references from dynamo_…

b31ff2d

…chat comments

biswapanda mentioned this pull request Jun 11, 2026

feat(client): add Dynamo inference backend PrimeIntellect-ai/prime-rl#2773

Open

fix(dynamo): preserve token-data and routed experts sidecars

17c819b

cursor Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_client.py

Comment thread docs/reference.md

biswapanda added 2 commits June 12, 2026 02:14

fix(dynamo): retain routed experts and document transport

59a01fa

chore(client): rename renderer transport values

b658883

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574
biswapanda wants to merge 19 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

biswapanda commented Jun 9, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Type of Change

Review

Notes

Add Dynamo renderer transport (TITO over Dynamo) to chat completions token client

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading