feat(client): add Dynamo inference backend by biswapanda · Pull Request #2773 · PrimeIntellect-ai/prime-rl

biswapanda · 2026-06-11T17:04:07Z

Overview:

Adds NVIDIA Dynamo as an optional inference backend alongside the existing vLLM path. Controlled by a new ClientConfig.backend field ("vllm" | "dynamo"). Three self-contained changes: a pluggable AdminAPI abstraction, renderer_transport selection for the verifiers wire shape, and a Dynamo teacher-logprobs path for OPD training.

Details:

packages/prime-rl-configs/src/prime_rl/configs/shared.py

ClientConfig.backend: Literal["vllm", "dynamo"] — selects the AdminAPI implementation and verifiers wire shape. Default "vllm" is a no-op for existing configs.
ClientConfig.rl_base_url — optional override for the Dynamo RL worker discovery listener (GET /v1/rl/workers). When unset, the port is derived from DYN_RL_PORT (default 8001).

src/prime_rl/utils/client.py

AdminAPI Protocol + VLLMAdminAPI — extracts the existing vLLM admin paths (/pause, /resume, /update_weights, /load_lora_adapter, /init_broadcaster) into a typed protocol. VLLMAdminAPI methods go through a shared _admin_post helper that adds bounded per-attempt timeouts and tenacity retry on 5xx/transport errors (300 s for pause/resume, 720 s for weight updates).
DynamoAdminAPI — Dynamo worker admin over POST /engine/<method>: pause_generation, resume_generation, update_weights_from_disk / update_weights_from_distributed (filesystem vs NCCL paths), load_lora_adapter. Inherits health/model checks from VLLMAdminAPI.
setup_admin_api(client_config) — picks DynamoAdminAPI when backend="dynamo", VLLMAdminAPI otherwise.
discover_dynamo_admin_base_urls — resolves worker system URLs from GET /v1/rl/workers; falls back to port-replaced base_url when rl_base_url is unset.
setup_clients — sets renderer_transport="dynamo_chat" on all vf.ClientConfig objects when backend="dynamo", "vllm_generate" otherwise. Requires verifiers #1574 + renderers #79.

src/prime_rl/orchestrator/utils.py

Splits compute_teacher_logprobs into two paths dispatched on client_config.renderer_transport: _compute_teacher_logprobs_vllm (existing /inference/v1/generate path) and _compute_teacher_logprobs_dynamo (POST /v1/chat/completions with nvext.token_data + nvext.extra_fields=["prompt_logprobs"]).
_flatten_prompt_logprobs — shared flattener that handles both vLLM typed Logprob objects and Dynamos dict shape {logprob, rank?, decoded_token?}.

Where should the reviewer start?

src/prime_rl/utils/client.py — AdminAPI protocol (line ~32), DynamoAdminAPI class, setup_admin_api, and setup_clients renderer_transport selection. Core of the change.
src/prime_rl/orchestrator/utils.py — _compute_teacher_logprobs_dynamo and the compute_teacher_logprobs dispatcher. Note the placeholder messages field required by the Dynamo frontend even when nvext.token_data is set.
packages/prime-rl-configs/src/prime_rl/configs/shared.py — the two new ClientConfig fields; verify defaults are backward-compatible.

Related Issues:

Relates to verifiers #1574 — adds renderer_transport field to vf.ClientConfig
Relates to renderers #79 — adds dynamo_chat transport to renderers.generate()

Note

High Risk
Touches weight broadcast (NCCL vs disk), pause/resume around every update, and discovery-dependent admin URLs—misconfiguration can break training or leave engines paused.

Overview
Adds NVIDIA Dynamo as an optional inference backend via ClientConfig.backend (vllm default, dynamo) and optional rl_base_url for RL worker discovery.

Admin layer: vLLM admin HTTP calls are refactored behind an AdminAPI protocol (VLLMAdminAPI vs DynamoAdminAPI on POST /engine/*). Health, model listing, pause/resume, weight updates, LoRA load, and NCCL init all route through this abstraction; the orchestrator passes weight_broadcast.type into Dynamo for filesystem vs NCCL weight paths.

Client wiring: For backend=dynamo, rollouts use renderer_transport=dynamo_chat (nvext token wire); admin URLs come from GET /v1/rl/workers when admin_base_url is unset. Model checks can use OpenAI-compat base_url while admin hits system ports. Elastic pools propagate backend and pin per-pod Dynamo system URLs.

OPD / teacher logprobs: compute_teacher_logprobs dispatches on renderer_transport—existing vLLM /inference/v1/generate vs Dynamo chat completions with nvext.token_data and shared logprob flattening.

^{Reviewed by Cursor Bugbot for commit a90b642. Bugbot is set up for automated code reviews on this repo. Configure here.}

…er transport

…s request

…mo admin to worker system URL

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a90b642. Configure here.}

cursor · 2026-06-11T18:27:37Z

-
-            if self.base_model_name not in models:
+            models = await self._admin_api.list_models(admin_client)
+            if self.base_model_name not in [m.get("id") for m in models]:


Elastic model checks wrong host

High Severity

With backend="dynamo", elastic admin clients target the worker system server, but health and LoRA sync still call /v1/models on that client. Static pools use separate inference URLs for model listing; elastic does not, so servers may never become ready and adapter state can stay wrong.

Additional Locations (1)

src/prime_rl/utils/elastic.py#L283-L290

^{Reviewed by Cursor Bugbot for commit a90b642. Configure here.}

Fixed in 205787001. _check_server_health now skips the /v1/models check when backend="dynamo" — the admin client targets the worker system server (DYN_SYSTEM_PORT) which doesn't serve that route; model availability is implied by a healthy worker registered via Dynamo's hub. The health GET /health check is retained.

cursor · 2026-06-11T18:27:37Z

            api_key_var=self.client_config.api_key_var,
            headers=self.client_config.headers,
            headers_from_env=self.client_config.headers_from_env,
+            # Propagate backend (and RL discovery URL) so the admin client matches


Elastic ignores dynamo backend

High Severity

When rebuilding train/eval clients, elastic builds a fresh ClientConfig without copying backend. setup_clients therefore always sets renderer_transport to vllm_generate, so elastic pools configured with backend="dynamo" still speak the vLLM wire format to Dynamo inference endpoints.

^{Reviewed by Cursor Bugbot for commit a90b642. Configure here.}

Fixed in 205787001. _rebuild_clients now passes backend=self.client_config.backend when constructing the per-URL ClientConfig, so setup_clients correctly selects dynamo_chat renderer_transport for dynamo pools instead of always using vllm_generate.

cursor · 2026-06-11T18:27:37Z

+            # port. Falls back to the first discovered URL if no host match.
+            system_urls = await asyncio.to_thread(discover_dynamo_admin_base_urls, config)
+            match = next((u for u in system_urls if urlsplit(u).hostname == ip), None)
+            config = config.model_copy(update={"admin_base_url": [match] if match else system_urls[:1]})


Wrong worker admin fallback

Medium Severity

If RL discovery system_url hostnames do not equal the pod IP (common with DNS names), elastic pins every pod’s admin client to the first discovered worker URL. Engine admin calls such as LoRA load then hit one worker instead of the pod that serves that IP’s inference.

^{Reviewed by Cursor Bugbot for commit a90b642. Configure here.}

Fixed in 205787001. _create_admin_client now resolves the system URL by comparing both the raw hostname and its DNS-resolved IP against the pod IP, so the match works whether system_url carries an IP address or a DNS name. Falls back to the first discovered URL only if neither comparison matches.

biswapanda added 3 commits June 11, 2026 04:21

feat(client): add dynamo backend selector, DynamoAdminAPI, and render…

cea2fb9

…er transport

feat(orchestrator): compute teacher logprobs over dynamo nvext transport

5fa41bb

fix(orchestrator): send placeholder message in dynamo teacher-logprob…

1a25d08

…s request

biswapanda changed the title ~~Dynamo integration~~ feat(client): add Dynamo inference backend — AdminAPI, renderer transport, teacher logprobs Jun 11, 2026

cursor Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread src/prime_rl/utils/client.py

Comment thread src/prime_rl/utils/client.py

Comment thread src/prime_rl/utils/client.py

Comment thread src/prime_rl/utils/client.py

fix(dynamo): wire NCCL weight-broadcast path and harden admin timeouts

52f8703

cursor Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread src/prime_rl/utils/client.py

Comment thread src/prime_rl/utils/client.py

biswapanda mentioned this pull request Jun 11, 2026

feat: dynamo inference backend integration #2737

Open

1 task

biswapanda changed the title ~~feat(client): add Dynamo inference backend — AdminAPI, renderer transport, teacher logprobs~~ feat(client): add Dynamo inference backend Jun 11, 2026

fix(dynamo): pass env headers in admin discovery and pin elastic dyna…

a90b642

…mo admin to worker system URL

cursor Bot reviewed Jun 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(client): add Dynamo inference backend#2773

feat(client): add Dynamo inference backend#2773
biswapanda wants to merge 5 commits into
PrimeIntellect-ai:mainfrom
biswapanda:dynamo-integration

biswapanda commented Jun 11, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 11, 2026

Uh oh!

biswapanda Jun 11, 2026

Uh oh!

cursor Bot Jun 11, 2026

Uh oh!

biswapanda Jun 11, 2026

Uh oh!

cursor Bot Jun 11, 2026

Uh oh!

biswapanda Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

biswapanda commented Jun 11, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 11, 2026

Choose a reason for hiding this comment

Elastic model checks wrong host

Uh oh!

biswapanda Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 11, 2026

Choose a reason for hiding this comment

Elastic ignores dynamo backend

Uh oh!

biswapanda Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 11, 2026

Choose a reason for hiding this comment

Wrong worker admin fallback

Uh oh!

biswapanda Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

biswapanda commented Jun 11, 2026 •

edited by cursor Bot

Loading