Skip to content

feat(client): add Dynamo inference backend#2773

Open
biswapanda wants to merge 5 commits into
PrimeIntellect-ai:mainfrom
biswapanda:dynamo-integration
Open

feat(client): add Dynamo inference backend#2773
biswapanda wants to merge 5 commits into
PrimeIntellect-ai:mainfrom
biswapanda:dynamo-integration

Conversation

@biswapanda

@biswapanda biswapanda commented Jun 11, 2026

Copy link
Copy Markdown

Overview:

Adds NVIDIA Dynamo as an optional inference backend alongside the existing vLLM path. Controlled by a new ClientConfig.backend field ("vllm" | "dynamo"). Three self-contained changes: a pluggable AdminAPI abstraction, renderer_transport selection for the verifiers wire shape, and a Dynamo teacher-logprobs path for OPD training.

Details:

packages/prime-rl-configs/src/prime_rl/configs/shared.py

  • ClientConfig.backend: Literal["vllm", "dynamo"] — selects the AdminAPI implementation and verifiers wire shape. Default "vllm" is a no-op for existing configs.
  • ClientConfig.rl_base_url — optional override for the Dynamo RL worker discovery listener (GET /v1/rl/workers). When unset, the port is derived from DYN_RL_PORT (default 8001).

src/prime_rl/utils/client.py

  • AdminAPI Protocol + VLLMAdminAPI — extracts the existing vLLM admin paths (/pause, /resume, /update_weights, /load_lora_adapter, /init_broadcaster) into a typed protocol. VLLMAdminAPI methods go through a shared _admin_post helper that adds bounded per-attempt timeouts and tenacity retry on 5xx/transport errors (300 s for pause/resume, 720 s for weight updates).
  • DynamoAdminAPI — Dynamo worker admin over POST /engine/<method>: pause_generation, resume_generation, update_weights_from_disk / update_weights_from_distributed (filesystem vs NCCL paths), load_lora_adapter. Inherits health/model checks from VLLMAdminAPI.
  • setup_admin_api(client_config) — picks DynamoAdminAPI when backend="dynamo", VLLMAdminAPI otherwise.
  • discover_dynamo_admin_base_urls — resolves worker system URLs from GET /v1/rl/workers; falls back to port-replaced base_url when rl_base_url is unset.
  • setup_clients — sets renderer_transport="dynamo_chat" on all vf.ClientConfig objects when backend="dynamo", "vllm_generate" otherwise. Requires verifiers #1574 + renderers #79.

src/prime_rl/orchestrator/utils.py

  • Splits compute_teacher_logprobs into two paths dispatched on client_config.renderer_transport: _compute_teacher_logprobs_vllm (existing /inference/v1/generate path) and _compute_teacher_logprobs_dynamo (POST /v1/chat/completions with nvext.token_data + nvext.extra_fields=["prompt_logprobs"]).
  • _flatten_prompt_logprobs — shared flattener that handles both vLLM typed Logprob objects and Dynamos dict shape {logprob, rank?, decoded_token?}.

Where should the reviewer start?

  1. src/prime_rl/utils/client.pyAdminAPI protocol (line ~32), DynamoAdminAPI class, setup_admin_api, and setup_clients renderer_transport selection. Core of the change.
  2. src/prime_rl/orchestrator/utils.py_compute_teacher_logprobs_dynamo and the compute_teacher_logprobs dispatcher. Note the placeholder messages field required by the Dynamo frontend even when nvext.token_data is set.
  3. packages/prime-rl-configs/src/prime_rl/configs/shared.py — the two new ClientConfig fields; verify defaults are backward-compatible.

Related Issues:

  • Relates to verifiers #1574 — adds renderer_transport field to vf.ClientConfig
  • Relates to renderers #79 — adds dynamo_chat transport to renderers.generate()

Note

High Risk
Touches weight broadcast (NCCL vs disk), pause/resume around every update, and discovery-dependent admin URLs—misconfiguration can break training or leave engines paused.

Overview
Adds NVIDIA Dynamo as an optional inference backend via ClientConfig.backend (vllm default, dynamo) and optional rl_base_url for RL worker discovery.

Admin layer: vLLM admin HTTP calls are refactored behind an AdminAPI protocol (VLLMAdminAPI vs DynamoAdminAPI on POST /engine/*). Health, model listing, pause/resume, weight updates, LoRA load, and NCCL init all route through this abstraction; the orchestrator passes weight_broadcast.type into Dynamo for filesystem vs NCCL weight paths.

Client wiring: For backend=dynamo, rollouts use renderer_transport=dynamo_chat (nvext token wire); admin URLs come from GET /v1/rl/workers when admin_base_url is unset. Model checks can use OpenAI-compat base_url while admin hits system ports. Elastic pools propagate backend and pin per-pod Dynamo system URLs.

OPD / teacher logprobs: compute_teacher_logprobs dispatches on renderer_transport—existing vLLM /inference/v1/generate vs Dynamo chat completions with nvext.token_data and shared logprob flattening.

Reviewed by Cursor Bugbot for commit a90b642. Bugbot is set up for automated code reviews on this repo. Configure here.

@biswapanda biswapanda changed the title Dynamo integration feat(client): add Dynamo inference backend — AdminAPI, renderer transport, teacher logprobs Jun 11, 2026
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
Comment thread src/prime_rl/utils/client.py
@biswapanda biswapanda changed the title feat(client): add Dynamo inference backend — AdminAPI, renderer transport, teacher logprobs feat(client): add Dynamo inference backend Jun 11, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a90b642. Configure here.


if self.base_model_name not in models:
models = await self._admin_api.list_models(admin_client)
if self.base_model_name not in [m.get("id") for m in models]:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elastic model checks wrong host

High Severity

With backend="dynamo", elastic admin clients target the worker system server, but health and LoRA sync still call /v1/models on that client. Static pools use separate inference URLs for model listing; elastic does not, so servers may never become ready and adapter state can stay wrong.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a90b642. Configure here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 205787001. _check_server_health now skips the /v1/models check when backend="dynamo" — the admin client targets the worker system server (DYN_SYSTEM_PORT) which doesn't serve that route; model availability is implied by a healthy worker registered via Dynamo's hub. The health GET /health check is retained.

api_key_var=self.client_config.api_key_var,
headers=self.client_config.headers,
headers_from_env=self.client_config.headers_from_env,
# Propagate backend (and RL discovery URL) so the admin client matches

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elastic ignores dynamo backend

High Severity

When rebuilding train/eval clients, elastic builds a fresh ClientConfig without copying backend. setup_clients therefore always sets renderer_transport to vllm_generate, so elastic pools configured with backend="dynamo" still speak the vLLM wire format to Dynamo inference endpoints.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a90b642. Configure here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 205787001. _rebuild_clients now passes backend=self.client_config.backend when constructing the per-URL ClientConfig, so setup_clients correctly selects dynamo_chat renderer_transport for dynamo pools instead of always using vllm_generate.

# port. Falls back to the first discovered URL if no host match.
system_urls = await asyncio.to_thread(discover_dynamo_admin_base_urls, config)
match = next((u for u in system_urls if urlsplit(u).hostname == ip), None)
config = config.model_copy(update={"admin_base_url": [match] if match else system_urls[:1]})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong worker admin fallback

Medium Severity

If RL discovery system_url hostnames do not equal the pod IP (common with DNS names), elastic pins every pod’s admin client to the first discovered worker URL. Engine admin calls such as LoRA load then hit one worker instead of the pod that serves that IP’s inference.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a90b642. Configure here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 205787001. _create_admin_client now resolves the system URL by comparing both the raw hostname and its DNS-resolved IP against the pod IP, so the match works whether system_url carries an IP address or a DNS name. Falls back to the first discovered URL only if neither comparison matches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant