feat(client): add Dynamo inference backend#2773
…mo admin to worker system URL
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit a90b642. Configure here.
    if self.base_model_name not in models:
    models = await self._admin_api.list_models(admin_client)
    if self.base_model_name not in [m.get("id") for m in models]:
Elastic model checks wrong host
High Severity
With backend="dynamo", elastic admin clients target the worker system server, but health and LoRA sync still call /v1/models on that client. Static pools use separate inference URLs for model listing; elastic does not, so servers may never become ready and adapter state can stay wrong.
Fixed in 205787001. _check_server_health now skips the /v1/models check when backend="dynamo" — the admin client targets the worker system server (DYN_SYSTEM_PORT) which doesn't serve that route; model availability is implied by a healthy worker registered via Dynamo's hub. The health GET /health check is retained.
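For reference, a minimal sketch of the backend-aware readiness check this fix describes. The function name `check_server_health` and the `probe` callable are hypothetical stand-ins for the real admin client; only the `/health` and `/v1/models` routes and the two `backend` values come from this thread.

```python
from typing import Callable, List, Literal, Optional

Backend = Literal["vllm", "dynamo"]


def check_server_health(
    probe: Callable[[str], Optional[List[dict]]],
    backend: Backend,
    base_model_name: str,
) -> bool:
    """Return True when the server counts as ready.

    `probe(path)` is a hypothetical helper: it returns the parsed model list
    for /v1/models, an empty list for /health, and None on any failure.
    """
    # The liveness check is kept for both backends.
    if probe("/health") is None:
        return False

    # The Dynamo worker system server does not serve /v1/models, so skip it.
    if backend == "dynamo":
        return True

    models = probe("/v1/models")
    if models is None:
        return False
    return base_model_name in [m.get("id") for m in models]
```

With this shape, a Dynamo worker that answers `/health` is treated as ready even though `/v1/models` would fail on the system port, while the vLLM path still requires the base model to be listed.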
    api_key_var=self.client_config.api_key_var,
    headers=self.client_config.headers,
    headers_from_env=self.client_config.headers_from_env,
    # Propagate backend (and RL discovery URL) so the admin client matches
Elastic ignores dynamo backend
High Severity
When rebuilding train/eval clients, elastic builds a fresh ClientConfig without copying backend. setup_clients therefore always sets renderer_transport to vllm_generate, so elastic pools configured with backend="dynamo" still speak the vLLM wire format to Dynamo inference endpoints.
Fixed in 205787001. _rebuild_clients now passes backend=self.client_config.backend when constructing the per-URL ClientConfig, so setup_clients correctly selects dynamo_chat renderer_transport for dynamo pools instead of always using vllm_generate.
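A small sketch of why the dropped field matters, and one way to make this class of bug harder to reintroduce. The `ClientConfig` below is a hypothetical, trimmed-down stand-in (the real one is not a plain dataclass as far as this thread shows), and `rebuild_per_url_configs` / `select_renderer_transport` are illustrative names. The fix in this PR passes `backend` explicitly; copying the pool config and overriding only the URL is an alternative technique, not what the PR does.

```python
from dataclasses import dataclass, replace
from typing import List, Literal


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical, trimmed-down stand-in for the real ClientConfig."""

    base_url: str
    backend: Literal["vllm", "dynamo"] = "vllm"


def select_renderer_transport(config: ClientConfig) -> str:
    # Transport names are the ones quoted in this PR.
    return "dynamo_chat" if config.backend == "dynamo" else "vllm_generate"


def rebuild_per_url_configs(pool: ClientConfig, urls: List[str]) -> List[ClientConfig]:
    # dataclasses.replace copies every field and overrides only base_url,
    # so a newly added field such as backend cannot be dropped by accident.
    return [replace(pool, base_url=url) for url in urls]
```

If the per-URL config were built from scratch without `backend`, the default `"vllm"` would win and `select_renderer_transport` would return `vllm_generate` for a Dynamo pool, which is the behavior the report describes.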
    # port. Falls back to the first discovered URL if no host match.
    system_urls = await asyncio.to_thread(discover_dynamo_admin_base_urls, config)
    match = next((u for u in system_urls if urlsplit(u).hostname == ip), None)
    config = config.model_copy(update={"admin_base_url": [match] if match else system_urls[:1]})
Wrong worker admin fallback
Medium Severity
If RL discovery system_url hostnames do not equal the pod IP (common with DNS names), elastic pins every pod’s admin client to the first discovered worker URL. Engine admin calls such as LoRA load then hit one worker instead of the pod that serves that IP’s inference.
Fixed in 205787001. _create_admin_client now resolves the system URL by comparing both the raw hostname and its DNS-resolved IP against the pod IP, so the match works whether system_url carries an IP address or a DNS name. Falls back to the first discovered URL only if neither comparison matches.
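The matching rule described here can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `match_system_url` and `_resolve` are hypothetical names, and the resolver is injectable so the logic can be exercised without real DNS.

```python
import socket
from typing import Callable, List, Optional
from urllib.parse import urlsplit


def _resolve(hostname: str) -> Optional[str]:
    """Resolve a hostname to an IPv4 address, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def match_system_url(
    system_urls: List[str],
    pod_ip: str,
    resolve: Callable[[str], Optional[str]] = _resolve,
) -> Optional[str]:
    """Pick the worker system URL that belongs to pod_ip."""
    for url in system_urls:
        host = urlsplit(url).hostname
        if host is None:
            continue
        # Compare the raw hostname first, then its DNS-resolved address.
        if host == pod_ip or resolve(host) == pod_ip:
            return url
    # Fall back to the first discovered URL only when nothing matched.
    return system_urls[0] if system_urls else None
```

Note that `socket.gethostbyname` blocks, so in async code it would need to run off the event loop, for example inside the same `asyncio.to_thread` call already used for discovery.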


Overview:
Adds NVIDIA Dynamo as an optional inference backend alongside the existing vLLM path. Controlled by a new ClientConfig.backend field ("vllm" | "dynamo"). Three self-contained changes: a pluggable AdminAPI abstraction, renderer_transport selection for the verifiers wire shape, and a Dynamo teacher-logprobs path for OPD training.

Details:
packages/prime-rl-configs/src/prime_rl/configs/shared.py
- ClientConfig.backend: Literal["vllm", "dynamo"] selects the AdminAPI implementation and verifiers wire shape. Default "vllm" is a no-op for existing configs.
- ClientConfig.rl_base_url: optional override for the Dynamo RL worker discovery listener (GET /v1/rl/workers). When unset, the port is derived from DYN_RL_PORT (default 8001).

src/prime_rl/utils/client.py
- AdminAPI Protocol + VLLMAdminAPI: extracts the existing vLLM admin paths (/pause, /resume, /update_weights, /load_lora_adapter, /init_broadcaster) into a typed protocol. VLLMAdminAPI methods go through a shared _admin_post helper that adds bounded per-attempt timeouts and tenacity retry on 5xx/transport errors (300 s for pause/resume, 720 s for weight updates).
- DynamoAdminAPI: Dynamo worker admin over POST /engine/<method>: pause_generation, resume_generation, update_weights_from_disk / update_weights_from_distributed (filesystem vs NCCL paths), load_lora_adapter. Inherits health/model checks from VLLMAdminAPI.
- setup_admin_api(client_config): picks DynamoAdminAPI when backend="dynamo", VLLMAdminAPI otherwise.
- discover_dynamo_admin_base_urls: resolves worker system URLs from GET /v1/rl/workers; falls back to port-replaced base_url when rl_base_url is unset.
- setup_clients: sets renderer_transport="dynamo_chat" on all vf.ClientConfig objects when backend="dynamo", "vllm_generate" otherwise. Requires verifiers #1574 + renderers #79.

src/prime_rl/orchestrator/utils.py
- Splits compute_teacher_logprobs into two paths dispatched on client_config.renderer_transport: _compute_teacher_logprobs_vllm (existing /inference/v1/generate path) and _compute_teacher_logprobs_dynamo (POST /v1/chat/completions with nvext.token_data + nvext.extra_fields=["prompt_logprobs"]).
- _flatten_prompt_logprobs: shared flattener that handles both vLLM typed Logprob objects and Dynamo's dict shape {logprob, rank?, decoded_token?}.

Where should the reviewer start?
- src/prime_rl/utils/client.py: AdminAPI protocol (line ~32), DynamoAdminAPI class, setup_admin_api, and setup_clients renderer_transport selection. Core of the change.
- src/prime_rl/orchestrator/utils.py: _compute_teacher_logprobs_dynamo and the compute_teacher_logprobs dispatcher. Note the placeholder messages field required by the Dynamo frontend even when nvext.token_data is set.
- packages/prime-rl-configs/src/prime_rl/configs/shared.py: the two new ClientConfig fields; verify defaults are backward-compatible.

Related Issues:
- renderer_transport field to vf.ClientConfig
- dynamo_chat transport to renderers.generate()

Note
High Risk
Touches weight broadcast (NCCL vs disk), pause/resume around every update, and discovery-dependent admin URLs—misconfiguration can break training or leave engines paused.
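One way to reduce the "left paused" risk called out above is to guarantee the resume call runs even when the weight update fails. The sketch below is an illustration only, not code from this PR: `update_weights_paused` is a hypothetical helper, and `pause`, `update`, and `resume` stand in for whatever admin calls the orchestrator actually makes.

```python
import asyncio
from typing import Awaitable, Callable

AsyncStep = Callable[[], Awaitable[None]]


async def update_weights_paused(
    pause: AsyncStep,
    update: AsyncStep,
    resume: AsyncStep,
) -> None:
    """Pause the engine, apply the update, and always attempt to resume."""
    await pause()
    try:
        await update()
    finally:
        # Runs on success and on failure, so a failed update does not
        # leave the engine paused; the original exception still propagates.
        await resume()
```

If `pause` itself raises, nothing was paused and `resume` is not called, which is why the `try` starts only after `pause` returns.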
Overview
Adds NVIDIA Dynamo as an optional inference backend via ClientConfig.backend (vllm default, dynamo) and optional rl_base_url for RL worker discovery.

Admin layer: vLLM admin HTTP calls are refactored behind an AdminAPI protocol (VLLMAdminAPI vs DynamoAdminAPI on POST /engine/*). Health, model listing, pause/resume, weight updates, LoRA load, and NCCL init all route through this abstraction; the orchestrator passes weight_broadcast.type into Dynamo for filesystem vs NCCL weight paths.

Client wiring: For backend=dynamo, rollouts use renderer_transport=dynamo_chat (nvext token wire); admin URLs come from GET /v1/rl/workers when admin_base_url is unset. Model checks can use OpenAI-compat base_url while admin hits system ports. Elastic pools propagate backend and pin per-pod Dynamo system URLs.

OPD / teacher logprobs: compute_teacher_logprobs dispatches on renderer_transport: existing vLLM /inference/v1/generate vs Dynamo chat completions with nvext.token_data and shared logprob flattening.

Reviewed by Cursor Bugbot for commit a90b642.
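To make the admin-layer split concrete, here is a rough sketch of how one protocol can map the same logical operation to different routes per backend. It only builds paths and performs no HTTP. The route and method names are the ones listed in this PR; the `path_for` method, the constructor argument, and the `"filesystem"` / `"nccl"` values for the broadcast type are assumptions made for illustration, and the real classes expose async methods rather than a path lookup.

```python
from typing import Protocol


class AdminAPI(Protocol):
    def path_for(self, operation: str) -> str: ...


class VLLMAdminAPI:
    # vLLM admin routes as listed in the PR description.
    _PATHS = {
        "pause": "/pause",
        "resume": "/resume",
        "update_weights": "/update_weights",
        "load_lora_adapter": "/load_lora_adapter",
        "init_broadcaster": "/init_broadcaster",
    }

    def path_for(self, operation: str) -> str:
        return self._PATHS[operation]


class DynamoAdminAPI:
    def __init__(self, weight_broadcast_type: str = "filesystem") -> None:
        # Assumed values: "filesystem" or "nccl".
        self.weight_broadcast_type = weight_broadcast_type

    def path_for(self, operation: str) -> str:
        # Engine method names as listed in the PR description. Operations
        # not listed there (e.g. init_broadcaster) raise KeyError here.
        methods = {
            "pause": "pause_generation",
            "resume": "resume_generation",
            "load_lora_adapter": "load_lora_adapter",
            "update_weights": (
                "update_weights_from_distributed"
                if self.weight_broadcast_type == "nccl"
                else "update_weights_from_disk"
            ),
        }
        return f"/engine/{methods[operation]}"


def setup_admin_api(backend: str, weight_broadcast_type: str = "filesystem") -> AdminAPI:
    """Pick the Dynamo implementation for backend="dynamo", vLLM otherwise."""
    if backend == "dynamo":
        return DynamoAdminAPI(weight_broadcast_type)
    return VLLMAdminAPI()
```

Callers then depend only on the protocol, e.g. `setup_admin_api("dynamo", "nccl").path_for("update_weights")` yields `/engine/update_weights_from_distributed`, while the default vLLM path yields `/update_weights`.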