feat: dynamo inference backend integration by biswapanda · Pull Request #2737 · PrimeIntellect-ai/prime-rl

biswapanda · 2026-06-09T00:31:58Z

Description

Replaced by #2773.

End-to-end support for running prime-rl RL training against NVIDIA Dynamo (GB200/GB300) alongside the existing vLLM path. Adds a Dynamo inference backend, NCCL/filesystem weight transfer for GB200, vLLM 0.22 patches, MoE routed-experts capture + replay, and the deploy tooling (image, helm, k8s manifests) to run it.

Highlights

Dynamo backend: AdminAPI abstraction + backend selector (client.backend = vllm | dynamo) + RL worker discovery (GET /v1/rl/workers) for Dynamo-served inference.
Weight transfer: NCCL broadcast + FP8/E8M0 conversion for GB200 (qwen3_moe / glm_moe), plus an NFS-safe filesystem broadcast path with weight_broadcast.keep_recent.
routed_experts (MoE expert replay): the orchestrator decodes the {data, shape, start, dtype} payload according to its dtype (uint8 or uint16, with uint16 normalized to int32 for the trainer, and an int32 fallback for more than 65535 experts), and the trainer replays the captured routing so that recomputed logprobs match inference. Inference forwards moe_backend and auto-selects triton when router replay is enabled, because the default FlashInfer fused MoE kernel bypasses the capture hook (producing all-zero routing) and a non-fused backend is therefore required.
Orchestrator: dispatch compute_teacher_logprobs by renderer_transport (vLLM generate vs Dynamo nvext TITO); stop sending return_token_ids for Dynamo compatibility.
Inference: vLLM 0.22 patches — fp32 lm-head, int64 silu_mul_quant, padded scrub.
Deploy: Dockerfile.cuda.runtime (vLLM 0.22, DeepGEMM) + Dockerfile.dynamo, helm chart updates, Dynamo k8s manifests (client example sets backend=dynamo), and tools/dynamo run/smoke scripts.
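A minimal stdlib sketch of the dtype-aware routed_experts round trip from the highlights. It assumes `data` is base64 over little-endian raw bytes and that `start` is passed through untouched; the helper names are hypothetical, not prime-rl's API. The dtype thresholds follow the PR text (uint8 up to 256 experts, int32 beyond 65535).

```python
import base64
import math
import struct

# struct codes with standard sizes under "<" (little-endian): B=1, H=2, i=4 bytes.
_STRUCT_CODES = {"uint8": "B", "uint16": "H", "int32": "i"}


def pick_dtype(num_experts: int) -> str:
    """Narrowest wire dtype for expert ids, using the thresholds quoted in the PR."""
    if num_experts <= 256:
        return "uint8"
    if num_experts <= 65535:
        return "uint16"
    return "int32"  # fallback for very large expert counts


def encode_routed_experts(ids: list[int], shape: tuple[int, ...], start: int, num_experts: int) -> dict:
    """Pack flat expert ids into a {data, shape, start, dtype} payload (assumed wire format)."""
    if len(ids) != math.prod(shape):
        raise ValueError("ids do not match shape")
    dtype = pick_dtype(num_experts)
    raw = struct.pack(f"<{len(ids)}{_STRUCT_CODES[dtype]}", *ids)
    return {"data": base64.b64encode(raw).decode("ascii"), "shape": list(shape), "start": start, "dtype": dtype}


def decode_routed_experts(payload: dict) -> tuple[list[int], tuple[int, ...], int]:
    """Decode according to the payload's dtype; the result is plain ints (int32 range)."""
    code = _STRUCT_CODES[payload["dtype"]]  # KeyError on an unknown dtype
    raw = base64.b64decode(payload["data"])
    shape = tuple(payload["shape"])
    count = math.prod(shape)
    # Reject payloads whose byte length disagrees with shape x itemsize.
    if len(raw) != count * struct.calcsize("<" + code):
        raise ValueError("payload size does not match shape and dtype")
    ids = list(struct.unpack(f"<{count}{code}", raw))
    return ids, shape, payload["start"]
```

Under these assumptions a 512-expert model travels as uint16, half the bytes of int32. In a tensor-based trainer the decoded uint16 array would be cast to int32 at this point, since unsigned 16-bit tensors have limited operator support in PyTorch; plain Python ints make that cast implicit here.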

Type of Change

New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (head 1b5917a). The 2 remaining review threads are production-path follow-ups unrelated to routed_experts, each flagged with a fix: retry weight-update pauses, and require broadcast keep_recent ≥ orchestrator.max_off_policy_steps.
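To make the keep_recent follow-up concrete, here is a hedged sketch of one filesystem broadcast cycle: write and fsync each weight file, fsync the directory, only then create the STABLE marker, and finally prune all but the newest keep_recent stable steps. The `step_<n>` layout and the function names are assumptions for illustration, and directory fsync assumes a POSIX system. The guard encodes one reading of the review note: a rollout may lag up to max_off_policy_steps behind, so keeping fewer steps could delete weights a lagging reader still needs.

```python
import os
import re
import shutil
from pathlib import Path

_STEP_DIR = re.compile(r"^step_(\d+)$")  # hypothetical layout: one directory per weight step


def _fsync(path: Path) -> None:
    """fsync a file or directory by path (directory fsync assumes POSIX)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def publish_step(root: Path, step: int, files: dict[str, bytes]) -> Path:
    """Write weights durably, then create STABLE so readers never load a partial step."""
    step_dir = root / f"step_{step}"
    step_dir.mkdir(parents=True, exist_ok=True)
    for name, blob in files.items():
        with open(step_dir / name, "wb") as f:
            f.write(blob)
            f.flush()
            os.fsync(f.fileno())  # contents reach stable storage (or the NFS server) first
    _fsync(step_dir)  # new directory entries are durable too
    (step_dir / "STABLE").touch()
    _fsync(step_dir)  # make the marker itself durable
    return step_dir


def prune_steps(root: Path, keep_recent: int, max_off_policy_steps: int) -> list[int]:
    """Delete all but the newest keep_recent STABLE steps; unmarked (in-flight) steps are kept."""
    if keep_recent < max(1, max_off_policy_steps):
        raise ValueError(
            f"keep_recent={keep_recent} must be >= max(1, max_off_policy_steps={max_off_policy_steps})"
        )
    stable = sorted(
        int(m.group(1))
        for p in root.iterdir()
        if (m := _STEP_DIR.match(p.name)) and (p / "STABLE").exists()
    )
    removed = stable[:-keep_recent]
    for step in removed:
        shutil.rmtree(root / f"step_{step}")
    return removed
```

Pruning only directories that carry STABLE is what makes the cleanup retention-aware in this sketch: a step still being written is never mistaken for an old one.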

Validation

3-GPU GB200 (1 inference + 2 FSDP trainer), Qwen3-30B-A3B-Thinking, router replay + moe_backend=triton: 10-step RL run with Mismatch KL 0.0002–0.0005 every step (faithful routing replay, no drift), no errors/OOM, stable memory.
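The PR does not say which estimator backs the reported Mismatch KL. As a hedged illustration of how small 0.0002–0.0005 is, here is the k3 estimator of KL(inference || trainer) over tokens sampled by the inference engine; the function name is hypothetical and this may not be the metric prime-rl logs.

```python
import math


def mismatch_kl_k3(inference_logprobs: list[float], trainer_logprobs: list[float]) -> float:
    """Mean k3 estimate of KL(inference || trainer) over tokens sampled from the inference policy.

    With r = log p_trainer - log p_inference per sampled token, k3 = exp(r) - 1 - r is
    non-negative and, under sampling from the inference policy, unbiased for that KL.
    """
    if len(inference_logprobs) != len(trainer_logprobs) or not inference_logprobs:
        raise ValueError("need two equal-length, non-empty logprob sequences")
    total = 0.0
    for lp_inf, lp_train in zip(inference_logprobs, trainer_logprobs):
        r = lp_train - lp_inf
        total += math.expm1(r) - r  # expm1 keeps precision when r is tiny
    return total / len(inference_logprobs)
```

For small r, k3 ≈ r²/2, so if this estimator were the one used, values of 0.0002–0.0005 would correspond to a root-mean-square per-token logprob gap of roughly 0.02 to 0.03 nats.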

Notes

Companion to PrimeIntellect-ai/verifiers#1574 and PrimeIntellect-ai/renderers#79 (the dynamo_chat TITO transport this orchestrator path drives). The deps commit repoints the verifiers/renderers submodules at biswapanda forks pending those PRs merging.

Note

High Risk
Touches NCCL weight broadcast, inference weight reload (E8M0/FP8), orchestrator–inference admin contracts, and large vLLM runtime patches; misconfiguration can break training sync or serving on GPU clusters.

Overview
Adds NVIDIA Dynamo as an alternate inference backend (client.backend: vllm | dynamo) via an AdminAPI abstraction (VLLMAdminAPI vs DynamoAdminAPI on /engine/*), RL worker discovery (GET /v1/rl/workers, rl_base_url), and renderer_transport=dynamo_chat for nvext rollouts. Orchestrator stops defaulting return_token_ids for Dynamo; teacher logprobs dispatch on transport (vLLM generate vs Dynamo chat/nvext).
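The backend selection above can be sketched as a small protocol with two implementations plus a transport-keyed dispatch. Only client.backend, the /engine/* prefix, GET /v1/rl/workers, rl_base_url, and the vllm_generate/dynamo_chat transport values come from this PR; the class shapes, the action name used in the usage note, the vLLM route layout, and the stand-in handlers are hypothetical, and no HTTP request is made.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Protocol

Backend = Literal["vllm", "dynamo"]


class AdminAPI(Protocol):
    """Resolves an admin action (e.g. a weight update) to a URL on the inference server."""

    def url(self, action: str) -> str: ...


@dataclass
class VLLMAdminAPI:
    base_url: str

    def url(self, action: str) -> str:
        # Hypothetical: vLLM admin routes served at the server root.
        return f"{self.base_url.rstrip('/')}/{action}"


@dataclass
class DynamoAdminAPI:
    base_url: str

    def url(self, action: str) -> str:
        # Dynamo admin routes live under the /engine/* prefix.
        return f"{self.base_url.rstrip('/')}/engine/{action}"


def make_admin_api(backend: Backend, base_url: str) -> AdminAPI:
    """Backend selector keyed on client.backend."""
    if backend == "vllm":
        return VLLMAdminAPI(base_url)
    if backend == "dynamo":
        return DynamoAdminAPI(base_url)
    raise ValueError(f"unknown client.backend: {backend!r}")


def rl_workers_url(rl_base_url: str) -> str:
    """RL worker discovery endpoint (GET) for Dynamo-served inference."""
    return f"{rl_base_url.rstrip('/')}/v1/rl/workers"


# Teacher-logprob paths keyed on renderer_transport; the handlers are stand-ins that
# only report which path would run.
_TEACHER_PATHS: dict[str, Callable[[list[int]], str]] = {
    "vllm_generate": lambda ids: f"vllm generate ({len(ids)} tokens)",
    "dynamo_chat": lambda ids: f"dynamo chat/nvext ({len(ids)} tokens)",
}


def dispatch_teacher_logprobs(renderer_transport: str, token_ids: list[int]) -> str:
    try:
        path = _TEACHER_PATHS[renderer_transport]
    except KeyError:
        raise ValueError(f"unsupported renderer_transport: {renderer_transport!r}") from None
    return path(token_ids)
```

For example, `make_admin_api("dynamo", "http://dyn:8000").url("update_weights")` resolves to `http://dyn:8000/engine/update_weights` in this sketch. A protocol plus a factory keeps callers such as a weight-update loop backend-agnostic: they ask for a URL by action name and never branch on client.backend themselves.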

Weight sync & GB200: Filesystem broadcast gains configurable keep_recent, fsync-before-STABLE, and retention-aware cleanup; NCCL broadcast adds per-layer dist.barrier + CUDA sync. Inference reload handles DeepGEMM E8M0 scale layout; Qwen3 MoE can export vLLM kernel/FP8 weights. vLLM patches add int64 DeepGEMM SiLU/mul quant, fp32 lm-head idempotency, and dtype-aware routed_experts capture/replay (moe_backend, auto triton when router replay is on).
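E8M0 is the 8-bit, exponent-only scale type from the OCP Microscaling (MX) format: code e decodes to 2^(e - 127), and 0xFF is reserved for NaN. The hedged sketch below converts an fp32 per-block scale to E8M0 by rounding up to a power of two, which keeps every scaled value within the FP8 E4M3 maximum of 448. The PR does not say how its reload path rounds or lays out DeepGEMM's scale tensors; the rounding mode, the amax floor, and all names here are assumptions.

```python
import math

E8M0_BIAS = 127
E8M0_NAN = 0xFF       # the single reserved code; E8M0 has no zero, sign, or infinity
FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 value


def fp32_scale_to_e8m0(scale: float) -> int:
    """Encode a positive, finite scale as E8M0, rounding up to the next power of two."""
    if not math.isfinite(scale) or scale <= 0:
        raise ValueError("E8M0 can only encode positive, finite scales")
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent, 0.5 <= mantissa < 1
    # An exact power of two has mantissa 0.5; anything larger rounds up to the next power.
    ceil_log2 = exponent - 1 if mantissa == 0.5 else exponent
    # Saturate into the representable range 2**-127 .. 2**127 (codes 0..254).
    return min(max(ceil_log2 + E8M0_BIAS, 0), E8M0_NAN - 1)


def e8m0_to_float(code: int) -> float:
    if not 0 <= code <= 0xFF:
        raise ValueError("E8M0 codes are single bytes")
    if code == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, code - E8M0_BIAS)


def block_scale_e8m0(block: list[float], floor: float = 1e-12) -> int:
    """Per-block scale so every |x| / scale fits FP8 E4M3; `floor` (hypothetical) avoids a zero scale."""
    amax = max(abs(v) for v in block)
    return fp32_scale_to_e8m0(max(amax, floor) / FP8_E4M3_MAX)
```

Rounding up costs at most one bit of dynamic range per block but never overflows; rounding to nearest would sometimes pick a smaller scale and push |x| / scale past 448.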

Deploy: New Dockerfile.cuda.runtime (cuda-dl-base devel for NVRTC/tilelang + python3.12-dev), Dynamo k8s examples (DGD, ConfigMap, Helm values with inference disabled), Helm chart extensions (ConfigMap mounts, existingClaim, DRA resource claims, tolerations/pull secrets), and tools/dynamo launch/smoke scripts.
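Commit 2c61937 drops the trainer's nvidia.com/gpu request when a trainer DRA resourceClaim is enabled. A hedged Python sketch of that rendering rule follows. The values keys (trainer.gpus, trainer.resourceClaim.enabled, name, templateName) are hypothetical rather than the chart's actual schema, while the container-level `resources.claims` list and the pod-level `resourceClaims` entry with `resourceClaimTemplateName` mirror the Kubernetes DRA pod API. The likely motivation, not stated in the PR, is that requesting GPUs from both the device plugin and a DRA claim would ask the scheduler for two separate allocations.

```python
from typing import Any


def trainer_pod_fragment(values: dict[str, Any]) -> dict[str, Any]:
    """Render hypothetical trainer resources: a DRA claim OR device-plugin GPUs, never both."""
    trainer = values["trainer"]
    claim = trainer.get("resourceClaim", {})
    resources: dict[str, Any] = {"limits": {}}
    pod: dict[str, Any] = {}
    if claim.get("enabled"):
        name = claim.get("name", "trainer-gpus")
        # The container references the pod-level claim by name; GPUs come from DRA,
        # so no nvidia.com/gpu limit is emitted.
        resources["claims"] = [{"name": name}]
        pod["resourceClaims"] = [{"name": name, "resourceClaimTemplateName": claim["templateName"]}]
    else:
        # Classic device-plugin path.
        resources["limits"]["nvidia.com/gpu"] = trainer["gpus"]
    return {"pod": pod, "container": {"resources": resources}}
```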

Reviewed by Cursor Bugbot for commit 08bb4ea.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 2c61937.

biswapanda changed the title ~~feat: Dynamo (GB200) inference backend + weight transfer + deploy tooling~~ feat: Dynamo inference backend integration Jun 9, 2026

biswapanda changed the title ~~feat: Dynamo inference backend integration~~ feat: dynamo inference backend integration Jun 9, 2026

cursor Bot reviewed Jun 9, 2026

Comment thread src/prime_rl/inference/vllm/worker/nccl.py

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py

Comment thread k8s/prime-rl/templates/deployment.yaml

Comment thread src/prime_rl/utils/client.py

biswapanda added 8 commits June 8, 2026 19:12

feat(dynamo): AdminAPI abstraction + dynamo backend selector + RL worker discovery

15d29ee

feat(weight-transfer): NCCL broadcast + FP8/E8M0 conversion for GB200 (qwen3_moe/glm_moe)

a1e0c5f

feat(broadcast): NFS-safe filesystem broadcast + weight_broadcast.keep_recent for LoRA

685c13a

feat(orchestrator): dispatch compute_teacher_logprobs by renderer_transport (vllm + dynamo nvext)

4f5ba0b

feat(inference): vLLM 0.22 patches (fp32 lm-head, int64 silu_mul_quant, padded scrub)

7e4bc48

build(deps): point verifiers/renderers submodules at biswapanda rl-sdk-4 forks (TITO)

9f2d880

build(image): Dockerfile.cuda.runtime (vLLM 0.22, COPY deps, DeepGEMM) + Dockerfile.dynamo

a03a559

feat(deploy): helm chart updates + dynamo k8s manifests + smoke-test tools

7643298

biswapanda force-pushed the rl-sdk-4 branch from 7b8acfd to 7643298 on June 9, 2026 02:13

fix(rl): rename renderer_transport values to vllm_generate/dynamo_chat; bump verifiers/renderers deps to rl-sdk-4 heads

49a4d1f

cursor Bot reviewed Jun 9, 2026

Comment thread src/prime_rl/utils/client.py

biswapanda added 3 commits June 10, 2026 02:33

fix(routed_experts): carry dtype for models with over 256 experts

e75f343

feat(inference): forward moe_backend to vLLM so router-replay captures real routing

cbc2792

fix(routed_experts): int32 fallback for >65535 experts, normalize uint16 to int32, auto-triton on router replay

788281c

cursor Bot reviewed Jun 10, 2026

Comment thread src/prime_rl/trainer/rl/train.py

biswapanda added 2 commits June 10, 2026 11:25

fix(routed_experts): preserve per-model dtype so batch packing stays consistent

13e411d

fix(k8s): drop trainer nvidia.com/gpu request when trainer DRA resourceClaim enabled

2c61937

cursor Bot reviewed Jun 10, 2026

Comment thread k8s/dynamo-deploy/prime-rl-configs.yaml

fix(k8s): set backend=dynamo in the dynamo-deploy client example

1b5917a

biswapanda mentioned this pull request Jun 10, 2026

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) PrimeIntellect-ai/verifiers#1574 (Open, 1 task)

biswapanda added 4 commits June 11, 2026 02:34

chore(deps): bump verifiers/renderers submodules to rl-sdk-4 routed_experts tips

6f42560

chore(deps): restore submodules and .gitmodules to upstream, dropping deps changes from PR

f8f42f8

rm extra files

8c837e6

rm unnecessary files

08bb4ea

biswapanda wants to merge 19 commits into PrimeIntellect-ai:main from biswapanda:rl-sdk-4
