feat: consume native multimodal from the v1 trace (v0 + v1 VLM training) by mikasenghaas · Pull Request #2751 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-06-10T02:39:57Z

Summary

The orchestrator/trainer-feeding side of native v1 multimodal, plus the v1 example wiring (pairs with verifiers #1601).

mm consumer. trace_to_samples turns each turn's TurnTokens.multi_modal_data (vf.MMData) into mm_kwargs — concatenating each HF-processor kwarg (pixel_values, image_grid_thw) over the sample's images — and builds mm_token_type_ids from the renderer's placeholder→type map.
Per-turn delta. A native v1 turn re-renders the whole prompt (cumulative multi_modal_data) while the v0 bridge ships deltas; _pack_mm_kwargs contributes each image once, aligned to the placeholder tokens (which appear once, in the turn the image is introduced). Repeated images (e.g. two squares of the same color) keep distinct slots — matched by position, not deduped by hash.
Type-id map. mm_token_type_id_map (in orchestrator/utils.py) derives the map transiently from the renderer config (the orchestrator keeps no renderer; the old self.renderer hook was dead). Gated on model.is_vlm, so text runs pay nothing.
v1 example. Registers color-codeword-v1 (pyproject) + configs/debug/v1/multimodal.toml (the v1 port of the multimodal debug config). Bumps deps/verifiers to the mm-enabled bridge + taskset.
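
The "mm consumer" and "Type-id map" points above can be illustrated with a minimal sketch. This is not the code from this PR: `concat_mm_kwargs`, `build_mm_token_type_ids`, the token id `IMAGE_PAD_ID`, and the 0/1 type-id values are hypothetical names and values chosen for illustration, and plain Python lists stand in for the tensors the real pipeline carries.

```python
from typing import Dict, List, Sequence

# Hypothetical placeholder token id and type ids (assumptions, not from the PR).
IMAGE_PAD_ID = 9001
TEXT_TYPE, IMAGE_TYPE = 0, 1


def concat_mm_kwargs(per_image: Sequence[Dict[str, list]]) -> Dict[str, list]:
    """Concatenate each processor kwarg along the leading axis over a sample's images."""
    merged: Dict[str, list] = {}
    for image_kwargs in per_image:
        for key, rows in image_kwargs.items():
            merged.setdefault(key, []).extend(rows)
    return merged


def build_mm_token_type_ids(
    token_ids: Sequence[int], type_id_map: Dict[int, int]
) -> List[int]:
    """Map every token to a type id; tokens missing from the map count as text."""
    return [type_id_map.get(tok, TEXT_TYPE) for tok in token_ids]


# Two images: the first contributes two patch rows, the second one.
per_image = [
    {"pixel_values": [[0.1, 0.2], [0.3, 0.4]], "image_grid_thw": [[1, 1, 2]]},
    {"pixel_values": [[0.5, 0.6]], "image_grid_thw": [[1, 1, 1]]},
]
mm_kwargs = concat_mm_kwargs(per_image)

# A text-only run would skip this step entirely (the PR gates it on model.is_vlm).
token_ids = [11, IMAGE_PAD_ID, IMAGE_PAD_ID, 12, IMAGE_PAD_ID, 13]
mm_token_type_ids = build_mm_token_type_ids(token_ids, {IMAGE_PAD_ID: IMAGE_TYPE})
```

In this sketch the number of `image_grid_thw` rows equals the number of images, and every placeholder position gets the image type id while all other tokens default to text.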

Depends on

verifiers #1601 (native MMData types, renderer/bridge emission, image-input message types, the color-codeword-v1 taskset). The submodule is pinned to that branch tip; it'll re-pin to the merged feat/nano-as-v1 commit once #1601 lands.

Verification

v1 multimodal (color-codeword-v1, Qwen3-VL-4B, native): rollout reward is ~0.83 (the VLM reads the squares), the run logs "Training finished!", and both trainer steps complete the M-RoPE path.
v0 multimodal (color-codeword via bridge): re-verified through the shared delta-aware packing — trains cleanly (no regression).
Unit-tested the renderer→MMData→mm_kwargs round-trip (dtype/shape/values) and the cumulative-vs-delta per-turn handling incl. repeated colors.

trace_to_samples now unions each turn's TurnTokens.multi_modal_data into mm_kwargs (pixel_values/image_grid_thw EncodedTensors) and builds mm_token_type_ids from the renderer's placeholder->type map. The map is derived transiently from the renderer config (mm_token_type_id_map in utils) since the orchestrator keeps no renderer. Bumps verifiers to the mm-enabled bridge (depends on verifiers #1601).

…mm packing Registers the color-codeword-v1 taskset + a v1 debug config (configs/debug/v1/multimodal.toml), and makes _pack_mm_kwargs take each turn's *new* images: a v1 turn re-renders the whole prompt (cumulative multi_modal_data) while the v0 bridge ships deltas — both resolve to one slot per image, aligned to the placeholder tokens (repeated colors kept by position). Bumps verifiers to the input-side + taskset (depends on #1601).
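
One way to get "one slot per image" from both cumulative and delta turns is to count the image placeholders a turn introduces and take that many images from the tail of the turn's image list. The sketch below shows only that idea under stated assumptions; it is not the PR's `_pack_mm_kwargs`, and `Turn`, `new_image_slots`, `new_images`, and `pack_images` are hypothetical names, with strings standing in for image payloads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    # Number of image placeholders that first appear in this turn's tokens.
    new_image_slots: int
    # Images attached to the turn: the full history (cumulative, v1-style)
    # or only the newly introduced ones (delta, v0-bridge-style).
    images: List[str]


def new_images(turn: Turn) -> List[str]:
    """Return only the images introduced in this turn, matched by position."""
    n = turn.new_image_slots
    if n == 0:
        return []  # guard: images[-0:] would return the whole list
    if n > len(turn.images):
        raise ValueError("turn has more new placeholders than images")
    return turn.images[-n:]


def pack_images(turns: List[Turn]) -> List[str]:
    """Contribute each image exactly once, in placeholder order."""
    packed: List[str] = []
    for turn in turns:
        packed.extend(new_images(turn))
    return packed


# The same two-image rollout, seen cumulatively and as deltas.
cumulative = [Turn(1, ["red"]), Turn(1, ["red", "red"]), Turn(0, ["red", "red"])]
delta = [Turn(1, ["red"]), Turn(1, ["red"]), Turn(0, [])]
packed_cumulative = pack_images(cumulative)
packed_delta = pack_images(delta)
```

Because selection is positional, the two identical "red" images keep separate slots; deduplicating by content hash would collapse them and misalign the placeholders.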

mikasenghaas added 2 commits June 10, 2026 02:39

mikasenghaas wants to merge 2 commits into feat/nano-as-v1 from fix/v0-multimodal
