Training Gym

📖 Documentation · API Reference

Modal Training Gym is a Python SDK for RL post-training on Modal, so you don't have to hand-roll a launcher every time.

Pick a base model, a dataset, and an RL framework; the gym handles cluster topology, Ray/NCCL bring-up, volume mounts, checkpointing, and serving for eval and rollouts.

Quickstart

Install with pip:

pip install -q git+https://github.com/modal-projects/training-gym.git@main

Or pin it in pyproject.toml for uv:

training-gym = { git = "https://github.com/modal-projects/training-gym.git", branch = "main" }

Then import the building blocks from your own script:

from modal_training_gym import TrainConfig
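Before running a longer script, it can help to confirm the package is importable in the current environment. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of the SDK, and the only thing it assumes about Training Gym is the `modal_training_gym` module name from the import above.

```python
import importlib.util


def is_installed(module_name="modal_training_gym"):
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("modal_training_gym is importable")
    else:
        print("modal_training_gym not found; install it with pip or uv first")
```

Because `find_spec` only locates the module, this check stays cheap and does not trigger any of the package's import-time side effects.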

Note

Python 3.12 is required. Modal's serialized=True functions use cloudpickle, which requires the local Python version to exactly match the remote container's. All framework images ship Python 3.12, so running from 3.11 or 3.13 will fail at app build time.
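To surface a version mismatch before Modal starts building the app, a launcher script could check the local interpreter up front. This is a minimal sketch, not something the SDK provides: `python_matches` and `require_python` are hypothetical helper names, and the `(3, 12)` constant simply restates the requirement from the note above.

```python
import sys

# Assumption taken from the note above: all framework images ship Python 3.12.
REQUIRED = (3, 12)


def python_matches(required=REQUIRED, version_info=None):
    """Return True when the interpreter's major.minor equals `required`."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(required)


def require_python(required=REQUIRED, version_info=None):
    """Raise a RuntimeError if the local major.minor differs from `required`."""
    info = sys.version_info if version_info is None else version_info
    if not python_matches(required, info):
        raise RuntimeError(
            f"Python {required[0]}.{required[1]} is required, "
            f"found {info[0]}.{info[1]}"
        )
```

Calling `require_python()` at the top of your own script turns a late app-build failure into an immediate, readable error. The `version_info` parameter exists only so the check can be exercised without switching interpreters.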

Agent set-up

This repository includes an AGENTS.md and a skills/ directory (symlinked to .claude/skills/) that teach Claude Code how to navigate the framework: W&B configuration, custom rollouts and generate functions, custom eval functions, and more.

Clone the repo and run claude from its root; the skills load automatically based on what you ask for.

Observability dashboard

Training Gym ships a dashboard that aggregates training runs, deployments, and eval results in one place. Deploy your own copy:

training-gym setup

Modal prints a URL where you can watch jobs in progress.

Gym Observability Dashboard

Tutorials

The fastest path through the API is the tutorials. Each one ships as a runnable .py and a paired .ipynb narrated cell by cell; the notebook is the canonical walkthrough. Each tutorial below has a one-click Launch button that opens the .ipynb in a fresh Modal Notebook. The first code cell pip-installs modal-training-gym into the notebook kernel, so the remaining cells run as-is.

Difficulty is a rough self-assessed signal for where to start:

  • Beginner: single-node, introduces one framework concept.
  • Intermediate: 1–2 nodes, or wires up something non-default (custom reward, external script).
  • Advanced: ≥2 nodes with non-trivial parallelism (tensor-parallel, colocated RL, long context); assumes familiarity with the underlying framework.

RL

| Tutorial | Summary | Difficulty | Framework | Launch |
| --- | --- | --- | --- | --- |
| 000_rl_basics | Qwen3-4B haiku evaluation with verifiable rewards: serve, evaluate, train, compare | Beginner | slime | Open in Modal |
| 001_sandboxes | Code RL with Harbor hello-world and sandboxed verification | Intermediate | slime | Open in Modal |
| 002_multiturn | Multi-turn number-guessing RL with custom generate and reward functions | Intermediate | slime | Open in Modal |
| 003_on_policy_distillation | On-policy distillation on math (Qwen3-8B teacher, Qwen3-4B student) | Intermediate | slime | Open in Modal |
| 005_dapo | DAPO on math with Qwen3-4B | Advanced | slime | Open in Modal |
| 006_audio_asr | Audio GRPO on Qwen3-ASR-1.7B: transcribe LibriSpeech, reward −WER | Intermediate | slime | Open in Modal |

Single Node

| Tutorial | Summary | Difficulty | Framework | Launch |
| --- | --- | --- | --- | --- |
| 001_qwen27b | Train Qwen3.6-27B on DAPO-math with GRPO | Advanced | slime | Open in Modal |
| 000_qwen35b | Train Qwen3.6-35B-A3B on DAPO-math with GRPO | Advanced | slime | Open in Modal |

Agents

| Tutorial | Summary | Difficulty | Framework | Launch |
| --- | --- | --- | --- | --- |
| 000_agent_sandbox | Build an LLM agent harness with a self-hosted model and Modal Sandbox tool execution | Beginner | Modal Sandbox | Open in Modal |

Multinode

| Tutorial | Summary | Difficulty | Framework | Launch |
| --- | --- | --- | --- | --- |
| 000_kimi_k25 | Kimi K2.5 LoRA GRPO training on 128 GPUs with DAPO-Math-17k | Advanced | miles | Open in Modal |
| 001_kimi_k26 | Kimi K2.6 LoRA GRPO training on 128 GPUs with DAPO-Math-17k | Advanced | miles | Open in Modal |
| 002_glm_4_7 | GLM-4.7 355B MoE full-weight GSPO training on 64 GPUs with DAPO-Math-17k | Advanced | slime | Open in Modal |

See tutorials/README.md for how to run the .py companions from the CLI and how to author a new tutorial.

Multi-node access

Important

Single-node training is open to everyone. Multi-node clusters, required for larger models, are still in Beta. Contact us on Slack for access.

Architecture

Architecture diagram

Documentation

Full docs are hosted at gym.modal.dev:

  • API Reference: every public class documented with types and defaults

Modal platform references:

License

MIT.
