Documentation · API Reference
Modal Training Gym is a Python SDK for RL post-training on Modal, so you don't have to hand-roll a launcher every time.
Pick a base model, a dataset, and an RL framework; the gym handles cluster topology, Ray/NCCL bring-up, volume mounts, checkpointing, and serving for eval and rollouts.
Install with pip:

    pip install -q git+https://github.com/modal-projects/training-gym.git@main

Or pin it in pyproject.toml for uv:

    training-gym = { git = "https://github.com/modal-projects/training-gym.git", branch = "main" }

Then import the building blocks from your own script:

    from modal_training_gym import TrainConfig

Note
Python 3.12 is required. Modal's serialized=True functions use
cloudpickle, which requires the local Python version to exactly match the
remote container's. All framework images ship Python 3.12, so running from
3.11 or 3.13 will fail at app build time.
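
To catch a version mismatch before building an app, a small pre-flight check along these lines may help. This is a minimal sketch that is not part of the SDK: the helper name `python_matches` and the `REQUIRED` constant are hypothetical, and the only fact taken from the note above is that the framework images ship Python 3.12.

```python
import sys

# Assumption: the required version is the 3.12 stated in the note above.
REQUIRED = (3, 12)


def python_matches(required=REQUIRED, current=None):
    """Return True when the (major, minor) of `current` equals `required`.

    `current` defaults to the running interpreter's sys.version_info;
    passing a tuple makes the check easy to exercise by hand.
    """
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(required)


if __name__ == "__main__":
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    wanted = ".".join(str(part) for part in REQUIRED)
    if python_matches():
        print(f"Python {found} matches the required {wanted}.")
    else:
        print(f"Python {found} found, but {wanted} is required; "
              "switch interpreters before building the app.")
```

Only the major and minor components are compared here, which is an assumption about how strict the match needs to be.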
This repository includes an AGENTS.md and a skills/ directory (symlinked to .claude/skills/) that teach Claude Code how to navigate the framework: W&B configuration, custom rollouts and generate functions, custom eval functions, and more.
Clone the repo and run claude from its root; the skills load automatically based on what you ask for.
Training Gym ships a dashboard that aggregates training runs, deployments, and eval results in one place. Deploy your own copy:

    training-gym setup

Modal prints a URL where you can watch jobs in progress.
The fastest path through the API is the tutorials. Each one
ships as a runnable .py and a paired .ipynb narrated cell-by-cell;
the notebook is the canonical walkthrough. Each tutorial below has a one-click
Launch button that opens the .ipynb in a fresh Modal Notebook; the first
code cell pip-installs modal-training-gym into the notebook kernel, so the
rest of the cells run as-is.
Difficulty is a rough self-assessed signal for where to start:
- Beginner: single-node, introduces one framework concept.
- Intermediate: 1–2 nodes, or wires up something non-default (custom reward, external script).
- Advanced: ≥2 nodes with non-trivial parallelism (tensor-parallel, colocated RL, long context); assumes familiarity with the underlying framework.

| Tutorial | Summary | Difficulty | Framework | Launch |
|---|---|---|---|---|
| 000_rl_basics | Qwen3-4B haiku evaluation with verifiable rewards: serve, evaluate, train, compare | Beginner | slime | |
| 001_sandboxes | Code RL with Harbor hello-world and sandboxed verification | Intermediate | slime | |
| 002_multiturn | Multi-turn number-guessing RL with custom generate and reward functions | Intermediate | slime | |
| 003_on_policy_distillation | On-policy distillation on math: Qwen3-8B teacher, Qwen3-4B student | Intermediate | slime | |
| 005_dapo | DAPO on math with Qwen3-4B | Advanced | slime | |
| 006_audio_asr | Audio GRPO on Qwen3-ASR-1.7B: transcribe LibriSpeech, reward −WER | Intermediate | slime | |

| Tutorial | Summary | Difficulty | Framework | Launch |
|---|---|---|---|---|
| 001_qwen27b | Train Qwen3.6-27B on DAPO-math with GRPO | Advanced | slime | |
| 000_qwen35b | Train Qwen3.6-35B-A3B on DAPO-math with GRPO | Advanced | slime | |

| Tutorial | Summary | Difficulty | Framework | Launch |
|---|---|---|---|---|
| 000_agent_sandbox | Build an LLM agent harness with a self-hosted model and Modal Sandbox tool execution | Beginner | Modal Sandbox | |

| Tutorial | Summary | Difficulty | Framework | Launch |
|---|---|---|---|---|
| 000_kimi_k25 | Kimi K2.5 LoRA GRPO training on 128 GPUs with DAPO-Math-17k | Advanced | miles | |
| 001_kimi_k26 | Kimi K2.6 LoRA GRPO training on 128 GPUs with DAPO-Math-17k | Advanced | miles | |
| 002_glm_4_7 | GLM-4.7 355B MoE full-weight GSPO training on 64 GPUs with DAPO-Math-17k | Advanced | slime | |

See tutorials/README.md for how to run the .py
companions from the CLI and how to author a new tutorial.
Important
Single-node training is open to everyone. Multi-node clusters, required for larger models, are still in Beta. Contact us on Slack for access.
Full docs are hosted at gym.modal.dev:
- API Reference: every public class documented with types and defaults
Modal platform references:
MIT.
