Training Gym

Modal Training Gym is a Python SDK for RL post-training on Modal, so you don't have to hand-roll a launcher for every run.

Pick a base model, a dataset, and an RL framework; the gym handles cluster topology, Ray/NCCL bring-up, volume mounts, checkpointing, and serving for eval and rollouts.

Quickstart

Install with pip:

pip install -q git+https://github.com/modal-projects/training-gym.git@main

Or, for uv, add it under [tool.uv.sources] in pyproject.toml:

training-gym = { git = "https://github.com/modal-projects/training-gym.git", branch = "main" }

Then import the building blocks from your own script:

from modal_training_gym import TrainConfig

Note

Python 3.12 is required. Modal's serialized=True functions use cloudpickle, which requires the local Python version to exactly match the remote container's. All framework images ship Python 3.12, so running from 3.11 or 3.13 will fail at app build time.
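The version constraint in the note can be checked locally before launching anything. The following is a minimal sketch, not part of the SDK: the helper names `python_matches` and `preflight_problems` are hypothetical, and the only facts taken from this README are the required version (3.12) and the import name `modal_training_gym`.

```python
import importlib.util
import sys

# Taken from the note above: all framework images ship Python 3.12.
REQUIRED = (3, 12)


def python_matches(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True when the local major.minor version equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


def preflight_problems(version_info=sys.version_info) -> list[str]:
    """Collect human-readable problems with the local environment (hypothetical helper)."""
    problems = []
    if not python_matches(version_info):
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.12 is required locally, found {found}")
    # find_spec returns None when the top-level package is not installed.
    if importlib.util.find_spec("modal_training_gym") is None:
        problems.append("modal_training_gym is not installed; see the pip command above")
    return problems


# Print any problems; an empty list means the local environment looks ready.
for problem in preflight_problems():
    print("preflight:", problem)
```

Running this from the interpreter you plan to launch with surfaces a version mismatch before Modal starts building the app.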

Agent set-up

This repository includes an AGENTS.md and a skills/ directory (symlinked to .claude/skills/) that teach Claude Code how to navigate the framework — W&B configuration, custom rollouts and generate functions, custom eval functions, and more.

Clone the repo and run claude from its root; the skills load automatically based on what you ask for.

Observability dashboard

Training Gym ships a dashboard that aggregates training runs, deployments, and eval results in one place. Deploy your own copy:

training-gym setup

Modal prints a URL where you can watch jobs in progress.

Tutorials

The fastest path through the API is the tutorials. Each one ships as a runnable .py and a paired .ipynb narrated cell-by-cell — the notebook is the canonical walkthrough. Each tutorial below has a one-click Launch button that opens the .ipynb in a fresh Modal Notebook; the first code cell pip-installs modal-training-gym into the notebook kernel, so the rest of the cells run as-is.

Difficulty is a rough self-assessed signal for where to start:

Beginner — single-node, introduces one framework concept.
Intermediate — 1–2 nodes, or wires up something non-default (custom reward, external script).
Advanced — ≥2 nodes with non-trivial parallelism (tensor-parallel, colocated RL, long context); assumes familiarity with the underlying framework.

RL

Tutorial	Summary	Difficulty	Framework
`000_rl_basics`	Qwen3-4B haiku evaluation with verifiable rewards — serve, evaluate, train, compare	Beginner	`slime`
`001_sandboxes`	Code RL with Harbor hello-world and sandboxed verification	Intermediate	`slime`
`002_multiturn`	Multi-turn number-guessing RL with custom generate and reward functions	Intermediate	`slime`
`003_on_policy_distillation`	On-policy distillation on math — Qwen3-8B teacher, Qwen3-4B student	Intermediate	`slime`
`005_dapo`	DAPO on math with Qwen3-4B	Advanced	`slime`
`006_audio_asr`	Audio GRPO on Qwen3-ASR-1.7B — transcribe LibriSpeech, reward −WER	Intermediate	`slime`
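The `006_audio_asr` row names a reward of −WER (negative word error rate). As a rough illustration of what such a reward computes, here is a minimal sketch in plain Python. It is not the tutorial's code: the function names are hypothetical, and the text normalization (lowercasing and whitespace splitting) is an assumption that the actual tutorial may handle differently.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two word sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def neg_wer_reward(reference: str, hypothesis: str) -> float:
    """Return -WER: 0.0 for a perfect transcript, more negative as errors grow."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention chosen for this sketch: an empty reference is only matched by an empty hypothesis.
        return 0.0 if not hyp else -1.0
    return 0.0 - word_edit_distance(ref, hyp) / len(ref)
```

Because WER can exceed 1 when the hypothesis contains many insertions, this reward is unbounded below; clipping it is a design choice left out of the sketch.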

Single Node

Tutorial	Summary	Difficulty	Framework
`001_qwen27b`	Train Qwen3.6-27B on DAPO-math with GRPO	Advanced	`slime`
`000_qwen35b`	Train Qwen3.6-35B-A3B on DAPO-math with GRPO	Advanced	`slime`

Agents

Tutorial	Summary	Difficulty	Framework
`000_agent_sandbox`	Build an LLM agent harness with a self-hosted model and Modal Sandbox tool execution	Beginner	Modal Sandbox

Multinode

Tutorial	Summary	Difficulty	Framework
`000_kimi_k25`	Kimi K2.5 LoRA GRPO training on 128 GPUs with DAPO-Math-17k	Advanced	`miles`
`001_kimi_k26`	Kimi K2.6 LoRA GRPO training on 128 GPUs with DAPO-Math-17k	Advanced	`miles`
`002_glm_4_7`	GLM-4.7 355B MoE full-weight GSPO training on 64 GPUs with DAPO-Math-17k	Advanced	`slime`

See tutorials/README.md for how to run the .py companions from the CLI and how to author a new tutorial.

Multi-node access

Important

Single-node training is open to everyone. Multi-node clusters — required for larger models — are still in Beta. Contact us on Slack for access.

Architecture

Documentation

Full docs are hosted at gym.modal.dev:

API Reference — every public class documented with types and defaults

Modal platform references:

License

MIT.
