feat: add RL checkpoint format backward compat integration test #2776
samsja wants to merge 2 commits into
Generates a checkpoint with a pinned version of prime-rl (via git worktree) then verifies the current version can resume RL training from it. Set CKPT_FORMAT_REF to a git tag/commit to enable cross-version testing. Defaults to current HEAD (same-version round-trip).
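The ref selection described above could look roughly like the following. This is a minimal sketch, not the test's actual code: the helper name `resolve_ckpt_format_ref` is hypothetical, and treating an empty value like an unset one is an assumption made here.

```python
import os
from typing import Mapping


def resolve_ckpt_format_ref(env: Mapping[str, str] = os.environ) -> str:
    """Return the git ref used to generate the checkpoint (hypothetical helper)."""
    # Assumption: an empty CKPT_FORMAT_REF is treated like an unset one,
    # so both fall back to the same-version round-trip on HEAD.
    return env.get("CKPT_FORMAT_REF") or "HEAD"
```

Passing the environment as a parameter keeps the helper easy to exercise without mutating `os.environ`.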
src_configs = repo_dir / "configs" / "ci" / "integration" / "ckpt_compat"
dst_configs = worktree / "configs" / "ci" / "integration" / "ckpt_compat"
dst_configs.parent.mkdir(parents=True, exist_ok=True)
subprocess.check_call(["cp", "-r", str(src_configs), str(dst_configs)])
Config copy nests existing directory
Medium Severity
When CKPT_FORMAT_REF points at a commit that already contains configs/ci/integration/ckpt_compat, cp -r into an existing destination creates a nested ckpt_compat/ckpt_compat tree. The RL run then uses the checked-out configs, not the copies from the current branch the test intends to inject.
Reviewed by Cursor Bugbot for commit 55beff1.
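One way to avoid the nesting described in this comment is to replace the destination directory before copying. The sketch below is only an illustration under that assumption: `copy_configs` is a hypothetical helper, and whether the PR should replace or merge the checked-out configs is the author's call.

```python
import shutil
from pathlib import Path


def copy_configs(src: Path, dst: Path) -> None:
    """Make dst an exact copy of src, even if dst already exists (hypothetical helper)."""
    # Remove any configs that came with the checked-out ref, so the copy
    # cannot land inside an existing directory as dst/<src name>.
    if dst.exists():
        shutil.rmtree(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copytree creates dst itself and fails if it already exists,
    # which is why the removal above comes first.
    shutil.copytree(src, dst)
```

Replacing rather than merging also keeps stale files from the pinned ref out of the injected config set.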
    process.wait(timeout=30)
except subprocess.TimeoutExpired:
    process.kill()
    process.wait()
Timeout skips child process cleanup
Medium Severity
On subprocess timeout, _run_rl calls terminate/kill on the top-level uv process only. Other integration tests use cleanup_process, which recursively signals torchrun, inference, and other descendants started under uv run rl.
Reviewed by Cursor Bugbot for commit 55beff1.
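The comment points at the repo's existing cleanup_process helper, whose signature is not shown here. As a stand-alone illustration of the same idea, the sketch below uses a different technique: it starts the command in its own session and signals the whole process group on timeout. It is POSIX-only, and `run_with_group_cleanup` and its parameters are hypothetical names.

```python
import os
import signal
import subprocess


def _signal_group(pgid: int, sig: int) -> None:
    """Send sig to every process in the group, ignoring an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_group_cleanup(cmd, timeout, grace=30.0):
    """Run cmd; on timeout, stop it and its descendants, then re-raise."""
    # start_new_session=True makes the child a group leader, so its pid is
    # also the process group id shared by the descendants it spawns.
    process = subprocess.Popen(cmd, start_new_session=True)
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        _signal_group(process.pid, signal.SIGTERM)
        try:
            process.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            pass
        # Sweep anything still alive in the group, then reap the leader.
        _signal_group(process.pid, signal.SIGKILL)
        process.wait()
        raise
```

A descendant that moves itself into another session or group would escape this sweep, which is one reason a recursive helper such as cleanup_process may still be the better fit.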
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit acf930d.
dst_configs.parent.mkdir(parents=True, exist_ok=True)
subprocess.check_call(["cp", "-r", str(src_configs), str(dst_configs)])

yield worktree
Worktree missing submodule checkout
Medium Severity
When CKPT_FORMAT_REF points at an older commit, the test runs uv run rl from a new git worktree but never initializes the deps/* submodules there. pyproject.toml depends on those paths, so checkpoint generation in the pinned version fails: CI initializes submodules only in the main checkout, not in the worktree.
Reviewed by Cursor Bugbot for commit acf930d.
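A possible shape for the fix is to initialize submodules inside the worktree right after creating it. This is a sketch under assumptions: `add_worktree_with_submodules` and its `run` parameter are hypothetical, and only the standard `git worktree add --detach` and `git submodule update --init --recursive` commands are used.

```python
import subprocess
from pathlib import Path


def add_worktree_with_submodules(repo_dir: Path, worktree: Path, ref: str,
                                 run=subprocess.check_call) -> None:
    """Create a detached worktree at ref and populate its submodules."""
    # Check out the pinned ref without creating or moving any branch.
    run(["git", "worktree", "add", "--detach", str(worktree), ref], cwd=repo_dir)
    # Submodule checkouts are per working tree, so this must run inside
    # the new worktree rather than in the main checkout.
    run(["git", "submodule", "update", "--init", "--recursive"], cwd=worktree)
```

The `run` parameter exists only so the command sequence can be checked without touching a real repository; the default executes the commands directly.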


Note
Low Risk
Test and CI config only; no changes to training, checkpoint serialization, or runtime behavior in production code.
Overview
Adds a GPU integration test that exercises RL checkpoint write → resume so breaking checkpoint-format changes are caught in CI.
The flow runs start.toml (saves at step 3) then resume.toml (resume_step = 3) on the same output_dir, with the usual no-error and reward checks on both runs. Optional cross-version coverage: set CKPT_FORMAT_REF to a tag/commit and the generator run uses a detached git worktree at that ref (configs copied in); resume always runs from the current checkout.
gpu_tests.yaml adds a matrix job for tests/integration/test_ckpt_compat.py on the vm runner.
Reviewed by Cursor Bugbot for commit acf930d. Bugbot is set up for automated code reviews on this repo.