
feat(orchestrator): EnvMixStrategy seam for env selection #2743

Open
hallerite wants to merge 1 commit into main from feat/env-mix-strategy


@hallerite hallerite commented Jun 9, 2026

Member

What

Extract TrainSource's weighted round-robin env selection into a swappable EnvMixStrategy seam (default WeightedRoundRobin). Example selection — the per-env reshuffling cursor — stays in TrainSource.

This is slice (b) of the composable algorithm abstraction: cleanly separate which env (global EnvMixStrategy) from which example (per-env, slice c builds on this).

Changes

  • orchestrator/sampling.py (new): EnvMixStrategy ABC + WeightedRoundRobin default. pick() returns the next env name via weighted random choice.
  • TrainSource delegates the env pick to self.env_mix.pick(); it still owns dataset loading, the per-env cursor, reshuffle-on-exhaustion, and env_costs.
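
The seam described in the list above could look roughly like the following. This is a minimal sketch, not the code in orchestrator/sampling.py: the constructor signature (a `weights` mapping plus an injected `rng`) and the `ValueError` on empty input are assumptions made for illustration; only the class names and `pick()` come from this PR.

```python
import random
from abc import ABC, abstractmethod


class EnvMixStrategy(ABC):
    """Decides which env the next training example is drawn from."""

    @abstractmethod
    def pick(self) -> str:
        """Return the name of the next env."""


class WeightedRoundRobin(EnvMixStrategy):
    """Weighted random choice over env names (assumed constructor shape)."""

    def __init__(self, weights: dict[str, float], rng: random.Random) -> None:
        if not weights:
            # Assumption: the "empty-envs guard" raises ValueError.
            raise ValueError("at least one env is required")
        self._names = list(weights)
        self._weights = [weights[name] for name in self._names]
        self._rng = rng  # injected so the caller controls the RNG stream

    def pick(self) -> str:
        # random.Random.choices never returns an item whose weight is zero
        # as long as at least one weight is positive.
        return self._rng.choices(self._names, weights=self._weights, k=1)[0]
```

With this shape, an alternative mix (for example a curriculum) only needs to subclass `EnvMixStrategy` and implement `pick()`; the caller's per-env cursor logic does not change.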

Behavior

Behavior-preserving. WeightedRoundRobin draws from TrainSource's existing RNG (injected), so env selection stays in the same stream as the dataset shuffles — the example sequence is identical to before. (Partitioning RNG per-env is a slice-(c) change, not here.) The public API (TrainSource(train_envs, seed) + next_example) is unchanged.
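
To illustrate why sharing the RNG keeps the sequence identical, the sketch below compares an inlined `rng.choices` call with the same call routed through a picker object that holds the same `random.Random` instance. The names `SharedRngMix`, `inline_sequence`, and `delegated_sequence` are hypothetical, and the four-element shuffle only stands in for a per-env dataset reshuffle.

```python
import random


class SharedRngMix:
    """Hypothetical picker that draws from an RNG owned by the caller."""

    def __init__(self, weights: dict[str, float], rng: random.Random) -> None:
        self._names = list(weights)
        self._weights = [weights[name] for name in self._names]
        self._rng = rng

    def pick(self) -> str:
        return self._rng.choices(self._names, weights=self._weights, k=1)[0]


def inline_sequence(seed: int, weights: dict[str, float], steps: int):
    """Env pick inlined next to the shuffle, as before the extraction."""
    rng = random.Random(seed)
    names = list(weights)
    w = [weights[name] for name in names]
    out = []
    for _ in range(steps):
        env = rng.choices(names, weights=w, k=1)[0]
        order = list(range(4))
        rng.shuffle(order)  # stand-in for a dataset reshuffle
        out.append((env, tuple(order)))
    return out


def delegated_sequence(seed: int, weights: dict[str, float], steps: int):
    """Env pick delegated to a strategy that shares the same RNG."""
    rng = random.Random(seed)
    mix = SharedRngMix(weights, rng)
    out = []
    for _ in range(steps):
        env = mix.pick()
        order = list(range(4))
        rng.shuffle(order)  # same stream, same position as the inline version
        out.append((env, tuple(order)))
    return out


weights = {"math": 2.0, "code": 1.0}
assert inline_sequence(123, weights, 20) == delegated_sequence(123, weights, 20)
```

If the picker were given its own `random.Random`, the shuffles would consume a different stream and the example order would generally change, which is why that partitioning is deferred to slice (c).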

Testing

  • ruff check + ruff format --check clean.
  • tests/unit/orchestrator/test_sampling.py (new): WeightedRoundRobin is deterministic per seed, respects weights (incl. zero-weight envs never picked), and guards against empty envs.
  • tests/unit/test_configs.py (106 tests) passes; imports resolve.

🤖 Generated with Claude Code


Note

Low Risk
Refactor with default implementation matching prior logic; TrainSource API and dispatcher wiring unchanged.

Overview
Introduces a swappable env mix seam for training rollouts: global “which env next?” is no longer inlined in TrainSource.

New orchestrator/sampling.py defines EnvMixStrategy and default WeightedRoundRobin, which picks an env name via weighted random choice (same weight rules as before: per-env ratio when all set, else dataset size). TrainSource now calls self.env_mix.pick() instead of rng.choices directly; it still owns datasets, per-env cursors, reshuffle-on-exhaustion, permit costs, and next_example’s public contract.
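
The weight rule mentioned above (per-env ratio when every env sets one, otherwise dataset size) might be expressed as below. This is a sketch under stated assumptions: `resolve_weights` is a hypothetical helper, and representing an unset ratio as `None` is an assumption, not something this PR confirms.

```python
from typing import Optional


def resolve_weights(
    ratios: dict[str, Optional[float]],
    sizes: dict[str, int],
) -> dict[str, float]:
    """Hypothetical helper: choose the sampling weight for each env."""
    # Use the configured ratios only if every env has one.
    if all(ratio is not None for ratio in ratios.values()):
        return {name: float(ratio) for name, ratio in ratios.items()}
    # Otherwise fall back to dataset size for all envs.
    return {name: float(sizes[name]) for name in ratios}


# All ratios set: ratios are used as-is.
print(resolve_weights({"a": 0.25, "b": 0.75}, {"a": 10, "b": 30}))
# One ratio missing: every env falls back to its dataset size.
print(resolve_weights({"a": 0.25, "b": None}, {"a": 10, "b": 30}))
```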

Behavior-preserving: WeightedRoundRobin uses TrainSource’s existing random.Random instance so env picks stay in the same RNG stream as dataset shuffles.

Tests: tests/unit/orchestrator/test_sampling.py covers determinism, weight behavior (including zero weight), and empty-env validation.

Reviewed by Cursor Bugbot for commit 84370ab. Bugbot is set up for automated code reviews on this repo.

Extract TrainSource's weighted round-robin env pick into a swappable
EnvMixStrategy (default WeightedRoundRobin). Example selection (the
reshuffling cursor) stays in TrainSource. The strategy draws from
TrainSource's RNG, so the example sequence is unchanged — pure
extraction, no behavior delta. Separates 'which env' from 'which
example' as the seam slice (c) builds on.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hallerite hallerite marked this pull request as ready for review June 9, 2026 17:58
@hallerite hallerite requested a review from mikasenghaas June 9, 2026 17:58