feat(echo): per-role echo weights + user-supplied token filter by hallerite · Pull Request #2782 · PrimeIntellect-ai/prime-rl

hallerite · 2026-06-11T23:50:00Z

Stacked on #2746. Generalizes ECHO's token selection, porting the selection vocabulary from @snimu's #2677 onto the component weight streams.

What changes

Per-role echo weights. The observations = "tool" | "all" binary becomes a role table — each env-provided message role trains at its own α, selected via the renderer's per-token attribution (message_indices / message_roles / is_content):

[orchestrator.algo.advantage]
type = "echo"

[orchestrator.algo.advantage.roles.tool]
alpha = 0.25

[orchestrator.algo.advantage.roles.user]
alpha = 0.05

The echo preset is unchanged in meaning: tool-response bodies at alpha = 0.1. Presets stay atomic; setting any role replaces the whole table.
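
As a rough illustration of the role table, here is a minimal sketch of how per-token attribution could be turned into per-token α values. The function name `echo_weights` and the attribution layout are assumptions, not the PR's code: `message_indices` is taken to be one message index per token (negative for tokens outside any message), `message_roles` one role per message, and `is_content` one flag per token marking message-body tokens.

```python
def echo_weights(message_indices, message_roles, is_content, role_alphas):
    """Hypothetical helper: map each token to its role's alpha, or 0.0 if unselected."""
    weights = []
    for msg_idx, content in zip(message_indices, is_content):
        # Assumed convention: template scaffolding and unattributed tokens get no weight.
        if msg_idx < 0 or not content:
            weights.append(0.0)
            continue
        role = message_roles[msg_idx]
        # Roles missing from the table are not echoed.
        weights.append(role_alphas.get(role, 0.0))
    return weights


# Mirrors the TOML above: tool at 0.25, user at 0.05.
role_alphas = {"tool": 0.25, "user": 0.05}
message_roles = ["user", "assistant", "tool"]
message_indices = [0, 0, 0, 1, 1, 2, 2, 2]
is_content = [False, True, True, True, True, False, True, True]
weights = echo_weights(message_indices, message_roles, is_content, role_alphas)
```

In this toy input the assistant message has no entry in the table, so its tokens end up with weight 0.0 alongside the non-content tokens.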

User-supplied token filter. An optional hook narrows the role selection per rollout — e.g. dropping warning lines from tool output, or tokens the sampler found unlikely. Config matches the custom advantage/loss precedent (import_path + kwargs); the signature is #2677's:

[orchestrator.algo.advantage.filter]
import_path = "my_module.drop_warnings"
kwargs = { patterns = ["WARNING"] }

def drop_warnings(rollout, *, patterns: list[str]) -> list[list[bool]]:
    # one keep-mask per trajectory step, spanning prompt_ids + completion_ids
    ...

The callable sees the raw rollout (message text, sampling logprobs), so content filters and sampling-probability filters need no further framework surface. Shape violations fail loudly. The filter can only narrow the selection, never widen it.
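
The "narrow, never widen" and "fail loudly" properties can be sketched as follows. This is an illustrative stand-in, not the PR's implementation: `apply_keep_masks` is a hypothetical name, and each step's weights are assumed to be a flat list spanning that step's prompt and completion tokens.

```python
def apply_keep_masks(step_weights, keep_masks):
    """Hypothetical helper: zero out weights the filter rejects, one mask per step."""
    if len(keep_masks) != len(step_weights):
        raise ValueError(
            f"filter returned {len(keep_masks)} masks for {len(step_weights)} steps"
        )
    narrowed = []
    for step, (weights, keep) in enumerate(zip(step_weights, keep_masks)):
        if len(keep) != len(weights):
            raise ValueError(
                f"step {step}: mask has {len(keep)} entries, expected {len(weights)}"
            )
        # keep=True leaves the role-selected weight as is; it cannot turn a
        # zero weight into a nonzero one, so the filter only ever narrows.
        narrowed.append([w if k else 0.0 for w, k in zip(weights, keep)])
    return narrowed


step_weights = [[0.0, 0.25, 0.25], [0.05, 0.0]]
keep_masks = [[True, True, False], [True, True]]
narrowed = apply_keep_masks(step_weights, keep_masks)
```

Multiplying by the mask rather than replacing the selection is what makes widening impossible by construction in this sketch.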

How it works

completion_obs_mask (bool, orchestrator-internal) becomes completion_obs_weights (float): each selected token carries its role's α from interleave_rollout, and stamp_loss_routing folds the weights into the ce_weights stream directly — the scalar observation_weight parameter is gone. Trainer untouched: per-component global normalization already keeps echo tokens out of the rl denominator, and a zero weight excludes a token from the ce component's numerator and denominator (true masking, not dilution).
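
To make the masking-versus-dilution distinction concrete, here is a toy sketch of a CE component normalized over only the tokens that carry a nonzero weight. The trainer's actual normalization is not shown in this PR; the function name `ce_component_loss` and the choice of a token-count denominator are assumptions made for illustration.

```python
def ce_component_loss(token_nll, ce_weights):
    """Toy CE component: weighted NLL sum over the count of nonzero-weight tokens."""
    numerator = sum(w * nll for w, nll in zip(ce_weights, token_nll))
    # Assumption: zero-weight tokens do not count toward the denominator.
    denominator = sum(1 for w in ce_weights if w > 0)
    return numerator / denominator if denominator else 0.0


token_nll = [2.0, 4.0, 6.0]
ce_weights = [0.5, 0.0, 0.5]
masked = ce_component_loss(token_nll, ce_weights)

# For contrast: dividing by all tokens would let the zero-weight token
# shrink the loss even though it contributes nothing to the numerator.
diluted = sum(w * n for w, n in zip(ce_weights, token_nll)) / len(token_nll)
```

Under these assumptions the masked value is 2.0, while the diluted variant would be 4/3.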

Breaking (pre-merge, no compat shims)

advantage.observation_weight / advantage.observations are replaced by advantage.roles.<role>.alpha.
Echo now always requires orchestrator.renderer (role selection needs attribution). The blanket "all" mode is gone — assemble the roles you want instead.
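
For anyone carrying an old config, the key rename can be sketched as a small dict transformation. This helper is hypothetical and not part of the PR; in particular, the fallbacks used when a key is absent ("tool" and 0.1) are assumed from the description of the echo preset rather than taken from the old schema.

```python
def migrate_echo_advantage(old):
    """Hypothetical helper: rewrite old echo advantage keys into the roles table."""
    new = {
        k: v for k, v in old.items()
        if k not in ("observations", "observation_weight")
    }
    if "observations" not in old and "observation_weight" not in old:
        return new  # nothing to migrate; the preset applies
    # Assumed defaults, inferred from the preset (tool bodies at 0.1).
    mode = old.get("observations", "tool")
    alpha = old.get("observation_weight", 0.1)
    if mode == "all":
        # The blanket mode has no direct replacement; roles must be listed by hand.
        raise ValueError('observations = "all" is gone; list the roles you want')
    if mode != "tool":
        raise ValueError(f"unknown observations mode: {mode!r}")
    new["roles"] = {"tool": {"alpha": alpha}}
    return new


migrated = migrate_echo_advantage(
    {"type": "echo", "observations": "tool", "observation_weight": 0.25}
)
```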

Not in this PR

tool_names filtering (restrict echo to specific tool functions) — deferred; non-breaking optional field on the tool role later, message_tool_names already rides in the attribution.
Live-θ probability filters ("mask tokens with low prob under the current policy") — structurally trainer-side (θ doesn't exist at stamping); separate small PR if needed.

🤖 Generated with Claude Code

Note

Medium Risk
Breaking ECHO config and loss-routing wire fields change which env tokens get CE supervision; misconfigured roles or filters could silently drop observation training, though validation enforces renderer and filter shapes.

Overview
ECHO observation supervision is generalized from a single observation_weight + observations = "tool" | "all" switch to a per-message-role table (roles.{system,user,assistant,tool}.alpha) and an optional user filter (import_path + kwargs).

The echo preset still means tool-response bodies at α = 0.1; defining any role replaces the entire role table. Orchestrator-internal tagging moves from boolean completion_obs_mask to float completion_obs_weights (per-token α), folded into ce_weights in stamp_loss_routing without a global observation scalar. interleave_rollout selects tokens via renderer attribution and can narrow them with validated per-step keep-masks. Echo always requires orchestrator.renderer (no "all" escape hatch). Docs, debug echo.toml, and config skill text are updated for the breaking config shape.

Reviewed by Cursor Bugbot for commit 509d6a3.

Echo's selection surface generalizes from the observations="tool"|"all" binary to a role table: each env-provided message role (system / user / assistant / tool) trains at its own alpha, selected via the renderer's per-token attribution. An optional filter hook (import_path + kwargs, matching the custom advantage/loss precedent) narrows the selection per rollout with one keep-mask per trajectory step.

- completion_obs_mask (bool) -> completion_obs_weights (float): the per-token weight carries its role's alpha, so stamping folds it into ce_weights directly and stamp_loss_routing drops the scalar observation_weight parameter. Orchestrator-internal as before.
- The echo preset is unchanged in meaning: tool-response bodies at 0.1. Setting any role replaces the whole table.
- Echo now always requires the renderer (role selection needs attribution); the blanket "all" mode is gone, so assemble the roles you want instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

hallerite · 2026-06-12T00:19:15Z

Folded into #2746 by merging this branch into its base (same flow as #2764) — GitHub auto-marked it merged. The echo selection surface ships with the main algorithm-abstraction PR.

hallerite marked this pull request as ready for review June 11, 2026 23:52

hallerite merged commit e55f067 into feat/algorithm-abstraction Jun 12, 2026
18 of 19 checks passed

hallerite mentioned this pull request Jun 12, 2026

feat: algorithm abstraction — named algorithm classes + inline frozen-model references (grpo, opd, sft_distill, self_distill, echo) #2746

Open

hallerite merged 1 commit into feat/algorithm-abstraction from feat/echo-selection
