fix(v1,serve): evict rollout trajectories, discard delivered intercepts, trim worker arenas by hubert-marek · Pull Request #1611 · PrimeIntellect-ai/verifiers

hubert-marek · 2026-06-10T22:06:32Z

Three env-worker memory fixes for long multi-turn rollouts, ports to main of fixes production-validated on feat/ephemeral-mm-pixels (#1608, #1610) against worldsims browser-env training runs.

1. Evict completed rollouts from `Runtime.trajectories` (leak)

prepare_state → resolve_trajectory → register_trajectory stores every rollout's live trajectory list in Runtime.trajectories so handle-borrowing sub-runtime states (e.g. in-sandbox program model calls resolving the owning rollout's trajectory) can look it up — but nothing ever removed the entry, and the Runtime lives for the env-worker process lifetime. Every completed rollout's full trajectory (per-step prompt copies + full-prefix token arrays) stayed referenced forever: ~50–400MB leaked per rollout on long browser envs, OOM-killing env pods within a handful of training steps. Now popped in cleanup_rollout alongside the existing per-trajectory dict evictions (_model_request_locks, _inflight_visible_model_requests); cleanup_rollout runs in the harness finally, so error paths evict too. Borrowers' lifetimes are bounded by the owning rollout, so nothing can resolve the entry after cleanup. (list doesn't support weakrefs, so the WeakValueDictionary pattern used by the runtime registry wasn't available here — manual eviction is required.)

2. Discard delivered intercepts (dominant term: −74% worker peak RSS)

InterceptionServer.intercepts retained every request's raw body — the full message history, including base64 images on multimodal envs — until rollout unregister, so a long rollout held every turn's request simultaneously (renderer-side image offload rewrites a normalized copy, so the intercept's base64 never left). forward_request now discards the intercept after delivery via Endpoint.discard_request: the HTTP handler keeps its own local reference so delivery is unaffected, discard is idempotent, and rollout-unregister still sweeps undelivered entries (matching the manual per-request pop the opencode envs already do).

3. `malloc_trim(0)` at the worker stats cadence (resting RSS ÷3)

Env workers never returned freed arena pages to the OS, so RSS ratcheted to ~3× the live set under rollout churn. Trim every 10s in the existing stats loop via ctypes (the call releases the GIL — it cannot stall the event loop). gc.collect is deliberately avoided here: full collections on large heaps block past the worker heartbeat timeout and get healthy workers killed.

Measurement

Churn simulation (4 concurrent 45-turn browser rollouts per worker, real subprocess RSS, glibc/pymalloc):

scenario	worker peak	worker resting
baseline	1053 MB	943–988 MB
+ intercept discard	275 MB	179–218 MB
+ worker trim	275 MB	119–155 MB

Production lineage (worldsims env pin): without fix 1 the env pod OOMed at training step ~5 (~115GB); with fix 1 alone it climbed to ~90GB and still failed; with all of the above the pod holds flat. py_compile + ruff check clean; discard behavior unit-checked (local reference intact post-discard; idempotent).

🤖 Generated with Claude Code

Note

Medium Risk
Changes rollout cleanup and endpoint request lifecycle timing; low logic risk but affects memory-sensitive training paths where regressions would show as missing trajectories or failed intercept lookups if cleanup ran too early (mitigated by discarding only after delivery).

Overview
Addresses env-worker memory growth during long, concurrent multi-turn rollouts with three targeted changes.

Runtime: cleanup_rollout now pops the rollout’s entry from Runtime.trajectories, alongside existing per-trajectory lock/inflight cleanup, so completed rollouts no longer keep full trajectory lists alive for the worker process lifetime.

Endpoint interception: Adds Endpoint.discard_request and calls it from forward_request after stream/non-stream delivery, so InterceptionServer.intercepts no longer retains raw request bodies (e.g. multimodal base64) for every turn until rollout unregister.

Env worker: The 10s stats loop optionally invokes malloc_trim(0) via libc.so.6 (skipped if unavailable) to return freed glibc arena pages and reduce resting RSS under rollout churn.

^{Reviewed by Cursor Bugbot for commit 2449c21. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Fix memory leaks by evicting rollout trajectories, discarding delivered intercepts, and trimming worker heap

Runtime.cleanup_rollout now removes the trajectory entry from self.trajectories when a rollout is cleaned up, preventing unbounded trajectory accumulation.
forward_request unconditionally removes the intercept from Endpoint.server.intercepts after forwarding, even if response delivery raises.
EnvWorker.stats_loop calls libc.malloc_trim(0) on glibc-based systems each stats interval to release free heap memory back to the OS.

📊 Macroscope summarized 2449c21. 3 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted

🗂️ Filtered Issues

No issues evaluated.

…ts, trim worker arenas Three env-worker memory fixes for long multi-turn rollouts: - Runtime.trajectories (the live-trajectory registry serving sub-runtime trajectory handles) was never evicted; the process-lifetime Runtime retained every completed rollout's full trajectory. Pop it in cleanup_rollout alongside the existing per-trajectory dict evictions. - InterceptionServer.intercepts retained every request's raw body (full message history, incl. base64 images on multimodal envs) until rollout unregister; forward_request now discards each intercept after delivery. The HTTP handler keeps its own reference; discard is idempotent; unregister still sweeps undelivered entries. - Env workers never returned freed arena pages to the OS; malloc_trim(0) at the existing 10s stats cadence (ctypes releases the GIL; gc.collect is deliberately avoided here - full collections on large heaps stall the loop past the worker heartbeat timeout). Measured on browser-env rollouts (45 turns, 4 concurrent per worker): worker peak RSS 1053MB -> 275MB, resting 990MB -> 155MB. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

macroscopeapp · 2026-06-10T22:10:01Z

+            import ctypes
+
+            libc = ctypes.CDLL("libc.so.6")
+        except OSError:
+            libc = None
        while True:
            await asyncio.sleep(interval)

+            if libc is not None:
+                libc.malloc_trim(0)


🟡 Medium server/env_worker.py:252

On musl-based systems like Alpine Linux, libc.so.6 can exist but malloc_trim is a glibc-specific function that musl does not export. Calling libc.malloc_trim(0) throws AttributeError when the symbol is missing, which terminates stats_loop and stops all stats collection for the rest of the worker's lifetime. Consider catching AttributeError alongside OSError when loading the symbol, or guard the call with a try/except.

try: import ctypes libc = ctypes.CDLL("libc.so.6") + libc.malloc_trim(0) # verify symbol exists + libc.malloc_trim.argtypes = [ctypes.c_int] except (OSError, AttributeError): libc = None while True: await asyncio.sleep(interval) if libc is not None: - libc.malloc_trim(0) + try: + libc.malloc_trim(0) + except Exception: + pass

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file @verifiers/serve/server/env_worker.py around lines 252-261: On musl-based systems like Alpine Linux, `libc.so.6` can exist but `malloc_trim` is a glibc-specific function that musl does not export. Calling `libc.malloc_trim(0)` throws `AttributeError` when the symbol is missing, which terminates `stats_loop` and stops all stats collection for the rest of the worker's lifetime. Consider catching `AttributeError` alongside `OSError` when loading the symbol, or guard the call with a try/except. Evidence trail: verifiers/serve/server/env_worker.py lines 249-278 (REVIEWED_COMMIT): stats_loop method loads libc.so.6 catching only OSError (line 255), calls libc.malloc_trim(0) at line 261 without any exception handling. Line 289: stats_loop launched as asyncio.create_task. Python ctypes CDLL.__getattr__ raises AttributeError when dlsym fails to find a symbol. musl libc does not export malloc_trim (glibc-specific).

macroscopeapp · 2026-06-10T22:10:33Z

Approvability

Verdict: Needs human review

1 blocking correctness issue found. An unresolved review comment identifies a potential bug where the malloc_trim call could crash the stats_loop on musl-based systems (Alpine Linux) due to missing AttributeError handling. This substantive issue warrants human review before merging.

^{You can customize Macroscope's approvability policy. Learn more.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2449c2112d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-10T22:12:25Z

        key = str(state["trajectory_id"])
        self._model_request_locks.pop(key, None)
        self._inflight_visible_model_requests.pop(key, None)
+        self.trajectories.pop(key, None)


Delay trajectory eviction until group cleanup

When a grouped eval reaches group scoring, each harness.run(...) has already executed cleanup_rollout, so this pop removes the live trajectory before Env._run_group_states calls harness.score_group(...). Any group-stage handler/reward that starts a child task with state.for_task(..., transcript="append") will now fail in resolve_trajectory with No live trajectory registered, even though the state still retains its runtime handles until group cleanup. Keep trajectories for states with a group_key until cleanup_group, analogous to how model clients are retained across the group stage.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-10T22:12:25Z

+            else:
+                deliver_response(request, response, error)
+        finally:
+            endpoint.discard_request(request_id)


Do not discard undelivered streaming intercepts

For intercepted streaming requests whose protocol is not openai_chat_completions (for example /v1/messages or /v1/responses with stream: true), the branch above raises before anything is put on the stream queue or response future. This new unconditional discard then removes the intercept from InterceptionServer.intercepts, so unregister_rollout() can no longer put the EOF sentinel or cancel the future, leaving the HTTP handler waiting indefinitely instead of being unblocked during rollout cleanup.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-10T22:12:25Z

+            if libc is not None:
+                libc.malloc_trim(0)


Run malloc_trim away from the worker event loop

When workers are serving active rollouts, this synchronous ctypes call runs in stats_loop on the same asyncio thread that handles ZMQ requests and responses. Releasing the GIL only helps other threads; it does not let this event loop make progress, so a slow trim on a large fragmented heap delays all request handling and heartbeats every 10 seconds and can make the router restart an otherwise healthy worker. Move the trim to a background thread or otherwise bound/guard it.

Useful? React with 👍 / 👎.

macroscopeapp Bot reviewed Jun 10, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 10, 2026

View reviewed changes

hubert-marek marked this pull request as draft June 10, 2026 22:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(v1,serve): evict rollout trajectories, discard delivered intercepts, trim worker arenas#1611

fix(v1,serve): evict rollout trajectories, discard delivered intercepts, trim worker arenas#1611
hubert-marek wants to merge 1 commit into
mainfrom
fix/intercept-discard-worker-trim

hubert-marek commented Jun 10, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

macroscopeapp Bot Jun 10, 2026

Uh oh!

macroscopeapp Bot commented Jun 10, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hubert-marek commented Jun 10, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Evict completed rollouts from Runtime.trajectories (leak)

2. Discard delivered intercepts (dominant term: −74% worker peak RSS)

3. malloc_trim(0) at the worker stats cadence (resting RSS ÷3)

Measurement

Fix memory leaks by evicting rollout trajectories, discarding delivered intercepts, and trimming worker heap

🗂️ Filtered Issues

Uh oh!

macroscopeapp Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot commented Jun 10, 2026

Approvability

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hubert-marek commented Jun 10, 2026 •

edited by macroscopeapp Bot

Loading

1. Evict completed rollouts from `Runtime.trajectories` (leak)

3. `malloc_trim(0)` at the worker stats cadence (resting RSS ÷3)