fix(v1): replace unreachable sandboxes by xeophon · Pull Request #1578 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-09T10:08:24Z

Summary

add a V1-only readiness loop that does not hold a sandbox worker while polling
preserve the underlying gateway error after repeated readiness failures
delete and replace sandboxes that report RUNNING but remain unreachable
leave the legacy sandbox wait path unchanged

Why

V1 reuses one ThreadedAsyncSandboxClient per runtime. At eval concurrency 32, that client has four workers, while the SDK wait_for_creation() call occupies one worker for its entire poll loop. A sandbox with broken gateway routing can therefore pin 25% of the pool for 37-110 minutes.

The observed failures were control-plane RUNNING instances whose gateway exec path returned 503 upstream connect error, 404 Process discovery failed, or timed out. Increasing readiness attempts only extended the failure.

Validation

ruff check and ruff format --check
pytest -q tests/test_v1_runtime_lifecycle.py
pytest -q tests/test_sandbox_mixin.py tests/test_cli_agent_env.py tests/test_sandbox_env.py
synthetic unreachable-sandbox replacement and error-preservation checks
live python:3.11-slim sandbox creation/readiness/exec/delete smoke test

No docs or tests changed.

Note

Medium Risk
Changes sandbox create/readiness and retry behavior on a shared threaded client; misclassification of readiness errors could delete healthy sandboxes or mask failures, but legacy wait paths are unchanged.

Overview
Adds wait_for_creation_resilient on ThreadedAsyncSandboxClient so V1 readiness polling uses short get / execute_command calls on the asyncio side instead of blocking a thread-pool worker for the whole SDK wait_for_creation loop. While status is RUNNING, it probes with echo 'sandbox ready', maps ERROR / TERMINATED / TIMEOUT to the appropriate prime_sandboxes errors, and after repeated exec failures raises SandboxNotRunningError with status="RUNNING" while keeping the underlying gateway error in the message.

create_sandbox in V1 now wraps create + wait in up to SANDBOX_RETRY_ATTEMPTS iterations: it prefers wait_for_creation_resilient when present (legacy clients still use wait_for_creation). On that “RUNNING but unreachable” error it deletes the bad sandbox, logs a warning, and creates a replacement; other readiness failures still delete and re-raise on the last attempt.

^{Reviewed by Cursor Bugbot for commit ad7f1fd. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Retry sandbox creation on unreachable sandboxes in `create_sandbox`

Adds a wait_for_creation_resilient method to ThreadedAsyncSandboxClient that polls sandbox status and runs in-sandbox readiness checks, tracking consecutive successes and tolerating transient failures up to READINESS_FAILURE_ATTEMPTS = 5.
Reworks create_sandbox to retry provisioning up to SANDBOX_RETRY_ATTEMPTS times, preferring wait_for_creation_resilient when available.
When a sandbox reaches RUNNING status but fails readiness checks (SandboxNotRunningError with status RUNNING), it is deleted and replaced unless it is the final attempt.
Terminal sandbox failures (OOM, timeout, image pull errors) are mapped to specific exceptions and re-raised immediately without retrying.
Risk: create_sandbox may now raise different, more specific exceptions than before depending on how the sandbox fails.

📊 Macroscope summarized ad7f1fd. 2 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted

🗂️ Filtered Issues

No issues evaluated.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit ad7f1fd. Configure here.}

cursor · 2026-06-09T10:10:24Z

+                readiness_attempt + 1,
+                SANDBOX_RETRY_ATTEMPTS,
+            )
+            continue


Replacement retries reuse create request

Medium Severity

The readiness retry loop builds one CreateSandboxRequest before the loop and calls create again on replacement without a fresh name. If the prior unreachable sandbox still exists because delete_sandbox_id swallowed a delete failure, a duplicate name can make later creates fail or leave multiple sandboxes running.

^{Reviewed by Cursor Bugbot for commit ad7f1fd. Configure here.}

macroscopeapp · 2026-06-09T10:12:13Z

+        message = "Timeout during sandbox creation"
+        if last_readiness_error is not None:
+            message += f": {last_readiness_error}"
+        raise SandboxNotRunningError(sandbox_id, status=message)


🟢 Low utils/threaded_sandbox_client.py:147

On line 150, message is passed to status= instead of message=, so the exception renders as "Sandbox xyz is not running (status=Timeout during sandbox creation: ...)" instead of the intended direct message. Pass message=message instead.

- raise SandboxNotRunningError(sandbox_id, status=message) + raise SandboxNotRunningError(sandbox_id, message=message)

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file @verifiers/utils/threaded_sandbox_client.py around lines 147-150: On line 150, `message` is passed to `status=` instead of `message=`, so the exception renders as `"Sandbox xyz is not running (status=Timeout during sandbox creation: ...)"` instead of the intended direct message. Pass `message=message` instead. Evidence trail: verifiers/utils/threaded_sandbox_client.py line 150 (REVIEWED_COMMIT): `raise SandboxNotRunningError(sandbox_id, status=message)`. packages/prime-sandboxes/src/prime_sandboxes/exceptions.py lines 15-34 in https://github.com/PrimeIntellect-ai/prime: SandboxNotRunningError.__init__ accepts both `status` and `message` kwargs; when `message` is set, `msg = message`; when only `status` is set, `msg = f"Sandbox {sandbox_id} is not running (status={status})"`.

macroscopeapp · 2026-06-09T10:12:39Z

Approvability

Verdict: Needs human review

This PR introduces significant new retry and resilience logic for sandbox creation, changing runtime behavior in infrastructure code. An unresolved medium-severity comment raises concerns about potential duplicate sandbox issues in the retry path.

^{You can customize Macroscope's approvability policy. Learn more.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad7f1fd6ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T10:12:56Z

+                    readiness_failures += 1
+                    last_readiness_error = exc
+                    if readiness_failures >= self.READINESS_FAILURE_ATTEMPTS:
+                        raise SandboxNotRunningError(


Keep waiting through slow gateway readiness

When the control plane reports RUNNING before the exec gateway is accepting commands, this hard cap raises after only five failed readiness probes, regardless of the caller's max_attempts/wait_timeout. For slow-starting sandboxes or delayed gateway routing, v1 will delete and recreate otherwise healthy sandboxes and can exhaust all six create retries instead of waiting for the original sandbox to become reachable.

Useful? React with 👍 / 👎.

fix(v1): replace unreachable sandboxes

ad7f1fd

cursor Bot reviewed Jun 9, 2026

View reviewed changes

macroscopeapp Bot reviewed Jun 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(v1): replace unreachable sandboxes#1578

fix(v1): replace unreachable sandboxes#1578
xeophon wants to merge 1 commit into
mainfrom
codex/recycle-unreachable-sandboxes

xeophon commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 9, 2026

Uh oh!

macroscopeapp Bot Jun 9, 2026

Uh oh!

macroscopeapp Bot commented Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xeophon commented Jun 9, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Validation

Retry sandbox creation on unreachable sandboxes in create_sandbox

🗂️ Filtered Issues

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 9, 2026

Choose a reason for hiding this comment

Replacement retries reuse create request

Uh oh!

macroscopeapp Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot commented Jun 9, 2026

Approvability

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xeophon commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Retry sandbox creation on unreachable sandboxes in `create_sandbox`