Skip to content

fix(v1): replace unreachable sandboxes#1578

Open
xeophon wants to merge 1 commit into
mainfrom
codex/recycle-unreachable-sandboxes
Open

fix(v1): replace unreachable sandboxes#1578
xeophon wants to merge 1 commit into
mainfrom
codex/recycle-unreachable-sandboxes

Conversation

@xeophon

@xeophon xeophon commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

  • add a V1-only readiness loop that does not hold a sandbox worker while polling
  • preserve the underlying gateway error after repeated readiness failures
  • delete and replace sandboxes that report RUNNING but remain unreachable
  • leave the legacy sandbox wait path unchanged

Why

V1 reuses one ThreadedAsyncSandboxClient per runtime. At eval concurrency 32, that client has four workers, while the SDK wait_for_creation() call occupies one worker for its entire poll loop. A sandbox with broken gateway routing can therefore pin 25% of the pool for 37-110 minutes.

The observed failures were control-plane RUNNING instances whose gateway exec path returned 503 upstream connect error, 404 Process discovery failed, or timed out. Increasing readiness attempts only extended the failure.

Validation

  • ruff check and ruff format --check
  • pytest -q tests/test_v1_runtime_lifecycle.py
  • pytest -q tests/test_sandbox_mixin.py tests/test_cli_agent_env.py tests/test_sandbox_env.py
  • synthetic unreachable-sandbox replacement and error-preservation checks
  • live python:3.11-slim sandbox creation/readiness/exec/delete smoke test

No docs or tests changed.


Note

Medium Risk
Changes sandbox create/readiness and retry behavior on a shared threaded client; misclassification of readiness errors could delete healthy sandboxes or mask failures, but legacy wait paths are unchanged.

Overview
Adds wait_for_creation_resilient on ThreadedAsyncSandboxClient so V1 readiness polling uses short get / execute_command calls on the asyncio side instead of blocking a thread-pool worker for the whole SDK wait_for_creation loop. While status is RUNNING, it probes with echo 'sandbox ready', maps ERROR / TERMINATED / TIMEOUT to the appropriate prime_sandboxes errors, and after repeated exec failures raises SandboxNotRunningError with status="RUNNING" while keeping the underlying gateway error in the message.

create_sandbox in V1 now wraps create + wait in up to SANDBOX_RETRY_ATTEMPTS iterations: it prefers wait_for_creation_resilient when present (legacy clients still use wait_for_creation). On that “RUNNING but unreachable” error it deletes the bad sandbox, logs a warning, and creates a replacement; other readiness failures still delete and re-raise on the last attempt.

Reviewed by Cursor Bugbot for commit ad7f1fd. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Retry sandbox creation on unreachable sandboxes in create_sandbox

  • Adds a wait_for_creation_resilient method to ThreadedAsyncSandboxClient that polls sandbox status and runs in-sandbox readiness checks, tracking consecutive successes and tolerating transient failures up to READINESS_FAILURE_ATTEMPTS = 5.
  • Reworks create_sandbox to retry provisioning up to SANDBOX_RETRY_ATTEMPTS times, preferring wait_for_creation_resilient when available.
  • When a sandbox reaches RUNNING status but fails readiness checks (SandboxNotRunningError with status RUNNING), it is deleted and replaced unless it is the final attempt.
  • Terminal sandbox failures (OOM, timeout, image pull errors) are mapped to specific exceptions and re-raised immediately without retrying.
  • Risk: create_sandbox may now raise different, more specific exceptions than before depending on how the sandbox fails.
📊 Macroscope summarized ad7f1fd. 2 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted

🗂️ Filtered Issues

No issues evaluated.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ad7f1fd. Configure here.

readiness_attempt + 1,
SANDBOX_RETRY_ATTEMPTS,
)
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacement retries reuse create request

Medium Severity

The readiness retry loop builds one CreateSandboxRequest before the loop and calls create again on replacement without a fresh name. If the prior unreachable sandbox still exists because delete_sandbox_id swallowed a delete failure, a duplicate name can make later creates fail or leave multiple sandboxes running.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ad7f1fd. Configure here.

Comment on lines +147 to +150
message = "Timeout during sandbox creation"
if last_readiness_error is not None:
message += f": {last_readiness_error}"
raise SandboxNotRunningError(sandbox_id, status=message)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low utils/threaded_sandbox_client.py:147

On line 150, message is passed to status= instead of message=, so the exception renders as "Sandbox xyz is not running (status=Timeout during sandbox creation: ...)" instead of the intended direct message. Pass message=message instead.

-        raise SandboxNotRunningError(sandbox_id, status=message)
+        raise SandboxNotRunningError(sandbox_id, message=message)
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @verifiers/utils/threaded_sandbox_client.py around lines 147-150:

On line 150, `message` is passed to `status=` instead of `message=`, so the exception renders as `"Sandbox xyz is not running (status=Timeout during sandbox creation: ...)"` instead of the intended direct message. Pass `message=message` instead.

Evidence trail:
verifiers/utils/threaded_sandbox_client.py line 150 (REVIEWED_COMMIT): `raise SandboxNotRunningError(sandbox_id, status=message)`. packages/prime-sandboxes/src/prime_sandboxes/exceptions.py lines 15-34 in https://github.com/PrimeIntellect-ai/prime: SandboxNotRunningError.__init__ accepts both `status` and `message` kwargs; when `message` is set, `msg = message`; when only `status` is set, `msg = f"Sandbox {sandbox_id} is not running (status={status})"`.

@macroscopeapp

macroscopeapp Bot commented Jun 9, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

This PR introduces significant new retry and resilience logic for sandbox creation, changing runtime behavior in infrastructure code. An unresolved medium-severity comment raises concerns about potential duplicate sandbox issues in the retry path.

You can customize Macroscope's approvability policy. Learn more.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad7f1fd6ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +107 to +110
readiness_failures += 1
last_readiness_error = exc
if readiness_failures >= self.READINESS_FAILURE_ATTEMPTS:
raise SandboxNotRunningError(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep waiting through slow gateway readiness

When the control plane reports RUNNING before the exec gateway is accepting commands, this hard cap raises after only five failed readiness probes, regardless of the caller's max_attempts/wait_timeout. For slow-starting sandboxes or delayed gateway routing, v1 will delete and recreate otherwise healthy sandboxes and can exhaust all six create retries instead of waiting for the original sandbox to become reachable.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant