fix(v1): replace unreachable sandboxes #1578
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ad7f1fd.
    readiness_attempt + 1,
    SANDBOX_RETRY_ATTEMPTS,
)
continue
Replacement retries reuse create request
Medium Severity
The readiness retry loop builds one CreateSandboxRequest before the loop and calls create again on replacement without a fresh name. If the prior unreachable sandbox still exists because delete_sandbox_id swallowed a delete failure, a duplicate name can make later creates fail or leave multiple sandboxes running.
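One possible way to address this is to derive a new request inside the loop so that each attempt carries its own name. The sketch below is illustrative only: `CreateRequest`, its `docker_image` field, and `fresh_request` are hypothetical stand-ins, not the real `CreateSandboxRequest` API, whose fields may differ.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CreateRequest:
    """Hypothetical stand-in for CreateSandboxRequest (real fields may differ)."""

    name: str
    docker_image: str


def fresh_request(template: CreateRequest, attempt: int) -> CreateRequest:
    """Return a copy of the template whose name is unique to this attempt."""
    suffix = uuid.uuid4().hex[:8]
    return replace(template, name=f"{template.name}-a{attempt}-{suffix}")


# Build one request per retry attempt instead of reusing a single object.
template = CreateRequest(name="eval-sandbox", docker_image="python:3.11-slim")
requests = [fresh_request(template, attempt) for attempt in range(3)]
```

With a unique name per attempt, a leftover sandbox whose delete failed cannot collide with its replacement, although it would still need to be cleaned up separately.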
message = "Timeout during sandbox creation"
if last_readiness_error is not None:
    message += f": {last_readiness_error}"
raise SandboxNotRunningError(sandbox_id, status=message)
🟢 Low utils/threaded_sandbox_client.py:147
On line 150, `message` is passed to `status=` instead of `message=`, so the exception renders as `"Sandbox xyz is not running (status=Timeout during sandbox creation: ...)"` instead of the intended direct message. Pass `message=message` instead.
- raise SandboxNotRunningError(sandbox_id, status=message)
+ raise SandboxNotRunningError(sandbox_id, message=message)
Evidence trail:
verifiers/utils/threaded_sandbox_client.py line 150 (REVIEWED_COMMIT): `raise SandboxNotRunningError(sandbox_id, status=message)`. packages/prime-sandboxes/src/prime_sandboxes/exceptions.py lines 15-34 in https://github.com/PrimeIntellect-ai/prime: SandboxNotRunningError.__init__ accepts both `status` and `message` kwargs; when `message` is set, `msg = message`; when only `status` is set, `msg = f"Sandbox {sandbox_id} is not running (status={status})"`.
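The difference between the two keyword arguments can be reproduced with a small stand-in. `FakeSandboxNotRunningError` below is a hypothetical class that only mirrors the behavior described in the evidence trail; it is not the `prime_sandboxes` implementation, and the fallback branch for the case where neither argument is given is an assumption.

```python
class FakeSandboxNotRunningError(Exception):
    """Stand-in mirroring the status/message handling described above."""

    def __init__(self, sandbox_id, status=None, message=None):
        if message:
            msg = message
        elif status:
            msg = f"Sandbox {sandbox_id} is not running (status={status})"
        else:
            # Assumed fallback; the real class may word this differently.
            msg = f"Sandbox {sandbox_id} is not running"
        super().__init__(msg)
        self.sandbox_id = sandbox_id
        self.status = status


text = "Timeout during sandbox creation: 503 upstream connect error"
# Passing the text as status wraps it in the generic template.
as_status = str(FakeSandboxNotRunningError("xyz", status=text))
# Passing it as message renders it directly.
as_message = str(FakeSandboxNotRunningError("xyz", message=text))
```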
Approvability
Verdict: Needs human review
This PR introduces significant new retry and resilience logic for sandbox creation, changing runtime behavior in infrastructure code. An unresolved medium-severity comment raises concerns about potential duplicate sandbox issues in the retry path.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad7f1fd6ec
readiness_failures += 1
last_readiness_error = exc
if readiness_failures >= self.READINESS_FAILURE_ATTEMPTS:
    raise SandboxNotRunningError(
Keep waiting through slow gateway readiness
When the control plane reports RUNNING before the exec gateway is accepting commands, this hard cap raises after only five failed readiness probes, regardless of the caller's max_attempts/wait_timeout. For slow-starting sandboxes or delayed gateway routing, v1 will delete and recreate otherwise healthy sandboxes and can exhaust all six create retries instead of waiting for the original sandbox to become reachable.
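One option this comment points toward is bounding the readiness probes by the caller's time budget rather than by a fixed failure count. The following is a minimal sketch under that assumption; `wait_until_reachable` and `flaky_probe` are hypothetical names, and the PR may reasonably prefer its hard cap, given that the description reports longer waits only prolonged the observed failures.

```python
import time


def wait_until_reachable(probe, wait_timeout, interval=0.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it succeeds or wait_timeout seconds elapse.

    Returns the number of failed probes that preceded the first success.
    Raises TimeoutError, carrying the last probe error, once the budget is spent.
    """
    deadline = clock() + wait_timeout
    failures = 0
    last_error = None
    while clock() < deadline:
        try:
            probe()
            return failures
        except Exception as exc:  # any probe error counts as "not ready yet"
            failures += 1
            last_error = exc
            sleep(interval)
    raise TimeoutError(f"unreachable after {failures} probes: {last_error}")


calls = {"n": 0}


def flaky_probe():
    """Simulated gateway that rejects the first seven probes."""
    calls["n"] += 1
    if calls["n"] <= 7:
        raise ConnectionError("503 upstream connect error")


# Seven failures exceed a cap of five, yet fit comfortably in the time budget.
failures_before_ready = wait_until_reachable(flaky_probe, wait_timeout=5.0)
```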


Summary
- [...] `RUNNING` but remain unreachable
Why
V1 reuses one `ThreadedAsyncSandboxClient` per runtime. At eval concurrency 32, that client has four workers, while the SDK `wait_for_creation()` call occupies one worker for its entire poll loop. A sandbox with broken gateway routing can therefore pin 25% of the pool for 37-110 minutes.
The observed failures were control-plane `RUNNING` instances whose gateway exec path returned `503 upstream connect error`, `404 Process discovery failed`, or timed out. Increasing readiness attempts only extended the failure.
Validation
- `ruff check` and `ruff format --check`
- `pytest -q tests/test_v1_runtime_lifecycle.py`
- `pytest -q tests/test_sandbox_mixin.py tests/test_cli_agent_env.py tests/test_sandbox_env.py`
- `python:3.11-slim` sandbox creation/readiness/exec/delete smoke test
No docs or tests changed.
Note
Medium Risk
Changes sandbox create/readiness and retry behavior on a shared threaded client; misclassification of readiness errors could delete healthy sandboxes or mask failures, but legacy wait paths are unchanged.
Overview
Adds `wait_for_creation_resilient` on `ThreadedAsyncSandboxClient` so V1 readiness polling uses short `get`/`execute_command` calls on the asyncio side instead of blocking a thread-pool worker for the whole SDK `wait_for_creation` loop. While status is `RUNNING`, it probes with `echo 'sandbox ready'`, maps `ERROR`/`TERMINATED`/`TIMEOUT` to the appropriate `prime_sandboxes` errors, and after repeated exec failures raises `SandboxNotRunningError` with `status="RUNNING"` while keeping the underlying gateway error in the message.
`create_sandbox` in V1 now wraps create + wait in up to `SANDBOX_RETRY_ATTEMPTS` iterations: it prefers `wait_for_creation_resilient` when present (legacy clients still use `wait_for_creation`). On that “RUNNING but unreachable” error it deletes the bad sandbox, logs a warning, and creates a replacement; other readiness failures still delete and re-raise on the last attempt.
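The polling pattern described in this overview can be sketched roughly as follows. This is not the PR's code: `FakeSyncClient` and `wait_resilient` are hypothetical, and the real client, status values, and error types may differ. The point is only that each blocking call is short and that the wait between polls happens on the event loop, so no pool worker is held for the whole loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class FakeSyncClient:
    """Hypothetical blocking client: reports PENDING twice, then RUNNING."""

    def __init__(self):
        self._statuses = ["PENDING", "PENDING", "RUNNING"]
        self.get_calls = 0

    def get(self, sandbox_id):
        self.get_calls += 1
        index = min(self.get_calls - 1, len(self._statuses) - 1)
        return self._statuses[index]

    def execute_command(self, sandbox_id, command):
        return "sandbox ready"


async def wait_resilient(client, pool, sandbox_id, max_attempts=10, delay=0.01):
    """Poll with short executor calls; sleeping happens on the event loop."""
    loop = asyncio.get_running_loop()
    for _ in range(max_attempts):
        status = await loop.run_in_executor(pool, client.get, sandbox_id)
        if status == "RUNNING":
            # The readiness probe is also a single short call.
            out = await loop.run_in_executor(
                pool, client.execute_command, sandbox_id, "echo 'sandbox ready'"
            )
            if "sandbox ready" in out:
                return status
        elif status in {"ERROR", "TERMINATED", "TIMEOUT"}:
            raise RuntimeError(f"sandbox {sandbox_id} entered {status}")
        await asyncio.sleep(delay)  # no worker thread is held while waiting
    raise TimeoutError(f"sandbox {sandbox_id} not ready after {max_attempts} polls")


client = FakeSyncClient()
with ThreadPoolExecutor(max_workers=4) as pool:
    final_status = asyncio.run(wait_resilient(client, pool, "sbx-1"))
```

Because a worker is occupied only for the duration of each `get` or `execute_command` call, a sandbox that never becomes reachable costs the four-worker pool brief slices of time rather than one worker for the full wait.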
Note
Retry sandbox creation on unreachable sandboxes in `create_sandbox`
- Adds a `wait_for_creation_resilient` method to `ThreadedAsyncSandboxClient` that polls sandbox status and runs in-sandbox readiness checks, tracking consecutive successes and tolerating transient failures up to `READINESS_FAILURE_ATTEMPTS = 5`.
- Updates `create_sandbox` to retry provisioning up to `SANDBOX_RETRY_ATTEMPTS` times, preferring `wait_for_creation_resilient` when available.
- When a sandbox reaches `RUNNING` status but fails readiness checks (`SandboxNotRunningError` with status `RUNNING`), it is deleted and replaced unless it is the final attempt.
- `create_sandbox` may now raise different, more specific exceptions than before depending on how the sandbox fails.
📊 Macroscope summarized ad7f1fd. 2 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted