Add Codex CLI harness #1568
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e7b4378.
    __all__ = [
        "CodexCLI",
        "CodexCLIConfig",
        "CodexCLIProgramConfig",
Harness package version not bumped
Medium Severity
This PR adds the public CodexCLI harness and exports it from harnesses, but leaves __version__ at 0.1.2. That leaves package version metadata out of step with the new user-facing behavior, so a post-merge publish may not ship CodexCLI under a new release tag.
Triggered by project rule: BugBot Instructions
Approvability
Verdict: Needs human review
This PR adds a new CodexCLI harness feature with 172 lines of new implementation, including authentication handling and shell script generation. New features introducing user-facing capabilities warrant human review.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7b43784f9
    --ephemeral \\
    --skip-git-repo-check \\
    --dangerously-bypass-approvals-and-sandbox \\
    --json \\
Return Codex's final message as the completion
When this harness is used on tasks that score or display the assistant completion, --json makes Codex print newline-delimited event JSON to stdout rather than the answer text (the Codex CLI docs describe --json this way and --output-last-message as the final-message path: https://developers.openai.com/codex/cli/reference#codex-exec). The v1 sandbox runner records stdout directly into state["completion"] (verifiers/v1/utils/sandbox_utils.py), so these rollouts will expose a JSON event log as the model completion while the actual final message is only an artifact.
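One way to act on this suggestion is sketched below. It is only an illustration, not the harness's actual code: the helper name `build_codex_script` and all file paths are hypothetical, and passing the prompt as a positional argument via `$(cat ...)` is an assumption about how the prompt reaches `codex exec`. The idea is to redirect the `--json` event stream to a log file and print only the file written by `--output-last-message`, so stdout carries the answer text.

```python
import shlex


def build_codex_script(prompt_path: str, events_path: str, last_message_path: str) -> str:
    """Hypothetical helper: build a shell script whose stdout is the final message."""
    flags = [
        "--ephemeral",
        "--skip-git-repo-check",
        "--dangerously-bypass-approvals-and-sandbox",
        "--json",
        # Ask Codex to also write its final message to a file.
        "--output-last-message",
        shlex.quote(last_message_path),
    ]
    lines = [
        "set -euo pipefail",
        # The JSON event stream goes to a log file instead of stdout.
        f'codex exec {" ".join(flags)} "$(cat {shlex.quote(prompt_path)})" > {shlex.quote(events_path)}',
        # Only the final message reaches stdout, and so the recorded completion.
        f"cat {shlex.quote(last_message_path)}",
    ]
    return "\n".join(lines) + "\n"
```

With this shape the event log stays available as an artifact, while whatever the runner records from stdout is the assistant's final answer.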


Summary
Testing
adaptive-rejection-sampler: reward 1.0
Note
Medium Risk
New sandboxed agent path runs Codex with bypass-approvals flags and handles API keys or subscription auth JSON; mistakes could affect credential handling or remote model calls during rollouts.
Overview
Adds a packaged Codex CLI command harness (harnesses.codex_cli) so evals can run the OpenAI Codex agent in sandboxes alongside OpenCode, Pi, and similar harnesses.
CodexCLIProgramConfig.resolve() builds the sandbox program: installs Codex via the official installer, wires task/system prompts into files, runs codex exec with JSON logging and optional artifacts, and supports api_key (default: runtime model + intercepted API key / base URL) or chatgpt (CODEX_AUTH_JSON secret). Version strings like codex@latest or codex@0.137.0 control the installer release.
Package exports, BYO-harness and harnesses README docs (TOML examples, ChatGPT auth), and tests cover program construction, auth helpers, imports, and parity with other command harnesses in the deferred program-override parametrize.
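To make the version-string convention concrete, here is a minimal sketch of how a spec such as codex@latest or codex@0.137.0 could be split into a release name for the installer. The function `parse_codex_version` is hypothetical and not taken from the PR; treating a bare `codex` as `latest` is likewise an assumption.

```python
def parse_codex_version(spec: str) -> str:
    """Hypothetical helper: return the release part of a 'codex@<release>' string."""
    name, sep, release = spec.partition("@")
    if name != "codex":
        raise ValueError(f"expected a 'codex@<release>' string, got {spec!r}")
    if not sep:
        # Assumption: a bare 'codex' means the latest release.
        return "latest"
    if not release:
        raise ValueError(f"missing release after '@' in {spec!r}")
    return release
```

For example, `parse_codex_version("codex@0.137.0")` yields `"0.137.0"`, which a harness could then hand to the installer step.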
Note
Add CodexCLI harness for running Codex CLI in sandboxed eval programs
- Adds CodexCLI, CodexCLIConfig, and CodexCLIProgramConfig: a new harness that installs and executes Codex CLI inside a sandboxed environment.
- Supports two auth modes: api_key (logs in via OPENAI_API_KEY) and chatgpt (writes an auth.json from the CODEX_AUTH_JSON env var).
- Exports the new classes from the harnesses package and adds them to import tests.
Macroscope summarized e7b4378.
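The two auth modes summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `prepare_auth` and its `codex_home` parameter are hypothetical names, and the restrictive file mode is a choice made here rather than something the source describes.

```python
import json
from pathlib import Path
from typing import Mapping


def prepare_auth(mode: str, env: Mapping[str, str], codex_home: Path) -> dict:
    """Hypothetical helper: return extra env vars and, for chatgpt, write auth.json."""
    if mode == "api_key":
        key = env.get("OPENAI_API_KEY")
        if not key:
            raise ValueError("api_key mode requires OPENAI_API_KEY")
        return {"OPENAI_API_KEY": key}
    if mode == "chatgpt":
        raw = env.get("CODEX_AUTH_JSON")
        if not raw:
            raise ValueError("chatgpt mode requires CODEX_AUTH_JSON")
        json.loads(raw)  # Fail early if the secret is not valid JSON.
        codex_home.mkdir(parents=True, exist_ok=True)
        auth_path = codex_home / "auth.json"
        auth_path.write_text(raw)
        auth_path.chmod(0o600)  # Assumption: keep the credentials file owner-only.
        return {}
    raise ValueError(f"unknown auth mode: {mode!r}")
```

Validating the JSON before writing keeps a malformed secret from surfacing later as an opaque login failure inside the sandbox.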