feat(v1): support native provider clients by xeophon · Pull Request #1651 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-12T13:58:17Z

Overview

This PR expands the V1 client layer beyond OpenAI Chat Completions with native support for OpenAI Responses, Anthropic Messages, and Google generateContent. The clients share V1's portable message and response models while retaining provider-specific state needed for reasoning, tool use, streaming, sampling controls, and multimodal conversations.

Provider Clients

V1 client configuration can now select the appropriate protocol independently of the model endpoint. Each implementation translates between V1 messages and the provider SDK's native request and response types, keeping the public client interface consistent across providers.

The OpenAI Chat Completions client now handles multimodal messages, streaming responses, token IDs, logprobs, tool calls, and provider-specific sampling parameters through the OpenAI-compatible request path.

The OpenAI Responses client supports the Responses API item model, including function calls, reasoning items, image inputs, provider state replay, and streamed terminal responses.

The Anthropic client supports Messages API content blocks, top-level system instructions, image inputs, tool use and tool results, thinking blocks, sampling parameters, and streaming through the Anthropic SDK.

The Google client supports generateContent messages, system instructions, local image data, function calls and responses, thought parts, sampling parameter translation, and streamed response aggregation through the Google Gen AI SDK.

Conversation State

Provider-native response items are retained on assistant messages and replayed on subsequent turns. This preserves information that cannot be represented completely by plain text, such as reasoning signatures, thinking blocks, function-call metadata, and provider-specific content parts.

Tool calls and tool results remain represented through the common V1 message types. Each client performs only the protocol-specific grouping and naming required by its provider.

Sampling And Streaming

Common sampling fields are translated where provider APIs use different names, while provider-specific options continue to pass through the client configuration. This includes reasoning controls, token limits, stop sequences, candidate counts, logprobs, and compatible extension fields.

Streaming uses each provider SDK's native interface and returns the same final V1 Response shape as non-streaming calls. Provider usage, finish reasons, reasoning content, tool calls, and native state are carried into that final response.

Multimodal Inputs

The shared V1 message model supports text and image content parts. OpenAI and Anthropic accept their native image representations, while Google converts local base64 data URIs into native byte parts with the requested media resolution.

Note

Medium Risk
New dependency and large client surface area affect all model rollouts; provider_state changes assistant message hashes and interception wire format for reasoning continuations.

Overview
Adds native provider clients for OpenAI Responses, Anthropic Messages, and Google Gemini (generateContent), selectable via --client.type alongside the existing chat-completions and renderer clients. google-genai>=2.8.0 is added as a runtime dependency.

Each client maps V1 messages to provider SDK types, supports streaming (aggregated to the same Response shape), tools, sampling translation, and multimodal images (including image_url.detail on shared types). The OpenAI chat client is extended with typed wire helpers, streaming, richer reasoning/reasoning_details, and stricter image rules.

AssistantMessage.provider_state stores provider-native JSON needed for multi-turn continuations (thinking blocks, Responses output items, Gemini thought signatures). It is replayed on follow-up requests, round-tripped through the interception server as reasoning_details, included in graph message hashing, and Trace.has_response now treats reasoning, provider state, and tool calls as valid output—not only text content.

^{Reviewed by Cursor Bugbot for commit f80ed5a. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add native Anthropic, Google, and OpenAI Responses provider clients

Adds three new async client implementations: AnthropicMessagesClient (anthropic.py), GoogleResponsesClient (google.py), and OpenAIResponsesClient (openai_responses.py), each with full streaming, tool-calling, and multi-turn continuation support.
Each client converts internal message types to provider-native wire formats and parses responses back into a common Response type with reasoning_content, provider_state, tool_calls, and usage.
Adds corresponding config types (AnthropicMessagesClientConfig, GoogleResponsesClientConfig, OpenAIResponsesClientConfig) to the discriminated ClientConfig union so clients are selectable by name.
Extends AssistantMessage with a provider_state field to carry provider-native continuation data across turns, and updates Trace.has_response to treat reasoning, tool calls, or provider state as a valid response.
The interception server now round-trips reasoning_content and provider_state through OpenAI chat completion payloads.
Adds google-genai>=2.8.0 as a new required dependency.

^{Macroscope summarized f80ed5a.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8e9355e5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T14:03:14Z

    "loguru>=0.7.0",
    "tomli-w>=1.0.0",
    "renderers>=0.1.8.dev40",
+    "google-genai>=2.8.0",


Update the lockfile for google-genai

Adding google-genai here makes the package depend on a module that is imported at client-config import time (verifiers/v1/clients/config.py imports from google import genai), but uv.lock was not updated — rg '^name = "google-genai"' uv.lock returns no entry. In any locked/reproducible install path such as CI or contributor uv sync --locked, this commit cannot install the new dependency and import verifiers.v1.clients will fail once the lockfile environment is used.

Useful? React with 👍 / 👎.

macroscopeapp · 2026-06-12T14:04:52Z

Approvability

Verdict: Needs human review

1 blocking correctness issue found. This PR introduces substantial new feature capabilities (native Anthropic, Google, and OpenAI Responses API clients) with significant new integration logic. Multiple unresolved review comments identify potential bugs including missing lockfile updates, incorrect error handling, and missing defaults that could cause runtime failures.

^{You can customize Macroscope's approvability policy. Learn more.}

cursor · 2026-06-12T14:10:03Z

+        streaming = bool(sampling.pop("stream", False))
+        sampling.pop("n", None)  # Anthropic has no n parameter
+        if "max_tokens" not in sampling:
+            raise ValueError("Anthropic Messages requires max_tokens")


Anthropic eval lacks max_tokens

Medium Severity

AnthropicMessagesClient rejects requests when max_tokens is missing, while v1 EvalConfig defaults sampling to an empty SamplingConfig. Choosing --client.type anthropic_messages without --sampling.max-tokens fails before any API call, unlike the v0 Anthropic client that defaulted max_tokens.

^{Reviewed by Cursor Bugbot for commit e11a561. Configure here.}

cursor · 2026-06-12T14:10:03Z

+            # eval model ids may carry a provider prefix (e.g. "google/gemini-...")
+            "model": model.rsplit("/", 1)[-1],
+            "contents": cast(types.ContentListUnion, messages_to_wire(prompt)),
+            "config": types.GenerateContentConfig.model_validate(sampling),


Google rejects extra sampling keys

Medium Severity

GoogleResponsesClient builds GenerateContentConfig via model_validate on the full post-processed sampling dict. SamplingConfig allows arbitrary extra keys, so provider-specific or OpenAI-oriented fields (e.g. extra_body, stream_options, reasoning_effort) in eval config can trigger validation errors instead of a model request.

^{Reviewed by Cursor Bugbot for commit e11a561. Configure here.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e11a5617b4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T14:10:38Z

+        except AnthropicError as e:
+            raise ModelError(str(e)) from e


Preserve overlong-prompt classification for Anthropic

When Anthropic rejects an over-context request (for example, a long Claude rollout hitting BadRequestError: prompt is too long), this catch-all wraps it as ModelError; RetryingClient will retry it and the interception server only treats OverlongPromptError as a clean context_length truncation, so the rollout becomes a provider failure/502 instead. The existing v0 Anthropic client in this repo already classifies Anthropic context-limit phrases as OverlongPromptError, so the new native client should keep that behavior here.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f80ed5abaf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T15:52:30Z

+            else:
+                response = await self.google.aio.models.generate_content(**request)
+        except errors.APIError as e:
+            raise ModelError(str(e)) from e


Map Gemini context-limit errors to OverlongPromptError

When Gemini rejects an oversized prompt (for example a 400 saying the input token count exceeds the model limit), this wraps it as ModelError. RetryingClient retries ModelError, and the interception server only turns OverlongPromptError into a clean prompt_too_long truncation, so google_responses rollouts with over-context prompts are retried and then surface as provider failures instead of the expected truncation path. Mirror the OpenAI context-length mapping here.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-12T15:52:30Z

+
+    usage = response.usage_metadata
+    prompt_tokens = usage.prompt_token_count if usage else None
+    completion_tokens = usage.candidates_token_count if usage else None


Include Gemini thought tokens in completion usage

When Gemini thinking is enabled, usageMetadata can include thoughts_token_count in addition to candidates_token_count, but this line reports only the visible candidate tokens as completion usage. Other clients' completion usage includes reasoning tokens, and v1 documents completion tokens as every token the model produced, so Gemini reasoning rollouts underreport output usage/cost whenever hidden thoughts are present. Add usage.thoughts_token_count to the completion side when it is present.

Useful? React with 👍 / 👎.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit f80ed5a. Configure here.}

cursor · 2026-06-12T15:52:45Z

+            call_names.update({call.id: call.name for call in message.tool_calls or []})
+            prompt.append(assistant_to_wire(message))
+        else:
+            part = tool_result_to_wire(message, call_names[message.tool_call_id])


Google tool name lookup KeyError

Medium Severity

When wiring tool results, the Google client resolves function names from call_names, but that map is filled only from AssistantMessage.tool_calls. If the preceding assistant turn replayed native provider_state parts without populated portable tool_calls, the next ToolMessage triggers an uncaught KeyError instead of building a function response.

^{Reviewed by Cursor Bugbot for commit f80ed5a. Configure here.}

xeophon added 3 commits June 12, 2026 15:37

feat(v1): add native provider clients

ffb6152

test: cover threaded sandbox polling

e1727a8

refactor(v1): simplify provider clients

d8e9355

xeophon changed the title ~~[codex] add native V1 provider clients~~ feat(v1): support native provider clients Jun 12, 2026

xeophon marked this pull request as ready for review June 12, 2026 13:59

cursor Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread tests/test_threaded_sandbox_client.py Outdated

Comment thread verifiers/v1/clients/google.py

test: remove threaded sandbox polling coverage

8d03c5d

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

cursor Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread verifiers/v1/clients/openai.py

fix(v1): normalize provider token usage

e11a561

macroscopeapp Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread verifiers/v1/clients/google.py Outdated

Comment thread verifiers/v1/clients/google.py

cursor Bot reviewed Jun 12, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

fix(v1): preserve reasoning-only responses

f80ed5a

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

cursor Bot reviewed Jun 12, 2026

View reviewed changes

xeophon mentioned this pull request Jun 12, 2026

feat(v1): dialect-routed interception — byte relay on matching protocols, typed translate otherwise #1657

Open

Conversation

xeophon commented Jun 12, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Provider Clients

Conversation State

Sampling And Streaming

Multimodal Inputs

Add native Anthropic, Google, and OpenAI Responses provider clients

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

macroscopeapp Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

Uh oh!

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Anthropic eval lacks max_tokens

Uh oh!

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Google rejects extra sampling keys

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Google tool name lookup KeyError

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xeophon commented Jun 12, 2026 •

edited by macroscopeapp Bot

Loading

macroscopeapp Bot commented Jun 12, 2026 •

edited

Loading