feat(v1): support native provider clients #1651
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8e9355e5c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
"loguru>=0.7.0",
"tomli-w>=1.0.0",
"renderers>=0.1.8.dev40",
"google-genai>=2.8.0",
Update the lockfile for google-genai
Adding google-genai here makes the package depend on a module that is imported at client-config import time (verifiers/v1/clients/config.py imports from google import genai), but uv.lock was not updated — rg '^name = "google-genai"' uv.lock returns no entry. In any locked/reproducible install path such as CI or contributor uv sync --locked, this commit cannot install the new dependency and import verifiers.v1.clients will fail once the lockfile environment is used.
Approvability
Verdict: Needs human review. 1 blocking correctness issue found. This PR introduces substantial new feature capabilities (native Anthropic, Google, and OpenAI Responses API clients) with significant new integration logic. Multiple unresolved review comments identify potential bugs including missing lockfile updates, incorrect error handling, and missing defaults that could cause runtime failures.
streaming = bool(sampling.pop("stream", False))
sampling.pop("n", None)  # Anthropic has no n parameter
if "max_tokens" not in sampling:
    raise ValueError("Anthropic Messages requires max_tokens")
Anthropic eval lacks max_tokens
Medium Severity
AnthropicMessagesClient rejects requests when max_tokens is missing, while v1 EvalConfig defaults sampling to an empty SamplingConfig. Choosing --client.type anthropic_messages without --sampling.max-tokens fails before any API call, unlike the v0 Anthropic client that defaulted max_tokens.
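One possible shape for a fix is to fall back to a default instead of raising. This is only a sketch: `prepare_anthropic_sampling` and `DEFAULT_MAX_TOKENS` are hypothetical names, and the value 4096 is a placeholder, not the default used by the v0 client.

```python
from typing import Optional

# Hypothetical fallback; the real v0 default is not stated in this review.
DEFAULT_MAX_TOKENS = 4096


def prepare_anthropic_sampling(
    sampling: dict, default_max_tokens: Optional[int] = DEFAULT_MAX_TOKENS
) -> tuple[dict, bool]:
    """Return (request params, streaming flag) without mutating the input."""
    params = dict(sampling)
    streaming = bool(params.pop("stream", False))
    params.pop("n", None)  # Anthropic has no n parameter
    # Unlike the reviewed code, an explicit None is also treated as missing.
    if params.get("max_tokens") is None:
        if default_max_tokens is None:
            raise ValueError("Anthropic Messages requires max_tokens")
        params["max_tokens"] = default_max_tokens
    return params, streaming
```

Passing `default_max_tokens=None` keeps the strict behavior for callers that want a hard failure on a missing limit.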
Reviewed by Cursor Bugbot for commit e11a561. Configure here.
# eval model ids may carry a provider prefix (e.g. "google/gemini-...")
"model": model.rsplit("/", 1)[-1],
"contents": cast(types.ContentListUnion, messages_to_wire(prompt)),
"config": types.GenerateContentConfig.model_validate(sampling),
Google rejects extra sampling keys
Medium Severity
GoogleResponsesClient builds GenerateContentConfig via model_validate on the full post-processed sampling dict. SamplingConfig allows arbitrary extra keys, so provider-specific or OpenAI-oriented fields (e.g. extra_body, stream_options, reasoning_effort) in eval config can trigger validation errors instead of a model request.
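A minimal sketch of one mitigation: split the sampling dict into keys the provider config accepts and keys it does not, then validate only the former. `split_sampling` is a hypothetical helper, and `ALLOWED_KEYS` is a hand-picked illustrative subset; with the real SDK the allowed set could presumably be derived from the config model's declared fields instead.

```python
# Illustrative subset of generation-config field names (assumption, not exhaustive).
ALLOWED_KEYS = {
    "temperature",
    "top_p",
    "top_k",
    "max_output_tokens",
    "stop_sequences",
    "candidate_count",
    "seed",
}


def split_sampling(sampling: dict, allowed: set[str]) -> tuple[dict, dict]:
    """Partition sampling into (accepted, dropped) by key membership in allowed."""
    kept = {k: v for k, v in sampling.items() if k in allowed}
    dropped = {k: v for k, v in sampling.items() if k not in allowed}
    return kept, dropped


kept, dropped = split_sampling(
    {"temperature": 0.2, "extra_body": {}, "reasoning_effort": "high"},
    ALLOWED_KEYS,
)
print(kept, sorted(dropped))
```

Whether dropped keys should be logged, ignored, or rejected with a clearer error is a design choice for the PR author.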
Reviewed by Cursor Bugbot for commit e11a561. Configure here.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e11a5617b4
except AnthropicError as e:
    raise ModelError(str(e)) from e
Preserve overlong-prompt classification for Anthropic
When Anthropic rejects an over-context request (for example, a long Claude rollout hitting BadRequestError: prompt is too long), this catch-all wraps it as ModelError; RetryingClient will retry it and the interception server only treats OverlongPromptError as a clean context_length truncation, so the rollout becomes a provider failure/502 instead. The existing v0 Anthropic client in this repo already classifies Anthropic context-limit phrases as OverlongPromptError, so the new native client should keep that behavior here.
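The classification could look roughly like the sketch below. The exception classes are local stand-ins for the repo's `ModelError` and `OverlongPromptError` (their real hierarchy is an assumption here), `wrap_provider_error` is a hypothetical helper, and the phrase list contains only the one message quoted in this comment.

```python
class ModelError(Exception):
    """Stand-in for the repo's ModelError."""


class OverlongPromptError(ModelError):
    """Stand-in; the subclass relationship is an assumption."""


# Only the phrase quoted in this review; the v0 client's full list is not shown here.
CONTEXT_LIMIT_PHRASES = ("prompt is too long",)


def wrap_provider_error(exc: Exception) -> ModelError:
    """Map a provider exception to OverlongPromptError or a generic ModelError."""
    message = str(exc)
    lowered = message.lower()
    if any(phrase in lowered for phrase in CONTEXT_LIMIT_PHRASES):
        return OverlongPromptError(message)
    return ModelError(message)
```

The catch site would then become `raise wrap_provider_error(e) from e` instead of always raising `ModelError`.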
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f80ed5abaf
    else:
        response = await self.google.aio.models.generate_content(**request)
except errors.APIError as e:
    raise ModelError(str(e)) from e
Map Gemini context-limit errors to OverlongPromptError
When Gemini rejects an oversized prompt (for example a 400 saying the input token count exceeds the model limit), this wraps it as ModelError. RetryingClient retries ModelError, and the interception server only turns OverlongPromptError into a clean prompt_too_long truncation, so google_responses rollouts with over-context prompts are retried and then surface as provider failures instead of the expected truncation path. Mirror the OpenAI context-length mapping here.
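To illustrate why the distinction matters, here is a toy retry loop, not the repo's `RetryingClient`. It assumes `OverlongPromptError` is a subclass of `ModelError` and that a deterministic over-context failure should bypass retries; all names are local stand-ins.

```python
import asyncio


class ModelError(Exception):
    """Stand-in for the repo's ModelError."""


class OverlongPromptError(ModelError):
    """Stand-in; assumed to be a subclass so the except order below matters."""


async def call_with_retries(call, max_attempts: int = 3):
    """Retry transient ModelErrors, but surface over-context errors immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return await call()
        except OverlongPromptError:
            raise  # resending the same oversized prompt cannot succeed
        except ModelError as e:
            last_error = e
    raise last_error
```

With a mapping in place, an oversized Gemini prompt would cost one request and reach the truncation path; without it, the same request is sent `max_attempts` times and then reported as a provider failure.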
usage = response.usage_metadata
prompt_tokens = usage.prompt_token_count if usage else None
completion_tokens = usage.candidates_token_count if usage else None
Include Gemini thought tokens in completion usage
When Gemini thinking is enabled, usageMetadata can include thoughts_token_count in addition to candidates_token_count, but this line reports only the visible candidate tokens as completion usage. Other clients' completion usage includes reasoning tokens, and v1 documents completion tokens as every token the model produced, so Gemini reasoning rollouts underreport output usage/cost whenever hidden thoughts are present. Add usage.thoughts_token_count to the completion side when it is present.
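A small sketch of the suggested accounting, using a `SimpleNamespace` as a stand-in for the SDK's usage metadata object. `completion_token_count` is a hypothetical helper; only the two attribute names come from this comment.

```python
from types import SimpleNamespace
from typing import Optional


def completion_token_count(usage) -> Optional[int]:
    """Sum visible candidate tokens and hidden thought tokens, if reported."""
    if usage is None:
        return None
    visible = getattr(usage, "candidates_token_count", None)
    thoughts = getattr(usage, "thoughts_token_count", None)
    if visible is None and thoughts is None:
        return None
    # Either field may be None when the provider omits it.
    return (visible or 0) + (thoughts or 0)


usage = SimpleNamespace(candidates_token_count=10, thoughts_token_count=32)
print(completion_token_count(usage))  # 42
```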
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit f80ed5a. Configure here.
    call_names.update({call.id: call.name for call in message.tool_calls or []})
    prompt.append(assistant_to_wire(message))
else:
    part = tool_result_to_wire(message, call_names[message.tool_call_id])
Google tool name lookup KeyError
Medium Severity
When wiring tool results, the Google client resolves function names from call_names, but that map is filled only from AssistantMessage.tool_calls. If the preceding assistant turn replayed native provider_state parts without populated portable tool_calls, the next ToolMessage triggers an uncaught KeyError instead of building a function response.
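One way to harden the lookup is to also harvest names from the replayed native parts and to fail with a descriptive error when a name is still missing. The sketch below uses plain dicts, and the `provider_state` shape (a list of parts with an optional `function_call` carrying `id` and `name`) is an assumption for illustration, not the PR's actual schema; both helper names are hypothetical.

```python
from typing import Optional


def collect_call_names(
    tool_calls: Optional[list], provider_state: Optional[list]
) -> dict[str, str]:
    """Build a tool_call_id -> function name map from both sources."""
    names = {call["id"]: call["name"] for call in tool_calls or []}
    for part in provider_state or []:
        call = part.get("function_call") if isinstance(part, dict) else None
        if call and call.get("id") and call.get("name"):
            # Portable tool_calls win when both sources describe the same id.
            names.setdefault(call["id"], call["name"])
    return names


def resolve_call_name(call_names: dict[str, str], tool_call_id: str) -> str:
    """Look up a function name, raising a descriptive error instead of KeyError."""
    try:
        return call_names[tool_call_id]
    except KeyError:
        raise ValueError(
            f"no function call recorded for tool_call_id={tool_call_id!r}"
        ) from None
```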
Reviewed by Cursor Bugbot for commit f80ed5a. Configure here.


Overview
This PR expands the V1 client layer beyond OpenAI Chat Completions with native support for OpenAI Responses, Anthropic Messages, and Google generateContent. The clients share V1's portable message and response models while retaining provider-specific state needed for reasoning, tool use, streaming, sampling controls, and multimodal conversations.
Provider Clients
V1 client configuration can now select the appropriate protocol independently of the model endpoint. Each implementation translates between V1 messages and the provider SDK's native request and response types, keeping the public client interface consistent across providers.
The OpenAI Chat Completions client now handles multimodal messages, streaming responses, token IDs, logprobs, tool calls, and provider-specific sampling parameters through the OpenAI-compatible request path.
The OpenAI Responses client supports the Responses API item model, including function calls, reasoning items, image inputs, provider state replay, and streamed terminal responses.
The Anthropic client supports Messages API content blocks, top-level system instructions, image inputs, tool use and tool results, thinking blocks, sampling parameters, and streaming through the Anthropic SDK.
The Google client supports generateContent messages, system instructions, local image data, function calls and responses, thought parts, sampling parameter translation, and streamed response aggregation through the Google Gen AI SDK.
Conversation State
Provider-native response items are retained on assistant messages and replayed on subsequent turns. This preserves information that cannot be represented completely by plain text, such as reasoning signatures, thinking blocks, function-call metadata, and provider-specific content parts.
Tool calls and tool results remain represented through the common V1 message types. Each client performs only the protocol-specific grouping and naming required by its provider.
Sampling And Streaming
Common sampling fields are translated where provider APIs use different names, while provider-specific options continue to pass through the client configuration. This includes reasoning controls, token limits, stop sequences, candidate counts, logprobs, and compatible extension fields.
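As a rough illustration of this kind of translation, the sketch below renames common keys to provider-specific ones and passes everything else through. The rename table targets Gemini-style generation-config names and is an assumption about the mapping, not a copy of the PR's implementation; `translate_sampling` is a hypothetical helper.

```python
# Assumed common-name -> Gemini-style name mapping, for illustration only.
GOOGLE_RENAMES = {
    "max_tokens": "max_output_tokens",
    "stop": "stop_sequences",
    "n": "candidate_count",
}


def translate_sampling(sampling: dict, renames: dict[str, str]) -> dict:
    """Rename known keys and pass unknown keys through unchanged."""
    out: dict = {}
    for key, value in sampling.items():
        target = renames.get(key, key)
        if target in out:
            # Both the common and the provider-specific spelling were supplied.
            raise ValueError(f"conflicting values for {target!r}")
        out[target] = value
    return out


print(translate_sampling({"max_tokens": 256, "stop": ["\n"], "temperature": 0.7}, GOOGLE_RENAMES))
```

Raising on a conflict, rather than silently letting one spelling win, keeps an eval config from carrying two different token limits unnoticed.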
Streaming uses each provider SDK's native interface and returns the same final V1 Response shape as non-streaming calls. Provider usage, finish reasons, reasoning content, tool calls, and native state are carried into that final response.
Multimodal Inputs
The shared V1 message model supports text and image content parts. OpenAI and Anthropic accept their native image representations, while Google converts local base64 data URIs into native byte parts with the requested media resolution.
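The data-URI conversion can be sketched with the standard library alone. `parse_data_uri` is a hypothetical helper that returns the media type and decoded bytes; building the provider's native byte part from that pair, and applying the requested media resolution, is left out.

```python
import base64


def parse_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI is missing the ',' separator")
    mime_type, *params = header.split(";")
    if "base64" not in params:
        raise ValueError("only base64-encoded data URIs are handled")
    # validate=True rejects characters outside the base64 alphabet.
    return mime_type or "text/plain", base64.b64decode(payload, validate=True)


uri = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
print(parse_data_uri(uri))
```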
Note
Medium Risk
New dependency and large client surface area affect all model rollouts; provider_state changes assistant message hashes and interception wire format for reasoning continuations.
Overview
Adds native provider clients for OpenAI Responses, Anthropic Messages, and Google Gemini (generateContent), selectable via --client.type alongside the existing chat-completions and renderer clients. google-genai>=2.8.0 is added as a runtime dependency.
Each client maps V1 messages to provider SDK types, supports streaming (aggregated to the same Response shape), tools, sampling translation, and multimodal images (including image_url.detail on shared types). The OpenAI chat client is extended with typed wire helpers, streaming, richer reasoning/reasoning_details, and stricter image rules.
AssistantMessage.provider_state stores provider-native JSON needed for multi-turn continuations (thinking blocks, Responses output items, Gemini thought signatures). It is replayed on follow-up requests, round-tripped through the interception server as reasoning_details, included in graph message hashing, and Trace.has_response now treats reasoning, provider state, and tool calls as valid output, not only text content.
Reviewed by Cursor Bugbot for commit f80ed5a. Bugbot is set up for automated code reviews on this repo.
Note
Add native Anthropic, Google, and OpenAI Responses provider clients
- Adds AnthropicMessagesClient (anthropic.py), GoogleResponsesClient (google.py), and OpenAIResponsesClient (openai_responses.py), each with full streaming, tool-calling, and multi-turn continuation support.
- Response type with reasoning_content, provider_state, tool_calls, and usage.
- Adds config classes (AnthropicMessagesClientConfig, GoogleResponsesClientConfig, OpenAIResponsesClientConfig) to the discriminated ClientConfig union so clients are selectable by name.
- Extends AssistantMessage with a provider_state field to carry provider-native continuation data across turns, and updates Trace.has_response to treat reasoning, tool calls, or provider state as a valid response.
- reasoning_content and provider_state through OpenAI chat completion payloads.
- Adds google-genai>=2.8.0 as a new required dependency.
Macroscope summarized f80ed5a.