feat(v1): support native provider clients #1651

Open
xeophon wants to merge 6 commits into feat/nano-as-v1 from codex/v1-native-provider-clients

@xeophon xeophon commented Jun 12, 2026

Member

Overview

This PR expands the V1 client layer beyond OpenAI Chat Completions with native support for OpenAI Responses, Anthropic Messages, and Google generateContent. The clients share V1's portable message and response models while retaining provider-specific state needed for reasoning, tool use, streaming, sampling controls, and multimodal conversations.

Provider Clients

V1 client configuration can now select the appropriate protocol independently of the model endpoint. Each implementation translates between V1 messages and the provider SDK's native request and response types, keeping the public client interface consistent across providers.

The OpenAI Chat Completions client now handles multimodal messages, streaming responses, token IDs, logprobs, tool calls, and provider-specific sampling parameters through the OpenAI-compatible request path.

The OpenAI Responses client supports the Responses API item model, including function calls, reasoning items, image inputs, provider state replay, and streamed terminal responses.

The Anthropic client supports Messages API content blocks, top-level system instructions, image inputs, tool use and tool results, thinking blocks, sampling parameters, and streaming through the Anthropic SDK.

The Google client supports generateContent messages, system instructions, local image data, function calls and responses, thought parts, sampling parameter translation, and streamed response aggregation through the Google Gen AI SDK.

Conversation State

Provider-native response items are retained on assistant messages and replayed on subsequent turns. This preserves information that cannot be represented completely by plain text, such as reasoning signatures, thinking blocks, function-call metadata, and provider-specific content parts.

Tool calls and tool results remain represented through the common V1 message types. Each client performs only the protocol-specific grouping and naming required by its provider.

Sampling And Streaming

Common sampling fields are translated where provider APIs use different names, while provider-specific options continue to pass through the client configuration. This includes reasoning controls, token limits, stop sequences, candidate counts, logprobs, and compatible extension fields.
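As an illustration of the field translation described above, the sketch below renames portable sampling keys to provider-native ones and lets everything else pass through. The helper `translate_sampling` and the `RENAMES` table are hypothetical and are not taken from this PR; the target names (`max_output_tokens`, `stop_sequences`, `candidate_count`) are assumed to match the Google Gen AI and Anthropic SDK parameter names.

```python
from typing import Any

# Hypothetical rename tables; the PR's actual mapping may differ.
RENAMES: dict[str, dict[str, str]] = {
    "google": {
        "max_tokens": "max_output_tokens",
        "stop": "stop_sequences",
        "n": "candidate_count",
    },
    "anthropic": {
        "stop": "stop_sequences",
    },
}


def translate_sampling(provider: str, sampling: dict[str, Any]) -> dict[str, Any]:
    """Rename portable sampling keys; unknown keys pass through unchanged."""
    renames = RENAMES.get(provider, {})
    return {renames.get(key, key): value for key, value in sampling.items()}


google_args = translate_sampling(
    "google", {"max_tokens": 256, "stop": ["\n"], "temperature": 0.2}
)
```

Keeping the rename table as data rather than branching per provider makes it easy to see at a glance which fields differ between APIs.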

Streaming uses each provider SDK's native interface and returns the same final V1 Response shape as non-streaming calls. Provider usage, finish reasons, reasoning content, tool calls, and native state are carried into that final response.

Multimodal Inputs

The shared V1 message model supports text and image content parts. OpenAI and Anthropic accept their native image representations, while Google converts local base64 data URIs into native byte parts with the requested media resolution.
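The data URI conversion mentioned for Google can be sketched with the standard library alone. `parse_data_uri` is a hypothetical helper, not the function used in this PR; it only shows how a base64 data URI splits into a media type and raw bytes.

```python
import base64


def parse_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


example_uri = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode()
mime_type, data = parse_data_uri(example_uri)
# With the Google Gen AI SDK, the result would typically be wrapped as
# types.Part.from_bytes(data=data, mime_type=mime_type); that step is
# omitted here so the sketch runs without the dependency.
```

Remote `http(s)` URLs are rejected by this sketch on purpose, since the description only mentions local base64 data.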


Note

Medium Risk
New dependency and large client surface area affect all model rollouts; provider_state changes assistant message hashes and interception wire format for reasoning continuations.

Overview
Adds native provider clients for OpenAI Responses, Anthropic Messages, and Google Gemini (generateContent), selectable via --client.type alongside the existing chat-completions and renderer clients. google-genai>=2.8.0 is added as a runtime dependency.

Each client maps V1 messages to provider SDK types, supports streaming (aggregated to the same Response shape), tools, sampling translation, and multimodal images (including image_url.detail on shared types). The OpenAI chat client is extended with typed wire helpers, streaming, richer reasoning/reasoning_details, and stricter image rules.

AssistantMessage.provider_state stores provider-native JSON needed for multi-turn continuations (thinking blocks, Responses output items, Gemini thought signatures). It is replayed on follow-up requests, round-tripped through the interception server as reasoning_details, and included in graph message hashing. Trace.has_response now treats reasoning, provider state, and tool calls as valid output, not only text content.
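A minimal sketch of the broadened response check, assuming the four fields named in this summary. The `AssistantMessage` dataclass here is a stand-in for the V1 type, and the free function `has_response` is hypothetical; the real check lives on `Trace` and may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AssistantMessage:
    # Stand-in for the V1 type; field names follow the PR summary.
    content: Optional[str] = None
    reasoning_content: Optional[str] = None
    tool_calls: list[Any] = field(default_factory=list)
    provider_state: Any = None


def has_response(message: AssistantMessage) -> bool:
    """True when the message carries any model output, not only text."""
    return bool(
        message.content
        or message.reasoning_content
        or message.tool_calls
        or message.provider_state
    )


reasoning_only = AssistantMessage(reasoning_content="thinking through the problem")
empty = AssistantMessage()
```

Under this check a turn that produced only reasoning or only a tool call still counts as a response, while a fully empty message does not.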

Reviewed by Cursor Bugbot for commit f80ed5a.

Note

Add native Anthropic, Google, and OpenAI Responses provider clients

  • Adds three new async client implementations: AnthropicMessagesClient (anthropic.py), GoogleResponsesClient (google.py), and OpenAIResponsesClient (openai_responses.py), each with full streaming, tool-calling, and multi-turn continuation support.
  • Each client converts internal message types to provider-native wire formats and parses responses back into a common Response type with reasoning_content, provider_state, tool_calls, and usage.
  • Adds corresponding config types (AnthropicMessagesClientConfig, GoogleResponsesClientConfig, OpenAIResponsesClientConfig) to the discriminated ClientConfig union so clients are selectable by name.
  • Extends AssistantMessage with a provider_state field to carry provider-native continuation data across turns, and updates Trace.has_response to treat reasoning, tool calls, or provider state as a valid response.
  • The interception server now round-trips reasoning_content and provider_state through OpenAI chat completion payloads.
  • Adds google-genai>=2.8.0 as a new required dependency.

Macroscope summarized f80ed5a.

@xeophon changed the title from "[codex] add native V1 provider clients" to "feat(v1): support native provider clients" on Jun 12, 2026
@xeophon xeophon marked this pull request as ready for review June 12, 2026 13:59
Comment thread tests/test_threaded_sandbox_client.py Outdated
Comment thread verifiers/v1/clients/google.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8e9355e5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pyproject.toml
"loguru>=0.7.0",
"tomli-w>=1.0.0",
"renderers>=0.1.8.dev40",
"google-genai>=2.8.0",

P1: Update the lockfile for google-genai

Adding google-genai here makes the package depend on a module that is imported at client-config import time (verifiers/v1/clients/config.py imports from google import genai), but uv.lock was not updated — rg '^name = "google-genai"' uv.lock returns no entry. In any locked/reproducible install path such as CI or contributor uv sync --locked, this commit cannot install the new dependency and import verifiers.v1.clients will fail once the lockfile environment is used.

Comment thread verifiers/v1/clients/openai.py
@macroscopeapp

macroscopeapp Bot commented Jun 12, 2026

Approvability

Verdict: Needs human review

1 blocking correctness issue found. This PR introduces substantial new feature capabilities (native Anthropic, Google, and OpenAI Responses API clients) with significant new integration logic. Multiple unresolved review comments identify potential bugs including missing lockfile updates, incorrect error handling, and missing defaults that could cause runtime failures.

Comment thread verifiers/v1/clients/google.py Outdated
Comment thread verifiers/v1/clients/google.py
streaming = bool(sampling.pop("stream", False))
sampling.pop("n", None)  # Anthropic has no n parameter
if "max_tokens" not in sampling:
    raise ValueError("Anthropic Messages requires max_tokens")

Anthropic eval lacks max_tokens

Medium Severity

AnthropicMessagesClient rejects requests when max_tokens is missing, while v1 EvalConfig defaults sampling to an empty SamplingConfig. Choosing --client.type anthropic_messages without --sampling.max-tokens fails before any API call, unlike the v0 Anthropic client that defaulted max_tokens.
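One way to address this, sketched under assumptions: fill in a fallback instead of raising. `prepare_anthropic_sampling` and `DEFAULT_MAX_TOKENS` are hypothetical names, and the value 4096 is a placeholder because the default used by the v0 client is not stated in this thread.

```python
from typing import Any

# Hypothetical fallback; the value the v0 client used is not stated here.
DEFAULT_MAX_TOKENS = 4096


def prepare_anthropic_sampling(sampling: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with max_tokens filled in and unsupported keys dropped."""
    prepared = dict(sampling)
    prepared.pop("n", None)  # Anthropic has no n parameter
    prepared.setdefault("max_tokens", DEFAULT_MAX_TOKENS)
    return prepared


defaulted = prepare_anthropic_sampling({})
explicit = prepare_anthropic_sampling({"max_tokens": 10, "n": 2})
```

An explicit `--sampling.max-tokens` still wins because `setdefault` only writes the key when it is absent.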

Reviewed by Cursor Bugbot for commit e11a561.

Comment thread verifiers/v1/clients/anthropic.py Outdated
# eval model ids may carry a provider prefix (e.g. "google/gemini-...")
"model": model.rsplit("/", 1)[-1],
"contents": cast(types.ContentListUnion, messages_to_wire(prompt)),
"config": types.GenerateContentConfig.model_validate(sampling),

Google rejects extra sampling keys

Medium Severity

GoogleResponsesClient builds GenerateContentConfig via model_validate on the full post-processed sampling dict. SamplingConfig allows arbitrary extra keys, so provider-specific or OpenAI-oriented fields (e.g. extra_body, stream_options, reasoning_effort) in eval config can trigger validation errors instead of a model request.
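A possible mitigation is to partition the sampling dict before validation so unknown keys are dropped (or logged) rather than rejected. The helper `split_known_fields` and the `ALLOWED` set below are hypothetical; in the real client the allowed names could plausibly be derived from the pydantic model's `model_fields`, which is an assumption and not something this thread confirms.

```python
from typing import Any


def split_known_fields(
    sampling: dict[str, Any], allowed: set[str]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Partition sampling into (accepted, dropped) by allowed field names."""
    accepted = {k: v for k, v in sampling.items() if k in allowed}
    dropped = {k: v for k, v in sampling.items() if k not in allowed}
    return accepted, dropped


# Illustrative subset only; not the full GenerateContentConfig field list.
ALLOWED = {"temperature", "top_p", "max_output_tokens", "stop_sequences"}

accepted, dropped = split_known_fields(
    {"temperature": 0.7, "reasoning_effort": "high", "extra_body": {}}, ALLOWED
)
```

Returning the dropped keys separately lets the caller decide whether to warn, which avoids silently ignoring a misspelled option.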

Reviewed by Cursor Bugbot for commit e11a561.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e11a5617b4

Comment on lines +232 to +233
except AnthropicError as e:
    raise ModelError(str(e)) from e

P2: Preserve overlong-prompt classification for Anthropic

When Anthropic rejects an over-context request (for example, a long Claude rollout hitting BadRequestError: prompt is too long), this catch-all wraps it as ModelError; RetryingClient will retry it and the interception server only treats OverlongPromptError as a clean context_length truncation, so the rollout becomes a provider failure/502 instead. The existing v0 Anthropic client in this repo already classifies Anthropic context-limit phrases as OverlongPromptError, so the new native client should keep that behavior here.
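The classification the comment asks for could look roughly like the sketch below. `ModelError` and `OverlongPromptError` are local stand-ins for the repo's exception types (their real inheritance relationship is not shown in this thread), `classify_provider_error` is a hypothetical helper, and the phrase list contains only the example quoted above; the v0 client's full list is not reproduced here.

```python
class ModelError(Exception):
    """Stand-in for the repo's retryable provider error."""


class OverlongPromptError(Exception):
    """Stand-in for the repo's context-length error."""


# Only the phrase quoted in the review comment; extend as needed.
CONTEXT_LIMIT_PHRASES = ("prompt is too long",)


def classify_provider_error(error: Exception) -> Exception:
    """Map a provider exception to OverlongPromptError or ModelError."""
    message = str(error)
    if any(phrase in message.lower() for phrase in CONTEXT_LIMIT_PHRASES):
        return OverlongPromptError(message)
    return ModelError(message)


overlong = classify_provider_error(Exception("BadRequestError: prompt is too long"))
generic = classify_provider_error(Exception("rate limit exceeded"))
```

Inside the client's `except` block this would be used as `raise classify_provider_error(e) from e`, so the original exception stays attached as the cause.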

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f80ed5abaf

    else:
        response = await self.google.aio.models.generate_content(**request)
except errors.APIError as e:
    raise ModelError(str(e)) from e

P2: Map Gemini context-limit errors to OverlongPromptError

When Gemini rejects an oversized prompt (for example a 400 saying the input token count exceeds the model limit), this wraps it as ModelError. RetryingClient retries ModelError, and the interception server only turns OverlongPromptError into a clean prompt_too_long truncation, so google_responses rollouts with over-context prompts are retried and then surface as provider failures instead of the expected truncation path. Mirror the OpenAI context-length mapping here.
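To make the retry consequence concrete, here is a small synchronous sketch of a retry wrapper that retries only `ModelError`. It is a stand-in for `RetryingClient`, whose real implementation is not shown in this thread; `call_with_retries` and both exception classes are hypothetical local definitions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ModelError(Exception):
    """Stand-in: treated as retryable."""


class OverlongPromptError(Exception):
    """Stand-in: treated as terminal, never retried."""


def call_with_retries(request: Callable[[], T], max_attempts: int = 3) -> T:
    """Retry ModelError up to max_attempts; other exceptions propagate at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ModelError:
            if attempt == max_attempts:
                raise
    raise RuntimeError("unreachable")


attempts = {"overlong": 0, "generic": 0}


def overlong_request() -> None:
    attempts["overlong"] += 1
    raise OverlongPromptError("input token count exceeds the model limit")


def generic_request() -> None:
    attempts["generic"] += 1
    raise ModelError("input token count exceeds the model limit")
```

With this wrapper an over-context prompt wrapped as `ModelError` is sent `max_attempts` times before failing, whereas the same prompt classified as `OverlongPromptError` fails on the first attempt and can take the truncation path.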


usage = response.usage_metadata
prompt_tokens = usage.prompt_token_count if usage else None
completion_tokens = usage.candidates_token_count if usage else None

P2: Include Gemini thought tokens in completion usage

When Gemini thinking is enabled, usageMetadata can include thoughts_token_count in addition to candidates_token_count, but this line reports only the visible candidate tokens as completion usage. Other clients' completion usage includes reasoning tokens, and v1 documents completion tokens as every token the model produced, so Gemini reasoning rollouts underreport output usage/cost whenever hidden thoughts are present. Add usage.thoughts_token_count to the completion side when it is present.
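The suggested accounting can be sketched as follows. `completion_token_count` is a hypothetical helper, and `SimpleNamespace` stands in for the SDK's usage metadata object so the example runs without the dependency; the attribute names are the ones quoted in this thread.

```python
from types import SimpleNamespace
from typing import Any, Optional


def completion_token_count(usage: Optional[Any]) -> Optional[int]:
    """Sum visible candidate tokens and hidden thought tokens, when reported."""
    if usage is None:
        return None
    candidates = getattr(usage, "candidates_token_count", None)
    thoughts = getattr(usage, "thoughts_token_count", None)
    if candidates is None and thoughts is None:
        return None
    return (candidates or 0) + (thoughts or 0)


usage = SimpleNamespace(
    prompt_token_count=120, candidates_token_count=40, thoughts_token_count=300
)
total = completion_token_count(usage)
```

Treating a missing `thoughts_token_count` as zero keeps non-thinking responses unchanged, while returning `None` when nothing is reported preserves the existing "unknown usage" behavior.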

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit f80ed5a.

    call_names.update({call.id: call.name for call in message.tool_calls or []})
    prompt.append(assistant_to_wire(message))
else:
    part = tool_result_to_wire(message, call_names[message.tool_call_id])

Google tool name lookup KeyError

Medium Severity

When wiring tool results, the Google client resolves function names from call_names, but that map is filled only from AssistantMessage.tool_calls. If the preceding assistant turn replayed native provider_state parts without populated portable tool_calls, the next ToolMessage triggers an uncaught KeyError instead of building a function response.
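A defensive version of the lookup might populate the map from both sources and fail with a descriptive error. Everything here is a sketch under assumptions: `collect_call_names` and `resolve_call_name` are hypothetical helpers, and the shape of a replayed native part (a dict with a `function_call` entry holding `id` and `name`) is assumed, not confirmed by this thread.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_call_names(
    tool_calls: Optional[Iterable[Any]],
    provider_state: Optional[Iterable[dict[str, Any]]],
) -> dict[str, str]:
    """Map tool-call ids to function names from portable and native data."""
    names = {call.id: call.name for call in tool_calls or []}
    for part in provider_state or []:
        call = part.get("function_call")  # assumed shape of a native part
        if call and call.get("id") and call.get("name"):
            names.setdefault(call["id"], call["name"])
    return names


def resolve_call_name(call_names: dict[str, str], tool_call_id: str) -> str:
    """Look up a function name, raising a descriptive error if it is missing."""
    try:
        return call_names[tool_call_id]
    except KeyError:
        raise ValueError(
            f"no function call found for tool result {tool_call_id!r}"
        ) from None


portable = collect_call_names([SimpleNamespace(id="a", name="lookup")], None)
native = collect_call_names(
    None, [{"function_call": {"id": "c1", "name": "search"}}, {"text": "hi"}]
)
```

Portable `tool_calls` take precedence because `setdefault` only fills ids that are not already present.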

Reviewed by Cursor Bugbot for commit f80ed5a.
