feat(v1): dialect-routed interception — byte relay on matching protocols, typed translate otherwise by xeophon · Pull Request #1657 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-12T20:06:23Z

Summary

Stacked on #1651 (native provider clients); merges into feat/nano-as-v1 once that lands.

This PR combines the two halves discussed in #1654: route-detected wire dialects on ingress (from that PR's design) and the native provider clients on egress (#1651), reconciled by one request-time rule:

Relay — when the rollout's client natively speaks the request's dialect (Client.dialect == Dialect.name), the program's request bytes are forwarded verbatim and the provider's response (JSON or SSE) is relayed back untouched. The dialect parses a copy only to record the trace. No field is lost to a typed round-trip — not reasoning, not cache_control, not any future provider field.
Translate — otherwise (training via the renderer or chat→vLLM, or any cross-protocol pairing such as a claude-code harness against a chat endpoint), the request is parsed into typed messages, the client runs it in its own wire format, and the typed Response is serialized back in the dialect the program spoke.

The selection is a string compare per request: no model→client tables, no Harness.DIALECT, no body sniffing. The harness only chooses env vars, as before.

What's new

verifiers/v1/dialects/ — one module per native format (chat, anthropic, responses), each owning its route, auth carrier (Bearer / x-api-key), wire→vf parsers (parse_request / parse_response / parse_stream), and vf→wire serializers (serialize_response / serialize_stream, translate path only). Response parsing reuses the clients' response_from_wire, so provider_state (thinking blocks, Responses output items, reasoning_details) and vLLM token ids land on the trace on both arms.
InterceptionServer mounts every dialect's routes; the SDK's URL picks the codec. Streaming relays pass SSE bytes through chunk-by-chunk and tee-parse the assembled final message for the trace; streaming translate fake-streams a minimal valid SSE. Anthropic's count_tokens is relayed on the relay arm and estimated (~4 chars/token) on translate. Errors come back in each dialect's native error shape.
Clients keep the typed get_response (judges and in-env calls unaffected) and gain dialect + relay(body, route); RetryingClient retries relays the same way (relay raises before any byte is returned). Renderer and Google stay translate-only (dialect = None) — which is exactly how training keeps typed token data.
Harnesses now receive the server's root endpoint and append what their SDK expects ({endpoint}/v1 for OpenAI SDKs; bare root for ANTHROPIC_BASE_URL). The default harness carries the eval's sampling into the program (OPENAI_SAMPLING env → extra_body) since the relay no longer injects it.

A claude-code harness now needs only ANTHROPIC_BASE_URL={endpoint} + ANTHROPIC_API_KEY={secret}: ingress (incl. streaming + count_tokens) is already served, relay applies against any anthropic-speaking endpoint (api.anthropic.com, vLLM's native /v1/messages), and translate covers everything else.

Verification

Unit (tests/v1/test_dialects.py, 17 tests + full v1 suite, 63 passed): codec round-trips validated against the provider SDK models (anthropic.types.Message, openai.types.responses.Response), stream assembly/fake-stream round-trips, byte-verbatim relay assertions (unknown request fields survive), per-dialect auth + error shapes, aux-route relay/estimate, refusals in dialect error shape.

Live, through the real eval machinery (default harness × subprocess):

scenario	arm	result
chat program → OpenAI chat endpoint	relay	reward 1.0
chat program → OpenAI Responses client	translate (cross-protocol)	reward 1.0
chat program → Google client	translate (cross-protocol)	reward 1.0
multi-turn user-sim (`extend_request` over relay)	relay	reward 1.0, 2 turns
agentic bash tool calls	relay	reward 1.0

Live testing caught two real relay bugs, both fixed: newer openai/anthropic SDKs exclude auth from default_headers (now merged from auth_headers explicitly), and duplicate Content-Type headers made OpenAI reject relayed bodies (header keys now lowercased/deduped).

Not live-tested: anthropic/responses ingress (no claude-code/codex harness exists yet — covered by unit tests against the SDK models) and the renderer translate path (needs a vLLM engine).

Design choices reviewers may want to veto

Bytes mean bytes: the relay arm does not inject model or sampling. The model id already reaches the program via harness env; the default harness now carries sampling itself. On rlm's relay path, eval --sampling.* does not apply until rlm reads an equivalent env (translate/training keeps ctx.sampling pinned).
User-sim over relay is chat-only (Dialect.extend_request); on other dialects a simulator works via the translate arm. Streamed turns never drive a simulator.
Anthropic translate emits reasoning as an unsigned thinking block so the program displays it and echoes it back, where parse_request recovers it — the reasoning-passback some models hard-require, carried across protocols through the program itself.
Responses statefulness is not emulated on translate (previous_response_id ignored; the trace graph could emulate it later); on relay the endpoint owns it.
count_tokens translate-arm estimate is deliberately crude (compaction-trigger fidelity, not billing).

Breaking

parse_message / parse_tools / serialize_completion moved from interception.server to verifiers.v1.dialects (chat module).
Rollout/InterceptionPool hand harnesses the endpoint root (no /v1 suffix); out-of-tree harnesses must append their SDK's path (in-tree default/rlm/compact updated).
Client gains dialect/relay; custom clients are unaffected unless they want relay.

Follow-up (out of scope)

claude-code / codex harnesses (env vars only).
Egress auto-detection sugar (-m anthropic:claude-… scheme + host claims) so direct-API runs don't need --client.type.
Optional: vf→wire request serializers for non-chat user-sim relay, previous_response_id emulation from the trace graph.

Note

High Risk
Core rollout path changes (interception routing, verbatim upstream relay, endpoint root contract) affect every harness model call; mistakes could break auth, streaming, or cross-protocol evals.

Overview
Adds polyglot interception: the server mounts chat, anthropic, and responses routes and picks relay (verbatim request/response bytes when Client.dialect matches) vs translate (parse → typed client → serialize back) per request.

New verifiers/v1/dialects/ holds per-format codecs (parse/serialize, streaming SSE, dialect-shaped errors, Anthropic count_tokens aux). Chat wire helpers move here from interception.server.

Clients gain dialect + relay() via shared relay_post/relay_headers; RetryingClient retries relays. Shared classify_model_error replaces duplicated context-length checks in OpenAI client code.

Harness/rollout expose the interception root (no /v1 suffix); in-tree harnesses set OPENAI_BASE_URL to {endpoint}/v1. Default harness passes eval sampling as OPENAI_SAMPLING merged into program extra_body (relay no longer injects sampling).

Tests: new test_dialects.py for codecs and relay/translate server behavior; test_clients imports from dialects.

^{Reviewed by Cursor Bugbot for commit 302dcee. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add dialect-routed interception server supporting Anthropic, OpenAI Chat, and OpenAI Responses

Refactors the interception server into a multi-dialect polyglot proxy that registers routes for Anthropic (/v1/messages), OpenAI Chat (/v1/chat/completions), and OpenAI Responses (/v1/responses) concurrently.
Introduces a Dialect abstraction in verifiers/v1/dialects/ with per-dialect parse/serialize, SSE handling, auth extraction, error shaping, and aux route support.
Routes requests through two arms: relay (raw byte pass-through when the client's dialect matches the ingress dialect) and translate (typed conversion through the existing middle layer otherwise).
Adds relay() to Client and all three client implementations, backed by a shared relay_post helper with retry logic, streaming, and error classification via classify_model_error.
Harnesses now receive root endpoints and must append their SDK-specific path (e.g. /v1); all harness OPENAI_BASE_URL values updated accordingly.
Risk: Behavioral change — existing callers of InterceptionPool._entry and Rollout._serve_interception now receive root endpoints (no /v1 suffix); harnesses must be updated or requests will 404.

📊 Macroscope summarized 302dcee. 20 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted

🗂️ Filtered Issues

No issues evaluated.

…ols, typed translate otherwise The interception server now serves every registered wire dialect's routes (chat completions, Anthropic Messages, OpenAI Responses), resolving a request's format from the endpoint the program's SDK posts to. When the rollout's client natively speaks the request's dialect, the request/response bytes are relayed verbatim (incl. SSE pass-through) and parsed only to record the trace; otherwise the request is translated through the typed middle and the response is serialized back in the program's native format (the training path via renderer/chat-vLLM). Each dialect owns its full codec (parse_request/parse_response/parse_stream + serialize_response/serialize_stream + auth carrier); clients keep the typed get_response and gain dialect + relay(). Harnesses receive the server's root endpoint and append their SDK's suffix; the default harness carries the eval's sampling via OPENAI_SAMPLING since relay no longer injects it. Amp-Thread-ID: https://ampcode.com/threads/T-019ebd29-3076-77b1-a20e-71177db9f1e3 Co-authored-by: Amp <amp@ampcode.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 302dcee. Configure here.}

cursor · 2026-06-12T20:08:19Z

    ) -> ProgramResult:
        env = {
-            "OPENAI_BASE_URL": endpoint,
+            "OPENAI_BASE_URL": f"{endpoint}/v1",


Compact harness drops eval sampling

Medium Severity

On the relay path the interception server forwards request bodies unchanged, so eval ctx.sampling must reach the program via env (as the default harness does with OPENAI_SAMPLING and extra_body). The compact harness and program were not updated, so temperature, max tokens, and other sampling settings from the eval are omitted from compact harness model calls when the client uses the chat dialect.

^{Reviewed by Cursor Bugbot for commit 302dcee. Configure here.}

cursor · 2026-06-12T20:08:19Z

+        `endpoint` — the server's *root*, serving every registered dialect's routes, so
+        each harness appends what its program's SDK expects (`{endpoint}/v1` for an
+        OpenAI base URL, `endpoint` itself for an Anthropic one) and authenticates with
+        `secret` (bearer token / api key); `mcp_urls` are the task's tool servers


Missing docs for breaking changes

Medium Severity

This PR changes core v1 interception and harness contracts (endpoint root vs /v1, relay vs translate, moved parse_message / serialize_completion, new Client.dialect / relay), but the diff includes no updates under docs/ or affected skills/ for evaluation and harness setup.

^{Triggered by project rule: BugBot Instructions}

^{Reviewed by Cursor Bugbot for commit 302dcee. Configure here.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 302dceeff8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T20:11:47Z

    ) -> ProgramResult:
        env = {
-            "OPENAI_BASE_URL": endpoint,
+            "OPENAI_BASE_URL": f"{endpoint}/v1",


Preserve sampling for compact harness requests

When this harness points the OpenAI SDK at the new /v1 interception endpoint, a matching chat client now takes the relay path, so InterceptionServer no longer injects ctx.sampling into the model call. The default harness compensates by passing OPENAI_SAMPLING, but compact only sets base URL/key/model and its program calls chat.completions.create without any sampling arguments; compact evals that configure max_tokens, temperature, top_p, etc. will silently run with provider defaults instead of the eval config.

Useful? React with 👍 / 👎.

macroscopeapp · 2026-06-12T20:27:29Z

+                prompt.append(SystemMessage(content=parse_content(item.get("content"))))
+            else:
+                prompt.append(UserMessage(content=parse_content(item.get("content"))))
+        if run:


🟡 Medium dialects/responses.py:126

When an input item with role "system", "developer", or user-role lacks a "content" key, item.get("content") returns None and passes it to parse_content(). Since isinstance(None, str) is false, the code falls through to for part in content:, which raises TypeError: 'NoneType' object is not iterable. Consider adding a default empty string or list to item.get("content") before calling parse_content(), or handle None inside parse_content().

- elif item.get("role") in ("system", "developer"): - prompt.append(SystemMessage(content=parse_content(item.get("content")))) + elif item.get("role") in ("system", "developer"): + prompt.append(SystemMessage(content=parse_content(item.get("content") or ""))) else: - prompt.append(UserMessage(content=parse_content(item.get("content")))) + prompt.append(UserMessage(content=parse_content(item.get("content") or "")))

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file @verifiers/v1/dialects/responses.py around lines 126-129: When an input item with `role` `"system"`, `"developer"`, or user-role lacks a `"content"` key, `item.get("content")` returns `None` and passes it to `parse_content()`. Since `isinstance(None, str)` is false, the code falls through to `for part in content:`, which raises `TypeError: 'NoneType' object is not iterable`. Consider adding a default empty string or list to `item.get("content")` before calling `parse_content()`, or handle `None` inside `parse_content()`. Evidence trail: verifiers/v1/dialects/responses.py lines 36-55 (parse_content definition, no None guard), lines 102-128 (caller passes item.get('content') which can be None)

macroscopeapp · 2026-06-12T20:27:29Z

+    async def handle_aux(
+        self, request: web.Request, dialect: Dialect, route: str
+    ) -> web.Response:
+        """A side endpoint that is not a model turn (e.g. Anthropic count_tokens):
+        relayed verbatim when the client speaks the dialect, answered locally otherwise.
+        Never recorded on the trace."""
+        session = self.session_for(request, dialect)
+        if session is None:
+            return web.json_response(dialect.error_body("unauthorized"), status=401)
+        raw = await request.read()
+        if session.ctx.client.dialect == dialect.name:
+            try:
+                reply = await session.ctx.client.relay(raw, route)
+            except Exception as e:
+                return web.json_response(dialect.error_body(str(e)), status=502)
+            data = b"".join([chunk async for chunk in reply.chunks])
+            return web.Response(body=data, content_type=reply.content_type)
+        return web.json_response(dialect.handle_aux(route, json.loads(raw)))
+
+    async def handle_model(
+        self, request: web.Request, dialect: Dialect
+    ) -> web.StreamResponse:
+        session = self.session_for(request, dialect)


🟢 Low interception/server.py:185

handle_aux calls json.loads(raw) on line 202 without exception handling. If the request body is empty or malformed, this raises json.JSONDecodeError and propagates as a 500 Internal Server Error instead of 400 Bad Request. This leaves the non-relay path unprotected while the relay path (lines 196-201) already catches client errors, creating inconsistent error handling.

- return web.json_response(dialect.handle_aux(route, json.loads(raw))) + try: + body = json.loads(raw) + except json.JSONDecodeError as e: + return web.json_response(dialect.error_body(str(e)), status=400) + return web.json_response(dialect.handle_aux(route, body))

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file @verifiers/v1/interception/server.py around lines 185-207: `handle_aux` calls `json.loads(raw)` on line 202 without exception handling. If the request body is empty or malformed, this raises `json.JSONDecodeError` and propagates as a 500 Internal Server Error instead of 400 Bad Request. This leaves the non-relay path unprotected while the relay path (lines 196-201) already catches client errors, creating inconsistent error handling. Evidence trail: verifiers/v1/interception/server.py lines 185-202 (REVIEWED_COMMIT): `handle_aux` method, line 202 has `json.loads(raw)` without try/except. Lines 195-201 show the relay path with try/except around `session.ctx.client.relay(raw, route)`. Line 163: `web.Application(client_max_size=_MAX_REQUEST_BODY)` with no middleware parameter. git_grep for 'middleware|error_middleware' in the interception package returned no results, confirming no error-handling middleware exists.

macroscopeapp · 2026-06-12T20:29:26Z

Approvability

Verdict: Needs human review

1 blocking correctness issue found. This PR introduces a significant new feature (dialect-routed interception) with ~1000+ lines of new code, breaking changes to endpoint contracts, and major runtime behavior changes to the interception server. Multiple unresolved review comments identify functional bugs including compact harness dropping eval sampling on the relay path.

^{You can customize Macroscope's approvability policy. Learn more.}

Add streaming support to the eval (relay) client + interception server, ported from #1657's relay arm and adapted to our relay-only design: a `stream: true` request is relayed to the provider with `EvalClient.relay` (httpx streaming) and the SSE bytes are piped to the program chunk-by-chunk via an aiohttp `StreamResponse`, while the server accumulates them and `dialect.parse_stream` assembles the final message to record the turn on the trace. Streamed turns are single-shot (no user-sim). Dialect gains the streaming primitives: `parse_stream` (assemble SSE -> vf `Response`), `streaming(body)`, `secret(headers)` (auth carrier — Bearer default, so a non-Bearer dialect like Anthropic can read `x-api-key`), `error_body` (per-format error shape), `aux_routes`, and the shared `iter_sse` helper. The renderer (train) doesn't stream — its `relay` raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the `responses` dialect (`/v1/responses`), so a codex-style program speaking the OpenAI Responses API can be evaluated through the eval relay. Ports #1657's request walk (fold each run of assistant-side `input` items — reasoning / message / function_call — into one typed assistant message) and adds, for our design: `response_from_wire` (the `output` items -> a vf `Response`, written from the SDK model since we have no native Responses client), `parse_stream` (the terminal `response.completed`/`.incomplete`/`.failed` event carries the full object), and `apply_overrides` mapping the eval's sampling into the Responses shape (`max_output_tokens`). Relay-only: no serializers, no native client, no `previous_response_id` emulation (the endpoint owns server-side state). Registered in `DIALECTS` alongside chat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the `anthropic` dialect (`/v1/messages`), so a claude-code-style program speaking the Anthropic Messages API can be evaluated through the eval relay. Ports #1657's request parse (system + content blocks -> typed messages; tool_use -> tool calls, tool_result -> tool messages) and streaming assembly (message_start / content_block_* / message_delta), and adds for our design: `response_from_wire` (Anthropic `Message` content blocks -> a vf `Response`, written from the SDK model), `x-api-key` auth + secret carrier, the Anthropic error shape, and `apply_overrides` that keeps the program's required `max_tokens` unless the eval sets one. Also add aux-route relay: a dialect's `aux_routes` (Anthropic's `count_tokens`) are served by the interception server and relayed verbatim to the provider (`Client.relay_aux`), never recorded on the trace. `anthropic` was already a dependency (the v0 client uses it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ream, fix relay aclose - Drop the `_TIMEOUT`/`_LIMITS` constants; use `httpx.AsyncClient(timeout=None)` (agentic completions are slow and the rollout timeout is the real backstop) — matches the relay clients in #1657. - `get_response` now uses `_upstream` too, so it's genuinely shared with `relay` (was single-use). - `relay`: close the streaming response in a `finally` so a failed `aread()` on an error status doesn't leak the connection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ic, streaming), reasoning preserved (#1654) * fix(v1): preserve reasoning_content through the interception proxy vf parses model messages into typed Messages and re-serializes them, but dropped reasoning_content at every hop: the chat client's message_to_wire (egress to the model), the interception server's parse_message (ingress from the harness), and serialize_completion (the completion returned to the harness). So a harness could never carry a turn's reasoning into the next request. Reasoning models require the prior turns' reasoning_content sent back as a message-level field on assistant messages: DeepSeek V4 returns 400 "the reasoning_content in the thinking mode must be passed back to the API" when it's stripped, and Kimi K2 Thinking needs it for multi-turn tool calling. The renderer client already re-emitted it (the training chat template renders it back in); the chat client and the proxy did not -- an eval/train mismatch. Carry reasoning_content through all three hops (omitted when absent, so non- reasoning providers are unaffected) and fold the renderer's duplicate logic into the single chat-client message_to_wire, so the proxy is a clean pass-through that parses into its own types without losing fields. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(v1): forward eval requests 1:1 via a raw proxy client For the eval path, the chat client now forwards the program's request body verbatim to the provider (Client.proxy) instead of doing a typed re-serialize, so provider fields the typed wire form doesn't model -- e.g. `reasoning` / `reasoning_details` (OpenRouter-style aggregators like PI Inference name reasoning that way, not `reasoning_content`) -- reach the model intact. The interception server uses the typed parse only to build the trace; model + sampling stay eval-controlled, and the user-sim loop extends both the raw wire history and the typed trace. The renderer (training) client keeps the typed path -- it must tokenize the prompt for RL, so it can't be a raw proxy; response_from_wire also learns the `reasoning` field so the trace captures reasoning on either path. Verified: terminal-bench fix-git (rlm harness, modal) on deepseek-v4-flash (reward 1.0) and z-ai/glm-4.7 (clean 81-turn agentic run, 27 assistant turns with reasoning captured, 0 model-call failures) -- reasoning round-trips end-to-end. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): make the chat client a proxy whose get_response forwards 1:1 Collapse the separate proxy method + supports_proxy flag into one path: the default client (now ProxyClient) implements get_response by forwarding the program's request body 1:1 to the provider; the renderer implements the same signature by tokenizing the typed prompt. The interception server calls get_response uniformly (no branch) and gets back (completion_dict, Response) -- the dict to hand the program, the typed Response for the trace. serialize_completion moves to clients.openai (the renderer builds its dict there; the proxy returns the provider's raw dict). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): rename clients.openai module -> clients.proxy The module's client is now ProxyClient (forwards 1:1), so name the module for what it is. Pure rename + import updates (config, clients __init__, renderer, the interception server, the default harness); also tidies the module docstring and shortens the get_response docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): typed per-format Dialect for wire <-> vf translation Introduce `Dialect[ReqT, RespT]` (verifiers/v1/clients/dialects.py): a generic, per-native-format translator (wire -> vf only; the proxy relays the raw response verbatim, so there is no vf -> wire). `ChatCompletionsDialect` is the only dialect today; OpenAI Responses / Anthropic Messages become drop-in `Dialect`s. A harness declares which dialect its program speaks via `Harness.DIALECT` (classvar, defaults to chat completions) — no auto-detection (a follow-up, for harnesses with several native clients). The rollout threads it onto `RolloutSession.dialect`. `Client.get_response` now returns a single `Response` everywhere (was a `(completion_dict, Response)` tuple): the proxy parses the provider response via the dialect and carries the verbatim bytes on `Response.raw`, so the interception server hands them back to the program 1:1; the renderer leaves `raw` unset and the server serializes its `Response`. Move the chat-completions parsers (`parse_message`, `parse_tools`, `response_from_wire`, `_tokens_from_wire`) out of the proxy/server into the dialect module. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): make dialects a package; align reasoning extraction with v0 client Split clients/dialects.py into a clients/dialects/ package: `base` (the `Dialect` ABC) + `oai_chat_completions` (the only dialect today). New dialects (OpenAI Responses / Anthropic Messages) become sibling modules. Mirror the v0 chat client's `parse_reasoning_content`: read the model's reasoning from `reasoning` / `reasoning_content` / `reasoning_details` (same precedence) via the message dict, so the trace captures reasoning regardless of which field the provider uses — not just `reasoning_content`/`reasoning`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): resolve dialect by route; move dialects to verifiers/v1/dialects Auto-detect the wire format from the endpoint a request arrives on instead of pinning it on the harness. Each `Dialect` declares its `routes`; the interception server serves every registered dialect's routes (`dialects.DIALECTS`) and binds the route's dialect to the handler, so a program's SDK selects the format by the path it posts to. Removes `Harness.DIALECT` and `RolloutSession.dialect`. Only OpenAI chat completions is registered today; OpenAI Responses / Anthropic Messages slot in as new dialect modules + routes with no harness or server changes. Also promote dialects from clients/dialects/ to a top-level v1 package (verifiers/v1/dialects/) — it's the wire-translation layer used by both the clients and the interception server, not client-internal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): drop oai_ prefix from the chat_completions dialect module Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): slim get_response to (body, dialect, model, sampling) `prompt`/`tools` were redundant on `get_response` — both are `dialect.parse_request(body)`, so any client holding `body` + `dialect` derives them. Drop them: the renderer parses `body` via the dialect itself; the proxy already only forwarded `body`. Every client now takes the same four args. Also drop `_tokens_from_wire` from the chat dialect — training tokens come from the renderer client, so a chat completion never needs to carry token ids into the trace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(v1): renderer client supports only the chat-completions dialect (errors otherwise) The renderer renders a chat template, so it's only validated for chat-completions input; other dialects' semantics (Responses reasoning items, Anthropic thinking) may not round-trip faithfully through chat-template tokenization. Refuse a non-chat dialect with an informative NotImplementedError rather than silently rendering it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): make Dialect the single place a protocol lives Move every protocol-specific operation onto the `Dialect` so adding a harness with a new wire format is one self-contained module + a `DIALECTS` entry — the proxy and interception server are now fully generic over the interface: - `upstream_path` + `apply_overrides(body, model, sampling)`: the proxy forwards the request byte-exact and the dialect imposes only what the eval owns — model overlays, sampling is authoritative (the program's sampling keys are dropped, the eval's applied), in the protocol's own shape. Fixes the proxy hardcoding `/chat/completions` and the chat-shaped sampling overlay. - `serialize_response` + `extend`: the dialect owns the two vf -> wire cases (the renderer's generated response; user-sim turn injection), so `interception/server.py` has zero wire-format code — it drops the chat-shaped `raw_messages`/`["choices"][0]` handling. - Wire serializers (`message_to_wire`/`tool_to_wire`) live in the chat dialect (the chat-only renderer + default harness import them from there); `model_error` moves to `errors` (its natural home, shared by both clients). `clients/proxy.py` is now just `ProxyClient`. Keep `AsyncOpenAI` as the proxy transport: it's used as a raw poster and handles the OpenAI-compatible endpoint family (base url, Bearer auth, billing headers) every OpenAI-SDK dialect shares; a non-OpenAI provider is a separate transport axis, not a dialect concern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): proxy via raw httpx; dialect owns auth so any provider is just a dialect Swap the ProxyClient's transport from AsyncOpenAI to a raw httpx client. AsyncOpenAI baked in OpenAI auth (Bearer) + base-url/error conventions, which is awkward for a non-OpenAI provider (Anthropic wants x-api-key + anthropic-version). Now the dialect supplies the upstream path + auth headers (`Dialect.auth_headers`, defaulting to Bearer), so the proxy is provider-agnostic and a new wire format — including Anthropic Messages — is just a new `Dialect`, no client change. - Build full upstream URLs ourselves (base_url + dialect.upstream_path) rather than rely on httpx base-url joining (which drops the base path for a leading-slash request path). - Map errors from the response body (httpx's status-error str omits it) so overlong-prompt 400s are still detected; `model_error` accepts the raw body text or an SDK error. - Generous transport defaults mirroring the v0 client (the OpenAI SDK's 600s/100-conn defaults were a bottleneck): 3600s read timeout, 28000 max connections — so one process fans out far more concurrent rollouts. The renderer keeps the OpenAI SDK (it calls a vLLM generate engine). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): order get_response as (dialect, body, model, sampling) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): drop serialize_response from Dialect; renderer sets Response.raw `serialize_response` was only ever reached for the renderer (the proxy always sets `Response.raw`), and the renderer is chat-only — so it was dead weight on every other dialect's interface. Instead the renderer serializes its own program-facing completion onto `Response.raw` (via the chat dialect's `serialize_completion`), so the interception server just returns `response.raw` for every client — no dialect method, no server branch. The dialect interface is now only what the generic proxy path needs for every format (routes, upstream_path, auth_headers, parse_request, parse_response, apply_overrides, extend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): name clients by role (EvalClient/TrainClient) + chat dialect module ProxyClient -> EvalClient (clients/eval.py), RendererClient -> TrainClient (clients/train.py), and clients/dialects/chat_completions.py -> chat.py (ChatCompletionsDialect -> ChatDialect) — role-expressive names. Config classes (OpenAIClientConfig/RendererClientConfig) and the --client.type discriminator are unchanged (prime-rl imports them). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(v1): streaming (SSE) relay on the eval path Add streaming support to the eval (relay) client + interception server, ported from #1657's relay arm and adapted to our relay-only design: a `stream: true` request is relayed to the provider with `EvalClient.relay` (httpx streaming) and the SSE bytes are piped to the program chunk-by-chunk via an aiohttp `StreamResponse`, while the server accumulates them and `dialect.parse_stream` assembles the final message to record the turn on the trace. Streamed turns are single-shot (no user-sim). Dialect gains the streaming primitives: `parse_stream` (assemble SSE -> vf `Response`), `streaming(body)`, `secret(headers)` (auth carrier — Bearer default, so a non-Bearer dialect like Anthropic can read `x-api-key`), `error_body` (per-format error shape), `aux_routes`, and the shared `iter_sse` helper. The renderer (train) doesn't stream — its `relay` raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(v1): OpenAI Responses dialect (relay-only) Add the `responses` dialect (`/v1/responses`), so a codex-style program speaking the OpenAI Responses API can be evaluated through the eval relay. Ports #1657's request walk (fold each run of assistant-side `input` items — reasoning / message / function_call — into one typed assistant message) and adds, for our design: `response_from_wire` (the `output` items -> a vf `Response`, written from the SDK model since we have no native Responses client), `parse_stream` (the terminal `response.completed`/`.incomplete`/`.failed` event carries the full object), and `apply_overrides` mapping the eval's sampling into the Responses shape (`max_output_tokens`). Relay-only: no serializers, no native client, no `previous_response_id` emulation (the endpoint owns server-side state). Registered in `DIALECTS` alongside chat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(v1): Anthropic Messages dialect (relay-only) + aux-route relay Add the `anthropic` dialect (`/v1/messages`), so a claude-code-style program speaking the Anthropic Messages API can be evaluated through the eval relay. Ports #1657's request parse (system + content blocks -> typed messages; tool_use -> tool calls, tool_result -> tool messages) and streaming assembly (message_start / content_block_* / message_delta), and adds for our design: `response_from_wire` (Anthropic `Message` content blocks -> a vf `Response`, written from the SDK model), `x-api-key` auth + secret carrier, the Anthropic error shape, and `apply_overrides` that keeps the program's required `max_tokens` unless the eval sets one. Also add aux-route relay: a dialect's `aux_routes` (Anthropic's `count_tokens`) are served by the interception server and relayed verbatim to the provider (`Client.relay_aux`), never recorded on the trace. `anthropic` was already a dependency (the v0 client uses it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1)!: rename client configs to roles — EvalClientConfig/TrainClientConfig Match the config classes + the `--client.type` discriminator to the client roles: `OpenAIClientConfig` -> `EvalClientConfig` (type `openai` -> `eval`), `RendererClientConfig` -> `TrainClientConfig` (type `renderers` -> `train`). The eval client relays any dialect (no longer OpenAI-specific), so `openai` was misleading; `eval`/`train` say what they select. BREAKING: `--client.type renderers` is now `--client.type train` (default `openai` -> `eval`); the config classes are renamed. prime-rl (which imports them) is updated in lockstep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): simplify EvalClient transport — no timeout, share _upstream, fix relay aclose - Drop the `_TIMEOUT`/`_LIMITS` constants; use `httpx.AsyncClient(timeout=None)` (agentic completions are slow and the rollout timeout is the real backstop) — matches the relay clients in #1657. - `get_response` now uses `_upstream` too, so it's genuinely shared with `relay` (was single-use). - `relay`: close the streaming response in a `finally` so a failed `aread()` on an error status doesn't leak the connection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): chat apply_overrides is a plain overlay (drop _SAMPLING_KEYS) The eval owns model + the sampling knobs it sets; a dict overlay (`{**body, model, **sampling}`, later keys win) does exactly that. The stripped-key set only mattered for two edge cases — forcing provider defaults when the eval's sampling is partial, and the max_completion_tokens alias collision — not worth the indirection for the chat dialect. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(v1): move train-only wire serializers (tool_to_wire/serialize_completion) into TrainClient They're only used by the renderer (build its generate request / set Response.raw), so they live with the train client rather than the chat dialect. `message_to_wire` stays in the dialect — it's also used by `extend` (user-sim over relay) and the default harness, and moving it would be a circular import. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * style(v1): ruff format (anthropic dialect) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor Bot reviewed Jun 12, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

macroscopeapp Bot reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(v1): dialect-routed interception — byte relay on matching protocols, typed translate otherwise#1657

feat(v1): dialect-routed interception — byte relay on matching protocols, typed translate otherwise#1657
xeophon wants to merge 1 commit into
codex/v1-native-provider-clientsfrom
codex/v1-dialect-relay

xeophon commented Jun 12, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 12, 2026

Uh oh!

cursor Bot Jun 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Uh oh!

macroscopeapp Bot Jun 12, 2026

Uh oh!

macroscopeapp Bot Jun 12, 2026

Uh oh!

macroscopeapp Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xeophon commented Jun 12, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's new

Verification

Design choices reviewers may want to veto

Breaking

Follow-up (out of scope)

Add dialect-routed interception server supporting Anthropic, OpenAI Chat, and OpenAI Responses

🗂️ Filtered Issues

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Compact harness drops eval sampling

Uh oh!

cursor Bot Jun 12, 2026

Choose a reason for hiding this comment

Missing docs for breaking changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot commented Jun 12, 2026

Approvability

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xeophon commented Jun 12, 2026 •

edited by macroscopeapp Bot

Loading