feat(v1): dialect-routed interception -- byte relay on matching protocols, typed translate otherwise #1657
…ols, typed translate otherwise

The interception server now serves every registered wire dialect's routes (chat completions, Anthropic Messages, OpenAI Responses), resolving a request's format from the endpoint the program's SDK posts to. When the rollout's client natively speaks the request's dialect, the request/response bytes are relayed verbatim (incl. SSE pass-through) and parsed only to record the trace; otherwise the request is translated through the typed middle and the response is serialized back in the program's native format (the training path via renderer/chat-vLLM). Each dialect owns its full codec (parse_request/parse_response/parse_stream + serialize_response/serialize_stream + auth carrier); clients keep the typed get_response and gain dialect + relay(). Harnesses receive the server's root endpoint and append their SDK's suffix; the default harness carries the eval's sampling via OPENAI_SAMPLING since relay no longer injects it.

Amp-Thread-ID: https://ampcode.com/threads/T-019ebd29-3076-77b1-a20e-71177db9f1e3
Co-authored-by: Amp <amp@ampcode.com>
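The request-time rule in this commit message can be sketched in a few lines. This is a minimal illustration, not the PR's code: the `ROUTES` table, the `Client` dataclass, and `plan` are hypothetical names, and the route paths are assumed from the dialect descriptions elsewhere in this conversation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical route table: in the PR each dialect declares its own routes.
ROUTES = {
    "/v1/chat/completions": "chat",
    "/v1/messages": "anthropic",
    "/v1/responses": "responses",
}


@dataclass
class Client:
    # None stands for a translate-only client (e.g. the renderer).
    dialect: Optional[str]


def plan(path: str, client: Client) -> str:
    """Pick the arm for a request: the route names the dialect, a string compare picks the arm."""
    request_dialect = ROUTES[path]
    return "relay" if client.dialect == request_dialect else "translate"
```

The point of the sketch is that the decision needs nothing beyond the request path and one attribute on the client; no body inspection is involved.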
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 302dcee.
) -> ProgramResult:
    env = {
-       "OPENAI_BASE_URL": endpoint,
+       "OPENAI_BASE_URL": f"{endpoint}/v1",
Compact harness drops eval sampling
Medium Severity
On the relay path the interception server forwards request bodies unchanged, so eval ctx.sampling must reach the program via env (as the default harness does with OPENAI_SAMPLING and extra_body). The compact harness and program were not updated, so temperature, max tokens, and other sampling settings from the eval are omitted from compact harness model calls when the client uses the chat dialect.
| `endpoint` — the server's *root*, serving every registered dialect's routes, so | ||
| each harness appends what its program's SDK expects (`{endpoint}/v1` for an | ||
| OpenAI base URL, `endpoint` itself for an Anthropic one) and authenticates with | ||
| `secret` (bearer token / api key); `mcp_urls` are the task's tool servers |
Missing docs for breaking changes
Medium Severity
This PR changes core v1 interception and harness contracts (endpoint root vs /v1, relay vs translate, moved parse_message / serialize_completion, new Client.dialect / relay), but the diff includes no updates under docs/ or affected skills/ for evaluation and harness setup.
Triggered by project rule: BugBot Instructions
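As a rough illustration of the endpoint contract this comment asks to have documented, a harness might build its program's environment as follows. The helper names are hypothetical; only the env var names and the `{endpoint}/v1` versus bare-root distinction come from the PR text.

```python
def openai_env(endpoint: str, secret: str) -> dict[str, str]:
    """OpenAI SDKs expect a base URL that already ends in /v1."""
    return {"OPENAI_BASE_URL": f"{endpoint}/v1", "OPENAI_API_KEY": secret}


def anthropic_env(endpoint: str, secret: str) -> dict[str, str]:
    """The Anthropic SDK appends /v1/messages itself, so it gets the bare root."""
    return {"ANTHROPIC_BASE_URL": endpoint, "ANTHROPIC_API_KEY": secret}
```

Both helpers take the same server root; the harness, not the server, decides which suffix its SDK needs.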
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 302dceeff8
) -> ProgramResult:
    env = {
-       "OPENAI_BASE_URL": endpoint,
+       "OPENAI_BASE_URL": f"{endpoint}/v1",
Preserve sampling for compact harness requests
When this harness points the OpenAI SDK at the new /v1 interception endpoint, a matching chat client now takes the relay path, so InterceptionServer no longer injects ctx.sampling into the model call. The default harness compensates by passing OPENAI_SAMPLING, but compact only sets base URL/key/model and its program calls chat.completions.create without any sampling arguments; compact evals that configure max_tokens, temperature, top_p, etc. will silently run with provider defaults instead of the eval config.
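One way a program could pick up the eval's sampling on the relay path is sketched below. It assumes `OPENAI_SAMPLING` holds a JSON object, which the PR text does not state; `sampling_from_env` and `request_kwargs` are hypothetical helpers. `extra_body` is the OpenAI Python SDK's parameter for extra JSON fields on a request.

```python
import json
import os


def sampling_from_env(environ=os.environ) -> dict:
    """Read the eval's sampling settings; assumed here to be a JSON object."""
    raw = environ.get("OPENAI_SAMPLING")
    return json.loads(raw) if raw else {}


def request_kwargs(model: str, messages: list, environ=os.environ) -> dict:
    """Build keyword arguments for chat.completions.create, carrying sampling via extra_body."""
    kwargs = {"model": model, "messages": messages}
    sampling = sampling_from_env(environ)
    if sampling:
        kwargs["extra_body"] = sampling
    return kwargs
```

A compact-style program could then call `client.chat.completions.create(**request_kwargs(model, messages))`, so the eval's `temperature`, `max_tokens`, and similar settings reach the provider even though the relay forwards the body unchanged.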
    prompt.append(SystemMessage(content=parse_content(item.get("content"))))
else:
    prompt.append(UserMessage(content=parse_content(item.get("content"))))
if run:
🟡 Medium dialects/responses.py:126
When an input item with role "system", "developer", or user-role lacks a "content" key, item.get("content") returns None and passes it to parse_content(). Since isinstance(None, str) is false, the code falls through to for part in content:, which raises TypeError: 'NoneType' object is not iterable. Consider adding a default empty string or list to item.get("content") before calling parse_content(), or handle None inside parse_content().
- elif item.get("role") in ("system", "developer"):
- prompt.append(SystemMessage(content=parse_content(item.get("content"))))
+ elif item.get("role") in ("system", "developer"):
+ prompt.append(SystemMessage(content=parse_content(item.get("content") or "")))
else:
- prompt.append(UserMessage(content=parse_content(item.get("content"))))
+ prompt.append(UserMessage(content=parse_content(item.get("content") or "")))
Evidence trail:
verifiers/v1/dialects/responses.py lines 36-55 (parse_content definition, no None guard), lines 102-128 (caller passes item.get('content') which can be None)
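The alternative the comment mentions, handling `None` inside the parser, could look like the sketch below. This `parse_content` is a simplified stand-in that returns a list of text strings; the real function's return type and part handling are not shown in this conversation and are assumed here.

```python
def parse_content(content):
    """Stand-in parser: normalise Responses-style content to a list of text strings."""
    if content is None:
        # A missing "content" key yields None from item.get("content").
        return []
    if isinstance(content, str):
        return [content]
    # Otherwise assume a list of part dicts that carry a "text" field.
    return [part.get("text", "") for part in content]
```

With the guard in the parser, every caller is covered at once, whereas the `or ""` fix has to be repeated at each call site.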
async def handle_aux(
    self, request: web.Request, dialect: Dialect, route: str
) -> web.Response:
    """A side endpoint that is not a model turn (e.g. Anthropic count_tokens):
    relayed verbatim when the client speaks the dialect, answered locally otherwise.
    Never recorded on the trace."""
    session = self.session_for(request, dialect)
    if session is None:
        return web.json_response(dialect.error_body("unauthorized"), status=401)
    raw = await request.read()
    if session.ctx.client.dialect == dialect.name:
        try:
            reply = await session.ctx.client.relay(raw, route)
        except Exception as e:
            return web.json_response(dialect.error_body(str(e)), status=502)
        data = b"".join([chunk async for chunk in reply.chunks])
        return web.Response(body=data, content_type=reply.content_type)
    return web.json_response(dialect.handle_aux(route, json.loads(raw)))

async def handle_model(
    self, request: web.Request, dialect: Dialect
) -> web.StreamResponse:
    session = self.session_for(request, dialect)
🟢 Low interception/server.py:185
handle_aux calls json.loads(raw) on line 202 without exception handling. If the request body is empty or malformed, this raises json.JSONDecodeError and propagates as a 500 Internal Server Error instead of 400 Bad Request. This leaves the non-relay path unprotected while the relay path (lines 196-201) already catches client errors, creating inconsistent error handling.
- return web.json_response(dialect.handle_aux(route, json.loads(raw)))
+ try:
+ body = json.loads(raw)
+ except json.JSONDecodeError as e:
+ return web.json_response(dialect.error_body(str(e)), status=400)
+ return web.json_response(dialect.handle_aux(route, body))
Evidence trail:
verifiers/v1/interception/server.py lines 185-202 (REVIEWED_COMMIT): `handle_aux` method, line 202 has `json.loads(raw)` without try/except. Lines 195-201 show the relay path with try/except around `session.ctx.client.relay(raw, route)`. Line 163: `web.Application(client_max_size=_MAX_REQUEST_BODY)` with no middleware parameter. git_grep for 'middleware|error_middleware' in the interception package returned no results, confirming no error-handling middleware exists.
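The status mapping suggested here can be isolated from aiohttp and checked on its own. In this sketch `decode_body` and the `{"error": ...}` shape are hypothetical; it catches `ValueError`, the parent of `json.JSONDecodeError`, so that bytes which are not valid UTF-8 (which raise `UnicodeDecodeError`) are also treated as a client error.

```python
import json


def decode_body(raw: bytes):
    """Return (status, payload): 200 with the parsed body, or 400 with an error message."""
    try:
        return 200, json.loads(raw)
    except ValueError as e:  # covers JSONDecodeError and UnicodeDecodeError
        return 400, {"error": str(e)}
```

A handler would return the 400 payload in the dialect's own error shape instead of letting the exception surface as a 500.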
Approvability
Verdict: Needs human review
1 blocking correctness issue found. This PR introduces a significant new feature (dialect-routed interception) with ~1000+ lines of new code, breaking changes to endpoint contracts, and major runtime behavior changes to the interception server. Multiple unresolved review comments identify functional bugs including compact harness dropping eval sampling on the relay path.
Add streaming support to the eval (relay) client + interception server, ported from #1657's relay arm and adapted to our relay-only design: a `stream: true` request is relayed to the provider with `EvalClient.relay` (httpx streaming) and the SSE bytes are piped to the program chunk-by-chunk via an aiohttp `StreamResponse`, while the server accumulates them and `dialect.parse_stream` assembles the final message to record the turn on the trace. Streamed turns are single-shot (no user-sim). Dialect gains the streaming primitives: `parse_stream` (assemble SSE -> vf `Response`), `streaming(body)`, `secret(headers)` (auth carrier — Bearer default, so a non-Bearer dialect like Anthropic can read `x-api-key`), `error_body` (per-format error shape), `aux_routes`, and the shared `iter_sse` helper. The renderer (train) doesn't stream — its `relay` raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
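The pass-through-plus-assembly idea in this commit can be illustrated with a small synchronous sketch. It is not the repo's `iter_sse`: `tee_chunks` and `iter_sse_data` are hypothetical names, the real server is async, and only `\n\n`-delimited events with `data:` lines are handled.

```python
def tee_chunks(chunks, sink: list):
    """Forward each chunk unchanged while keeping a copy for later parsing."""
    for chunk in chunks:
        sink.append(chunk)
        yield chunk


def iter_sse_data(chunks):
    """Yield the data payload of each SSE event from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # An event ends at a blank line; a chunk may end mid-event.
        while b"\n\n" in buffer:
            event, buffer = buffer.split(b"\n\n", 1)
            data = [
                line[len(b"data:"):].strip()
                for line in event.split(b"\n")
                if line.startswith(b"data:")
            ]
            if data:
                yield b"\n".join(data).decode()
```

The program sees exactly the bytes the provider sent, while the accumulated copy is parsed after the stream ends to record the turn.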
Add the `responses` dialect (`/v1/responses`), so a codex-style program speaking the OpenAI Responses API can be evaluated through the eval relay. Ports #1657's request walk (fold each run of assistant-side `input` items — reasoning / message / function_call — into one typed assistant message) and adds, for our design: `response_from_wire` (the `output` items -> a vf `Response`, written from the SDK model since we have no native Responses client), `parse_stream` (the terminal `response.completed`/`.incomplete`/`.failed` event carries the full object), and `apply_overrides` mapping the eval's sampling into the Responses shape (`max_output_tokens`). Relay-only: no serializers, no native client, no `previous_response_id` emulation (the endpoint owns server-side state). Registered in `DIALECTS` alongside chat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the `anthropic` dialect (`/v1/messages`), so a claude-code-style program speaking the Anthropic Messages API can be evaluated through the eval relay. Ports #1657's request parse (system + content blocks -> typed messages; tool_use -> tool calls, tool_result -> tool messages) and streaming assembly (message_start / content_block_* / message_delta), and adds for our design: `response_from_wire` (Anthropic `Message` content blocks -> a vf `Response`, written from the SDK model), `x-api-key` auth + secret carrier, the Anthropic error shape, and `apply_overrides` that keeps the program's required `max_tokens` unless the eval sets one. Also add aux-route relay: a dialect's `aux_routes` (Anthropic's `count_tokens`) are served by the interception server and relayed verbatim to the provider (`Client.relay_aux`), never recorded on the trace. `anthropic` was already a dependency (the v0 client uses it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ream, fix relay aclose

- Drop the `_TIMEOUT`/`_LIMITS` constants; use `httpx.AsyncClient(timeout=None)` (agentic completions are slow and the rollout timeout is the real backstop) -- matches the relay clients in #1657.
- `get_response` now uses `_upstream` too, so it's genuinely shared with `relay` (was single-use).
- `relay`: close the streaming response in a `finally` so a failed `aread()` on an error status doesn't leak the connection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ic, streaming), reasoning preserved (#1654)

* fix(v1): preserve reasoning_content through the interception proxy

vf parses model messages into typed Messages and re-serializes them, but dropped reasoning_content at every hop: the chat client's message_to_wire (egress to the model), the interception server's parse_message (ingress from the harness), and serialize_completion (the completion returned to the harness). So a harness could never carry a turn's reasoning into the next request. Reasoning models require the prior turns' reasoning_content sent back as a message-level field on assistant messages: DeepSeek V4 returns 400 "the reasoning_content in the thinking mode must be passed back to the API" when it's stripped, and Kimi K2 Thinking needs it for multi-turn tool calling. The renderer client already re-emitted it (the training chat template renders it back in); the chat client and the proxy did not -- an eval/train mismatch. Carry reasoning_content through all three hops (omitted when absent, so non-reasoning providers are unaffected) and fold the renderer's duplicate logic into the single chat-client message_to_wire, so the proxy is a clean pass-through that parses into its own types without losing fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(v1): forward eval requests 1:1 via a raw proxy client

For the eval path, the chat client now forwards the program's request body verbatim to the provider (Client.proxy) instead of doing a typed re-serialize, so provider fields the typed wire form doesn't model -- e.g. `reasoning` / `reasoning_details` (OpenRouter-style aggregators like PI Inference name reasoning that way, not `reasoning_content`) -- reach the model intact. The interception server uses the typed parse only to build the trace; model + sampling stay eval-controlled, and the user-sim loop extends both the raw wire history and the typed trace.
The renderer (training) client keeps the typed path -- it must tokenize the prompt for RL, so it can't be a raw proxy; response_from_wire also learns the `reasoning` field so the trace captures reasoning on either path. Verified: terminal-bench fix-git (rlm harness, modal) on deepseek-v4-flash (reward 1.0) and z-ai/glm-4.7 (clean 81-turn agentic run, 27 assistant turns with reasoning captured, 0 model-call failures) -- reasoning round-trips end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): make the chat client a proxy whose get_response forwards 1:1

Collapse the separate proxy method + supports_proxy flag into one path: the default client (now ProxyClient) implements get_response by forwarding the program's request body 1:1 to the provider; the renderer implements the same signature by tokenizing the typed prompt. The interception server calls get_response uniformly (no branch) and gets back (completion_dict, Response) -- the dict to hand the program, the typed Response for the trace. serialize_completion moves to clients.openai (the renderer builds its dict there; the proxy returns the provider's raw dict).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): rename clients.openai module -> clients.proxy

The module's client is now ProxyClient (forwards 1:1), so name the module for what it is. Pure rename + import updates (config, clients __init__, renderer, the interception server, the default harness); also tidies the module docstring and shortens the get_response docstring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): typed per-format Dialect for wire <-> vf translation

Introduce `Dialect[ReqT, RespT]` (verifiers/v1/clients/dialects.py): a generic, per-native-format translator (wire -> vf only; the proxy relays the raw response verbatim, so there is no vf -> wire).
`ChatCompletionsDialect` is the only dialect today; OpenAI Responses / Anthropic Messages become drop-in `Dialect`s. A harness declares which dialect its program speaks via `Harness.DIALECT` (classvar, defaults to chat completions) -- no auto-detection (a follow-up, for harnesses with several native clients). The rollout threads it onto `RolloutSession.dialect`. `Client.get_response` now returns a single `Response` everywhere (was a `(completion_dict, Response)` tuple): the proxy parses the provider response via the dialect and carries the verbatim bytes on `Response.raw`, so the interception server hands them back to the program 1:1; the renderer leaves `raw` unset and the server serializes its `Response`. Move the chat-completions parsers (`parse_message`, `parse_tools`, `response_from_wire`, `_tokens_from_wire`) out of the proxy/server into the dialect module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): make dialects a package; align reasoning extraction with v0 client

Split clients/dialects.py into a clients/dialects/ package: `base` (the `Dialect` ABC) + `oai_chat_completions` (the only dialect today). New dialects (OpenAI Responses / Anthropic Messages) become sibling modules. Mirror the v0 chat client's `parse_reasoning_content`: read the model's reasoning from `reasoning` / `reasoning_content` / `reasoning_details` (same precedence) via the message dict, so the trace captures reasoning regardless of which field the provider uses -- not just `reasoning_content`/`reasoning`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): resolve dialect by route; move dialects to verifiers/v1/dialects

Auto-detect the wire format from the endpoint a request arrives on instead of pinning it on the harness.
Each `Dialect` declares its `routes`; the interception server serves every registered dialect's routes (`dialects.DIALECTS`) and binds the route's dialect to the handler, so a program's SDK selects the format by the path it posts to. Removes `Harness.DIALECT` and `RolloutSession.dialect`. Only OpenAI chat completions is registered today; OpenAI Responses / Anthropic Messages slot in as new dialect modules + routes with no harness or server changes. Also promote dialects from clients/dialects/ to a top-level v1 package (verifiers/v1/dialects/) -- it's the wire-translation layer used by both the clients and the interception server, not client-internal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): drop oai_ prefix from the chat_completions dialect module

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): slim get_response to (body, dialect, model, sampling)

`prompt`/`tools` were redundant on `get_response` -- both are `dialect.parse_request(body)`, so any client holding `body` + `dialect` derives them. Drop them: the renderer parses `body` via the dialect itself; the proxy already only forwarded `body`. Every client now takes the same four args. Also drop `_tokens_from_wire` from the chat dialect -- training tokens come from the renderer client, so a chat completion never needs to carry token ids into the trace.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(v1): renderer client supports only the chat-completions dialect (errors otherwise)

The renderer renders a chat template, so it's only validated for chat-completions input; other dialects' semantics (Responses reasoning items, Anthropic thinking) may not round-trip faithfully through chat-template tokenization. Refuse a non-chat dialect with an informative NotImplementedError rather than silently rendering it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): make Dialect the single place a protocol lives

Move every protocol-specific operation onto the `Dialect` so adding a harness with a new wire format is one self-contained module + a `DIALECTS` entry -- the proxy and interception server are now fully generic over the interface:

- `upstream_path` + `apply_overrides(body, model, sampling)`: the proxy forwards the request byte-exact and the dialect imposes only what the eval owns -- model overlays, sampling is authoritative (the program's sampling keys are dropped, the eval's applied), in the protocol's own shape. Fixes the proxy hardcoding `/chat/completions` and the chat-shaped sampling overlay.
- `serialize_response` + `extend`: the dialect owns the two vf -> wire cases (the renderer's generated response; user-sim turn injection), so `interception/server.py` has zero wire-format code -- it drops the chat-shaped `raw_messages`/`["choices"][0]` handling.
- Wire serializers (`message_to_wire`/`tool_to_wire`) live in the chat dialect (the chat-only renderer + default harness import them from there); `model_error` moves to `errors` (its natural home, shared by both clients). `clients/proxy.py` is now just `ProxyClient`.

Keep `AsyncOpenAI` as the proxy transport: it's used as a raw poster and handles the OpenAI-compatible endpoint family (base url, Bearer auth, billing headers) every OpenAI-SDK dialect shares; a non-OpenAI provider is a separate transport axis, not a dialect concern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): proxy via raw httpx; dialect owns auth so any provider is just a dialect

Swap the ProxyClient's transport from AsyncOpenAI to a raw httpx client. AsyncOpenAI baked in OpenAI auth (Bearer) + base-url/error conventions, which is awkward for a non-OpenAI provider (Anthropic wants x-api-key + anthropic-version).
Now the dialect supplies the upstream path + auth headers (`Dialect.auth_headers`, defaulting to Bearer), so the proxy is provider-agnostic and a new wire format -- including Anthropic Messages -- is just a new `Dialect`, no client change.

- Build full upstream URLs ourselves (base_url + dialect.upstream_path) rather than rely on httpx base-url joining (which drops the base path for a leading-slash request path).
- Map errors from the response body (httpx's status-error str omits it) so overlong-prompt 400s are still detected; `model_error` accepts the raw body text or an SDK error.
- Generous transport defaults mirroring the v0 client (the OpenAI SDK's 600s/100-conn defaults were a bottleneck): 3600s read timeout, 28000 max connections -- so one process fans out far more concurrent rollouts.

The renderer keeps the OpenAI SDK (it calls a vLLM generate engine).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): order get_response as (dialect, body, model, sampling)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): drop serialize_response from Dialect; renderer sets Response.raw

`serialize_response` was only ever reached for the renderer (the proxy always sets `Response.raw`), and the renderer is chat-only -- so it was dead weight on every other dialect's interface. Instead the renderer serializes its own program-facing completion onto `Response.raw` (via the chat dialect's `serialize_completion`), so the interception server just returns `response.raw` for every client -- no dialect method, no server branch. The dialect interface is now only what the generic proxy path needs for every format (routes, upstream_path, auth_headers, parse_request, parse_response, apply_overrides, extend).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): name clients by role (EvalClient/TrainClient) + chat dialect module

ProxyClient -> EvalClient (clients/eval.py), RendererClient -> TrainClient (clients/train.py), and clients/dialects/chat_completions.py -> chat.py (ChatCompletionsDialect -> ChatDialect) -- role-expressive names. Config classes (OpenAIClientConfig/RendererClientConfig) and the --client.type discriminator are unchanged (prime-rl imports them).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(v1): streaming (SSE) relay on the eval path

Add streaming support to the eval (relay) client + interception server, ported from #1657's relay arm and adapted to our relay-only design: a `stream: true` request is relayed to the provider with `EvalClient.relay` (httpx streaming) and the SSE bytes are piped to the program chunk-by-chunk via an aiohttp `StreamResponse`, while the server accumulates them and `dialect.parse_stream` assembles the final message to record the turn on the trace. Streamed turns are single-shot (no user-sim). Dialect gains the streaming primitives: `parse_stream` (assemble SSE -> vf `Response`), `streaming(body)`, `secret(headers)` (auth carrier -- Bearer default, so a non-Bearer dialect like Anthropic can read `x-api-key`), `error_body` (per-format error shape), `aux_routes`, and the shared `iter_sse` helper. The renderer (train) doesn't stream -- its `relay` raises.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(v1): OpenAI Responses dialect (relay-only)

Add the `responses` dialect (`/v1/responses`), so a codex-style program speaking the OpenAI Responses API can be evaluated through the eval relay.
Ports #1657's request walk (fold each run of assistant-side `input` items -- reasoning / message / function_call -- into one typed assistant message) and adds, for our design: `response_from_wire` (the `output` items -> a vf `Response`, written from the SDK model since we have no native Responses client), `parse_stream` (the terminal `response.completed`/`.incomplete`/`.failed` event carries the full object), and `apply_overrides` mapping the eval's sampling into the Responses shape (`max_output_tokens`). Relay-only: no serializers, no native client, no `previous_response_id` emulation (the endpoint owns server-side state). Registered in `DIALECTS` alongside chat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(v1): Anthropic Messages dialect (relay-only) + aux-route relay

Add the `anthropic` dialect (`/v1/messages`), so a claude-code-style program speaking the Anthropic Messages API can be evaluated through the eval relay. Ports #1657's request parse (system + content blocks -> typed messages; tool_use -> tool calls, tool_result -> tool messages) and streaming assembly (message_start / content_block_* / message_delta), and adds for our design: `response_from_wire` (Anthropic `Message` content blocks -> a vf `Response`, written from the SDK model), `x-api-key` auth + secret carrier, the Anthropic error shape, and `apply_overrides` that keeps the program's required `max_tokens` unless the eval sets one. Also add aux-route relay: a dialect's `aux_routes` (Anthropic's `count_tokens`) are served by the interception server and relayed verbatim to the provider (`Client.relay_aux`), never recorded on the trace. `anthropic` was already a dependency (the v0 client uses it).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1)!: rename client configs to roles -- EvalClientConfig/TrainClientConfig

Match the config classes + the `--client.type` discriminator to the client roles: `OpenAIClientConfig` -> `EvalClientConfig` (type `openai` -> `eval`), `RendererClientConfig` -> `TrainClientConfig` (type `renderers` -> `train`). The eval client relays any dialect (no longer OpenAI-specific), so `openai` was misleading; `eval`/`train` say what they select. BREAKING: `--client.type renderers` is now `--client.type train` (default `openai` -> `eval`); the config classes are renamed. prime-rl (which imports them) is updated in lockstep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): simplify EvalClient transport -- no timeout, share _upstream, fix relay aclose

- Drop the `_TIMEOUT`/`_LIMITS` constants; use `httpx.AsyncClient(timeout=None)` (agentic completions are slow and the rollout timeout is the real backstop) -- matches the relay clients in #1657.
- `get_response` now uses `_upstream` too, so it's genuinely shared with `relay` (was single-use).
- `relay`: close the streaming response in a `finally` so a failed `aread()` on an error status doesn't leak the connection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): chat apply_overrides is a plain overlay (drop _SAMPLING_KEYS)

The eval owns model + the sampling knobs it sets; a dict overlay (`{**body, model, **sampling}`, later keys win) does exactly that. The stripped-key set only mattered for two edge cases -- forcing provider defaults when the eval's sampling is partial, and the max_completion_tokens alias collision -- not worth the indirection for the chat dialect.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(v1): move train-only wire serializers (tool_to_wire/serialize_completion) into TrainClient

They're only used by the renderer (build its generate request / set Response.raw), so they live with the train client rather than the chat dialect. `message_to_wire` stays in the dialect -- it's also used by `extend` (user-sim over relay) and the default harness, and moving it would be a circular import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style(v1): ruff format (anthropic dialect)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
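The "plain overlay" described for the chat dialect's `apply_overrides` amounts to a dict merge in which later keys win. A minimal sketch follows; the function body is inferred from the `{**body, model, **sampling}` expression in the commit message, not copied from the repo.

```python
def apply_overrides(body: dict, model: str, sampling: dict) -> dict:
    """Overlay the eval's model and sampling onto the program's request body."""
    # Later keys win: the eval's model replaces the program's, and each sampling
    # key the eval sets replaces the program's value; unset keys are left alone.
    return {**body, "model": model, **sampling}
```

Because only the keys the eval sets are overridden, a partial eval sampling config leaves the program's remaining sampling keys in place, which is the edge case the commit says the removed `_SAMPLING_KEYS` set used to handle.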


Summary
Stacked on #1651 (native provider clients); merges into `feat/nano-as-v1` once that lands.

This PR combines the two halves discussed in #1654: route-detected wire dialects on ingress (from that PR's design) and the native provider clients on egress (#1651), reconciled by one request-time rule:

- Relay: when the rollout's client natively speaks the request's dialect (`Client.dialect == Dialect.name`), the program's request bytes are forwarded verbatim and the provider's response (JSON or SSE) is relayed back untouched. The dialect parses a copy only to record the trace. No field is lost to a typed round-trip: not reasoning, not `cache_control`, not any future provider field.
- Translate: otherwise, the request is translated through the typed middle and the `Response` is serialized back in the dialect the program spoke.

The selection is a string compare per request: no model→client tables, no `Harness.DIALECT`, no body sniffing. The harness only chooses env vars, as before.

What's new

- `verifiers/v1/dialects/`: one module per native format (`chat`, `anthropic`, `responses`), each owning its route, auth carrier (Bearer / `x-api-key`), wire→vf parsers (`parse_request` / `parse_response` / `parse_stream`), and vf→wire serializers (`serialize_response` / `serialize_stream`, translate path only). Response parsing reuses the clients' `response_from_wire`, so `provider_state` (thinking blocks, Responses output items, `reasoning_details`) and vLLM token ids land on the trace on both arms.
- `InterceptionServer` mounts every dialect's routes; the SDK's URL picks the codec. Streaming relays pass SSE bytes through chunk-by-chunk and tee-parse the assembled final message for the trace; streaming translate fake-streams a minimal valid SSE. Anthropic's `count_tokens` is relayed on the relay arm and estimated (~4 chars/token) on translate. Errors come back in each dialect's native error shape.
- Clients keep the typed `get_response` (judges and in-env calls unaffected) and gain `dialect` + `relay(body, route)`; `RetryingClient` retries relays the same way (relay raises before any byte is returned). Renderer and Google stay translate-only (`dialect = None`), which is exactly how training keeps typed token data.
- Harnesses receive the server's root endpoint and append their SDK's suffix (`{endpoint}/v1` for OpenAI SDKs; bare root for `ANTHROPIC_BASE_URL`). The default harness carries the eval's sampling into the program (`OPENAI_SAMPLING` env → `extra_body`) since the relay no longer injects it.

A claude-code harness now needs only `ANTHROPIC_BASE_URL={endpoint}` + `ANTHROPIC_API_KEY={secret}`: ingress (incl. streaming + count_tokens) is already served, relay applies against any anthropic-speaking endpoint (api.anthropic.com, vLLM's native `/v1/messages`), and translate covers everything else.

Verification
Unit (`tests/v1/test_dialects.py`, 17 tests + full v1 suite, 63 passed): codec round-trips validated against the provider SDK models (`anthropic.types.Message`, `openai.types.responses.Response`), stream assembly/fake-stream round-trips, byte-verbatim relay assertions (unknown request fields survive), per-dialect auth + error shapes, aux-route relay/estimate, refusals in dialect error shape.

Live, through the real eval machinery (default harness × subprocess):

… `extend_request` over relay)

Live testing caught two real relay bugs, both fixed: newer openai/anthropic SDKs exclude auth from `default_headers` (now merged from `auth_headers` explicitly), and duplicate `Content-Type` headers made OpenAI reject relayed bodies (header keys now lowercased/deduped).

Not live-tested: anthropic/responses ingress (no claude-code/codex harness exists yet; covered by unit tests against the SDK models) and the renderer translate path (needs a vLLM engine).
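The two header fixes described above (auth merged in explicitly, keys lowercased and deduplicated) might look roughly like the sketch below. `merge_relay_headers` is a hypothetical name, not the PR's `relay_headers` helper, and the "later source wins" precedence is an assumption made for illustration.

```python
from typing import Mapping


def merge_relay_headers(*sources: Mapping[str, str]) -> dict[str, str]:
    """Merge header mappings into one dict with lowercase, de-duplicated keys.

    Later sources override earlier ones (an assumed precedence), so case
    variants such as "Content-Type" and "content-type" collapse to one entry.
    """
    merged: dict[str, str] = {}
    for source in sources:
        for key, value in source.items():
            merged[key.lower()] = value
    return merged


# Example inputs (illustrative values only).
sdk_default_headers = {"content-type": "text/plain", "User-Agent": "sdk"}
auth_headers = {"Authorization": "Bearer secret"}
relay_overrides = {"Content-Type": "application/json"}

headers = merge_relay_headers(sdk_default_headers, auth_headers, relay_overrides)
```

Because HTTP header names are case-insensitive, normalizing the keys before merging ensures the upstream never receives two `Content-Type` values, and passing the auth mapping explicitly avoids depending on whether an SDK includes it in its defaults.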
Design choices reviewers may want to veto
- Relay does not inject `model` or sampling. The model id already reaches the program via harness env; the default harness now carries sampling itself. On rlm's relay path, eval `--sampling.*` does not apply until rlm reads an equivalent env (translate/training keeps `ctx.sampling` pinned).
- … `Dialect.extend_request`); on other dialects a simulator works via the translate arm. Streamed turns never drive a simulator.
- … `thinking` block so the program displays it and echoes it back, where `parse_request` recovers it: the reasoning-passback some models hard-require, carried across protocols through the program itself.
- … `previous_response_id` ignored; the trace graph could emulate it later); on relay the endpoint owns it.
- The `count_tokens` translate-arm estimate is deliberately crude (compaction-trigger fidelity, not billing).

Breaking
- `parse_message` / `parse_tools` / `serialize_completion` moved from `interception.server` to `verifiers.v1.dialects` (chat module).
- `Rollout` / `InterceptionPool` hand harnesses the endpoint root (no `/v1` suffix); out-of-tree harnesses must append their SDK's path (in-tree default/rlm/compact updated).
- `Client` gains `dialect` / `relay`; custom clients are unaffected unless they want relay.

Follow-up (out of scope)
- … `-m anthropic:claude-…` scheme + host claims) so direct-API runs don't need `--client.type`.
- `previous_response_id` emulation from the trace graph.

Note
High Risk
Core rollout path changes (interception routing, verbatim upstream relay, endpoint root contract) affect every harness model call; mistakes could break auth, streaming, or cross-protocol evals.
Overview
Adds polyglot interception: the server mounts chat, anthropic, and responses routes and picks relay (verbatim request/response bytes when `Client.dialect` matches) vs translate (parse → typed client → serialize back) per request.

New `verifiers/v1/dialects/` holds per-format codecs (parse/serialize, streaming SSE, dialect-shaped errors, Anthropic `count_tokens` aux). Chat wire helpers move here from `interception.server`.

Clients gain `dialect` + `relay()` via shared `relay_post` / `relay_headers`; RetryingClient retries relays. Shared `classify_model_error` replaces duplicated context-length checks in OpenAI client code.

Harness/rollout expose the interception root (no `/v1` suffix); in-tree harnesses set `OPENAI_BASE_URL` to `{endpoint}/v1`. Default harness passes eval sampling as `OPENAI_SAMPLING` merged into program `extra_body` (relay no longer injects sampling).

Tests: new `test_dialects.py` for codecs and relay/translate server behavior; `test_clients` imports from `dialects`.

Reviewed by Cursor Bugbot for commit 302dcee.
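To make the harness contract concrete (root endpoint in, SDK-specific base URLs out, sampling carried through the environment), here is a minimal sketch. `build_harness_env` and `sampling_extra_body` are hypothetical helpers, and encoding `OPENAI_SAMPLING` as JSON is an assumption; the PR only states that the variable is merged into the program's `extra_body`.

```python
import json
from typing import Any, Mapping


def build_harness_env(endpoint: str, secret: str, sampling: Mapping[str, Any]) -> dict[str, str]:
    """Build program env vars from the interception server's root endpoint."""
    root = endpoint.rstrip("/")
    return {
        # OpenAI SDKs expect the /v1 suffix on the base URL.
        "OPENAI_BASE_URL": f"{root}/v1",
        # The Anthropic base URL is the bare root.
        "ANTHROPIC_BASE_URL": root,
        "ANTHROPIC_API_KEY": secret,
        # Assumed encoding: the eval's sampling settings as a JSON object.
        "OPENAI_SAMPLING": json.dumps(dict(sampling)),
    }


def sampling_extra_body(env: Mapping[str, str]) -> dict[str, Any]:
    """Program side: decode the sampling settings, defaulting to an empty dict."""
    return json.loads(env.get("OPENAI_SAMPLING", "{}"))


env = build_harness_env(
    "http://127.0.0.1:8000/", "s3cret", {"temperature": 0.2, "max_tokens": 64}
)
extra_body = sampling_extra_body(env)
```

A program using the OpenAI Python SDK could then pass the decoded dict as the `extra_body` argument of its request call, so the sampling settings travel inside the request body that the relay arm forwards unchanged.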
Note
Add dialect-routed interception server supporting Anthropic, OpenAI Chat, and OpenAI Responses
- … `/v1/messages`), OpenAI Chat (`/v1/chat/completions`), and OpenAI Responses (`/v1/responses`) concurrently.
- … `Dialect` abstraction in `verifiers/v1/dialects/` with per-dialect parse/serialize, SSE handling, auth extraction, error shaping, and aux route support.
- … `relay()` to `Client` and all three client implementations, backed by a shared `relay_post` helper with retry logic, streaming, and error classification via `classify_model_error`.
- … `/v1`); all harness `OPENAI_BASE_URL` values updated accordingly.
- `InterceptionPool._entry` and `Rollout._serve_interception` now receive root endpoints (no `/v1` suffix); harnesses must be updated or requests will 404.

📊 Macroscope summarized 302dcee. 20 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted