Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion examples/harnesses/compact/compact/harness.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ async def launch(
mcp_urls: dict[str, str],
) -> ProgramResult:
env = {
"OPENAI_BASE_URL": endpoint,
"OPENAI_BASE_URL": f"{endpoint}/v1",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compact harness drops eval sampling

Medium Severity

On the relay path the interception server forwards request bodies unchanged, so eval ctx.sampling must reach the program via env (as the default harness does with OPENAI_SAMPLING and extra_body). The compact harness and program were not updated, so temperature, max tokens, and other sampling settings from the eval are omitted from compact harness model calls when the client uses the chat dialect.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 302dcee. Configure here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve sampling for compact harness requests

When this harness points the OpenAI SDK at the new /v1 interception endpoint, a matching chat client now takes the relay path, so InterceptionServer no longer injects ctx.sampling into the model call. The default harness compensates by passing OPENAI_SAMPLING, but compact only sets base URL/key/model and its program calls chat.completions.create without any sampling arguments; compact evals that configure max_tokens, temperature, top_p, etc. will silently run with provider defaults instead of the eval config.

Useful? React with 👍 / 👎.

"OPENAI_API_KEY": secret,
"OPENAI_MODEL": ctx.model,
}
Expand Down
7 changes: 6 additions & 1 deletion packages/harnesses/harnesses/default/harness.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,10 +53,15 @@ async def launch(
system_prompt = "\n\n".join(
p for p in (BASH_SYSTEM_PROMPT, system_prompt) if p
)
# The program owns its request bodies (the interception server relays them as-is),
# so the eval's sampling rides along as env and the program merges it per call.
sampling = ctx.sampling.model_dump(exclude_none=True)
sampling.pop("stream", None) # the program's chat loop never streams
env = {
"OPENAI_BASE_URL": endpoint,
"OPENAI_BASE_URL": f"{endpoint}/v1",
"OPENAI_API_KEY": secret,
"OPENAI_MODEL": ctx.model,
"OPENAI_SAMPLING": json.dumps(sampling),
"ENABLE_BASH": "1" if self.config.enable_bash else "0",
"APPEND_SYSTEM_PROMPT": system_prompt or "",
}
Expand Down
9 changes: 8 additions & 1 deletion packages/harnesses/harnesses/default/program.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,10 @@
# base_url + api_key come from OPENAI_BASE_URL / OPENAI_API_KEY.
client = AsyncOpenAI()

# The eval's sampling args, merged into every request body (the interception server
# relays our requests verbatim, so the program carries them itself).
SAMPLING = json.loads(os.environ.get("OPENAI_SAMPLING", "{}"))


def run_bash(command: str) -> str:
try:
Expand All @@ -55,7 +59,10 @@ def run_bash(command: str) -> str:

async def chat(messages: list[dict], tools: list[dict]):
completion = await client.chat.completions.create(
model=os.environ["OPENAI_MODEL"], messages=messages, tools=tools or None
model=os.environ["OPENAI_MODEL"],
messages=messages,
tools=tools or None,
extra_body=SAMPLING, # extra_body: sampling keys go on the wire untyped
)
return completion.choices[0].message

Expand Down
2 changes: 1 addition & 1 deletion packages/harnesses/harnesses/rlm/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ async def launch(
# rlm reaches the interception server via OPENAI_BASE_URL/API_KEY (its
# provider precedence falls back to OPENAI_*), and reads RLM_* for itself.
env = {
"OPENAI_BASE_URL": endpoint,
"OPENAI_BASE_URL": f"{endpoint}/v1",
"OPENAI_API_KEY": secret,
"RLM_MODEL": ctx.model,
"RLM_MAX_DEPTH": str(self.config.max_depth),
Expand Down
2 changes: 1 addition & 1 deletion tests/v1/test_clients.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@
from verifiers.v1.clients.openai_responses import (
response_from_wire as responses_response,
)
from verifiers.v1.interception.server import parse_message, serialize_completion
from verifiers.v1.dialects import parse_message, serialize_completion
from verifiers.v1.types import content_to_parts


Expand Down
Loading
Loading