Agentic f3dasm CRW June 2026 prototype (NOT TO BE MERGED WITH MAIN) by elvis-aguero · Pull Request #347 · bessagroup/f3dasm

elvis-aguero · 2026-05-22T18:24:18Z

Code Release Week Checklist

This checklist is intended to help reviewers evaluate code repositories during Code Release Week. Reviewers should go through each section and mark the corresponding checkboxes based on their assessment of the repository.

Quality of GitHub repository `README.md` file

Is the content properly structured and does it contain the minimum expected sections: Summary, Statement of need, Authorship, Getting started, Community Support, License?
Is the content grammatically well-written, clear, and easily understandable?
Is the high-level functionality and purpose of the project clear for a diverse, non-specialist audience?
Does it contain a clear statement of need that illustrates the purpose of the project?
Does it contain a set of key references for the user/developer (e.g., paper, documentation, installation, benchmarks, tutorials)?

Project Structure and Files

Code Quality

Documentation Quality

Benchmarks, Testing, and Distribution

Strictly additive layer on top of f3dasm. No native modules edited.

Public API: `from f3dasm.agentic import …`.

Architecture: two persistent peer Claude Agent SDK sessions (Strategizer, Implementer) routed by a Python orchestrator. The Strategizer emits Strategy payloads; the Implementer dispatches them via `run_strategy` and may follow up with `ask_strategizer` (bounded).

Provenance via the `__turn` output column plus a TurnRecord log; runs end with a `write_deliverable()` folder (solution.md + replicate.py + dataset snapshot + turn_log.jsonl).

MVP strategy registry: latin, sobol, random_uniform, grid, local_random. Lookup DataGenerator for pool-backed evaluation. Supercompressible study wired as first test of the generic MVP.

173 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The v1 MVP (registry + AnalysisBase + fixed-iteration loop) was over-engineered and supercompressible-leaky. v2 will be a thin runtime over two persistent Claude SDK sessions with carefully engineered prompts and git-based provenance. Keep LookupDataGenerator as a utility the Implementer may import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two persistent Claude Agent SDK sessions (Strategizer + Implementer) routed by a thin Python runtime.

- Strategizer reads any file, writes .md notes only, and drives the run via Delegate/Ask/Done tools.
- Implementer receives Task payloads, executes freely (Read/Write/Edit/Bash/Glob/Grep restricted to the study dir), and returns structured Report blocks.
- Provenance via per-delegation git commits to an isolated run repo at runs/<timestamp>/.git/ (study-tree work-tree, no touching of any enclosing user repo).
- Checkpoint every 30 delegations: Strategizer summarises, user steers, Implementer session is reset.
- Prompts engineered with SOTA techniques: XML-tagged sections, named failure-mode mitigations (anchoring, confirmation, availability, role drift, sycophancy, premature convergence), a non-negotiable briefing-clarification ritual, and refusal of hypothesis-verification by the Implementer.

User surface: studies/<study>/briefing.md is the only required file. CLI: python -m f3dasm.agentic <study-dir>.

32 tests pass, ruff clean. No native f3dasm edits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… transcripts

Four bundled additions to the v2 runtime:

1. BORA-style hypothesis log (Cissé et al. 2025): Strategizer maintains strategizer_notes/hypotheses.md as a sequence of named ## Comment blocks (statement / confidence / evidence / status / last_updated_delegation). Done() is gated on at least one supported AND one refuted comment, i.e. the falsification requirement expressed as a log invariant.
2. Reflexion-style error reflection (Shinn et al. 2023): failed delegations now return REFLECT: <diagnosis> instead of a static ERROR string. _classify_failed_implementer_response categorises the failure (short, capability-limit, missing-subsections, no-Report, default) and the Strategizer prompt forbids identical-intent re-delegation after REFLECT.
3. SelfAI 3-stage reasoning (Wu et al. 2025): Implementer prompt now requires three labelled sections before ## Report: Task restatement, Workspace inventory, Execution plan. _parse_report still ignores pre-Report text, so no parsing change is needed.
4. Per-turn transcript capture for debuggability: JSONL events (user_message / tool_call / tool_result / assistant_text) written under runs/<ts>/transcripts/ and copied to the deliverable folder. read_transcript() helper for post-run inspection. Caveat: the SDK's query() stream does not expose Implementer-internal tool calls through the current code path; we capture net inputs and outputs per delegation, not the full chain.

46 tests pass (32 prior + 14 new), ruff clean. No native f3dasm edits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
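The Done() gate on the hypothesis log can be illustrated with a minimal sketch. It assumes each `## Comment` block carries a `status:` line in `key: value` form; that file syntax, the sample log, and the names `parse_statuses` and `done_allowed` are hypothetical and are not taken from the runtime.

```python
import re

# Hypothetical log layout: one "## Comment" block per hypothesis.
SAMPLE_LOG = """\
## Comment H1
statement: resonance grows with k
confidence: 0.7
status: supported

## Comment H2
statement: the optimum lies at small m
confidence: 0.4
status: refuted
"""


def parse_statuses(log_text: str) -> list[str]:
    """Return the status of every '## Comment' block, in file order."""
    statuses = []
    # Splitting on the heading lines leaves any preamble in chunk 0.
    blocks = re.split(r"^## Comment\b.*$", log_text, flags=re.MULTILINE)[1:]
    for block in blocks:
        match = re.search(r"^status:\s*(\w+)", block, flags=re.MULTILINE)
        statuses.append(match.group(1).lower() if match else "unknown")
    return statuses


def done_allowed(log_text: str) -> bool:
    """Allow Done() only with at least one supported AND one refuted entry."""
    return {"supported", "refuted"} <= set(parse_statuses(log_text))
```

Expressing the gate as a check on the log keeps the falsification requirement testable without involving either agent session.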

1. WriteMarkdown path resolution is now lenient: bare filenames and relative paths under strategizer_notes/ are anchored to the canonical run directory; only path-escape and non-.md extensions are rejected. Strategizer can write hypotheses.md directly without needing to construct the run timestamp.
2. Implementer system prompt now gets a <workspace> preamble at session creation, injecting the absolute workspace path and forbidding /tmp explicitly. Replaces the previous habit of the Implementer scratching in /tmp (which left deliverables empty).
3. One-shot corrective retry on missing ## Report block. When the Implementer's first reply doesn't parse, the runtime sends a focused corrective message restating the required structure; if the retry yields a valid Report the delegation is counted. Two bad replies in a row still fall through to REFLECT and count as a delegation failure.

End-to-end verification: new study studies/modular_resonance/ with a PE 952-inspired two-parameter resonance optimization. Forces f3dasm usage explicitly in the briefing. Haiku ran 5 delegations across 4 search phases (2884 evaluations), populated the BORA-style hypothesis log with status transitions including a real falsification, and wrote 27 workspace artefacts and a runnable replicate.py. Independent brute-force confirms the agent's answer (k=6, m=99991, resonance=8685.089) is the true global optimum.

47/47 tests pass (added test_one_shot_retry_recovers_invalid_report and test_two_invalid_replies_fall_through_to_reflect, updated test_write_markdown_bad_paths for lenient path semantics). Ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
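The lenient path rule in item 1 could look roughly like the sketch below. `resolve_note_path` and `PathRejected` are hypothetical names, and the prefix-stripping behaviour is an assumption about what "relative paths under strategizer_notes/" means; the actual WriteMarkdown tool may differ.

```python
from pathlib import Path


class PathRejected(ValueError):
    """Raised for note paths the tool refuses (hypothetical name)."""


def resolve_note_path(requested: str, notes_dir: Path) -> Path:
    """Anchor a bare or relative path under notes_dir.

    Only two things are rejected: a non-.md extension and a path that
    resolves outside notes_dir.
    """
    notes_root = notes_dir.resolve()
    candidate = Path(requested)
    # Assumption: a leading "strategizer_notes/" prefix is tolerated.
    if not candidate.is_absolute() and candidate.parts[:1] == (notes_root.name,):
        candidate = Path(*candidate.parts[1:])
    resolved = (notes_root / candidate).resolve()
    if resolved.suffix != ".md":
        raise PathRejected(f"only .md notes may be written: {requested}")
    if not resolved.is_relative_to(notes_root):  # Python 3.9+
        raise PathRejected(f"path escapes the notes directory: {requested}")
    return resolved
```

Resolving before checking containment means `..` segments are normalised first, so an escape cannot hide behind a path that merely starts with the notes directory.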

README-agentic.md sits at the repo root alongside the upstream README.md without disturbing it. It is the standalone landing document for the agentic-f3dasm layer and covers the six sections required by the BRG checklist (Summary, Statement of need, Authorship, Getting started, Community support, License) plus the four Part-1 presentation sections (Context, Statement of Need, Method overview, Code purpose/audience/goals).

docs/agentic/class_diagram.dot + .svg is the class diagram required by Part 2 of the presentation rubric. It shows AgenticRun, the Strategizer/Implementer Protocol pair and their _Claude* implementations, the Task/Report payloads, the tool-closure surface, the LookupDataGenerator utility, and the native f3dasm core (drawn dashed to make explicit that the agentic layer does not edit it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Neither package is imported anywhere under src/, tests/, studies/, or docs/, verified by grep for `import langchain*` / `from langchain*` / `import langgraph` / `from langgraph` (zero hits). The only matches were in `src/f3dasm.egg-info/*` (autogenerated from pyproject.toml itself) and in `docs/specs/literature-map.md` (prose discussion of a paper that uses LangGraph).

Removing them shrinks the base install for every f3dasm user without affecting the agentic layer or any other behaviour. 120/120 agentic tests still pass; `from f3dasm.agentic import ...` still works.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Honours the original "all prompts as module constants" agreement. Eight prompt strings that had been built inline in agent_runtime.py become named, docstring-attached constants in agent_prompts.py:

    RUN_PATHS_PREAMBLE_TEMPLATE                      (study_dir, notes_dir)
    WORKSPACE_PREAMBLE_TEMPLATE                      (workspace_dir)
    IMPLEMENTER_REPORT_RETRY_PROMPT                  (static)
    REFLECT_DIAGNOSIS_SHORT                          (static)
    REFLECT_DIAGNOSIS_CAPABILITY_LIMIT               (static)
    REFLECT_DIAGNOSIS_MISSING_SUBSECTIONS_TEMPLATE   (missing_subsections)
    REFLECT_DIAGNOSIS_NO_REPORT_HEADING              (static)
    REFLECT_DIAGNOSIS_DEFAULT                        (static)

Runtime call sites updated:

    _compose_strategizer_prompt            → RUN_PATHS_PREAMBLE_TEMPLATE.format(...)
    _compose_implementer_prompt            → WORKSPACE_PREAMBLE_TEMPLATE.format(...)
    _tool_delegate                         → IMPLEMENTER_REPORT_RETRY_PROMPT
    _classify_failed_implementer_response  → the five REFLECT constants

No behaviour change. 12 new prompt-sanity tests added (total 59 in the prompt+runtime suites; 132 across the agentic surface). Ruff clean. Prompt iteration is now a single-file concern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Plain pip install silently falls back to the global Python when no venv is active, which is the wrong default for a project that pins claude-agent-sdk. Switch the docs to uv:

    uv venv
    uv pip install -e ".[agentic]"
    uv run python -m f3dasm.agentic <study-dir>

Document the python -m venv + pip fallback for users without uv, but explicitly warn against unactivated pip install.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The CLI used to be silent while the SDK churned for minutes between delegations. Now AgenticRun configures a `f3dasm.agentic` logger with two handlers, a stderr StreamHandler and a FileHandler pointed at `<run_dir>/run.log`, both formatted as

    [HH:MM:SS] LEVEL message

INFO-level events at the milestones the user actually cares about:

- run start (with run_dir, model, checkpoint cadence)
- briefing read (char count)
- sessions started
- each delegation start (with first 80 chars of intent)
- each delegation end (elapsed M:SS + first 100 chars of conclusions)
- Implementer session reset
- checkpoint firing (delegation count)
- Done received (preview of Strategizer summary)
- deliverable assembled (path)

WARNING when a delegation falls through to REFLECT after the corrective retry.

run.log is copied into deliverable/run.log at the end of the run so it ships with the rest of the audit trail.

Tests stay green (132/132); ruff clean. Two small helpers added module-side: _preview (single-line preview with ellipsis) and _format_elapsed (M:SS or H:MM:SS).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
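A minimal sketch of the logging setup described above, using only the standard library. The function names `configure_run_logger`, `preview`, and `format_elapsed` are illustrative stand-ins for the private helpers in the commit, and details such as clearing existing handlers are assumptions.

```python
import logging
import sys
from pathlib import Path

LOG_FORMAT = "[%(asctime)s] %(levelname)s %(message)s"
TIME_FORMAT = "%H:%M:%S"


def configure_run_logger(run_dir: Path) -> logging.Logger:
    """Attach a stderr handler and a run.log file handler, same format."""
    logger = logging.getLogger("f3dasm.agentic")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # assumption: avoid duplicates on repeated runs
    formatter = logging.Formatter(LOG_FORMAT, datefmt=TIME_FORMAT)
    for handler in (
        logging.StreamHandler(sys.stderr),
        logging.FileHandler(run_dir / "run.log", encoding="utf-8"),
    ):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def preview(text: str, limit: int = 80) -> str:
    """Collapse whitespace to one line; truncate with an ellipsis."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 1] + "…"


def format_elapsed(seconds: float) -> str:
    """Render elapsed time as M:SS, or H:MM:SS from one hour upward."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```

A delegation-end message could then be emitted as `logger.info("delegation done (%s): %s", format_elapsed(dt), preview(conclusions, 100))`.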

Two failure modes the user just hit motivated this: (a) the CLI binary not on PATH, (b) the CLI not yet authenticated. Both used to surface as a 15-minute async-stream hang followed by an opaque traceback.

Added:

- A _preflight_check_claude_cli step in AgenticRun._validate that uses shutil.which("claude") to fail fast with a clear remediation message. Skipped when tests inject non-default factories.
- A module-level _classify_sdk_error helper that translates claude_agent_sdk._errors exceptions into AgenticRunError with hints matching the most common operator failure modes:
    * CLINotFoundError     → "install Claude Code; put it on PATH"
    * ProcessError + 401   → "run `claude` once interactively"
    * ProcessError + key   → "set ANTHROPIC_API_KEY (post-Jun-15)"
    * ProcessError + 429   → "rate / credit limit hit"
    * CLIConnectionError   → "could not connect to Claude CLI"
    * other ClaudeSDKError → generic fallback
- Try/except wrappers around _run_async_safe in both _ClaudeStrategizer.send and _ClaudeImplementer.send that route through _classify_sdk_error.

Tests: 6 new unit tests cover the preflight and each classification branch (138/138 total). Ruff clean.

Known limitation: code still lives under _src/optimization/. A follow-up commit moves it to _src/agentic/ where it belongs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
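The fail-fast preflight and the error translation can be sketched as follows. To keep the example runnable without the SDK installed, the exception classes below are local stand-ins named after the SDK types in the commit message; their hierarchy, the substring matching, and the public function names are assumptions, not the PR's implementation.

```python
import shutil


class AgenticRunError(RuntimeError):
    """Stand-in for the runtime's user-facing error type."""


# Local stand-ins for the SDK exception types (hierarchy is assumed).
class ClaudeSDKError(Exception): ...
class CLIConnectionError(ClaudeSDKError): ...
class CLINotFoundError(CLIConnectionError): ...
class ProcessError(ClaudeSDKError): ...


def preflight_check_claude_cli(which=shutil.which) -> None:
    """Fail fast if the `claude` binary is not on PATH."""
    if which("claude") is None:
        raise AgenticRunError(
            "Claude CLI not found: install Claude Code; put it on PATH"
        )


def classify_sdk_error(exc: Exception) -> AgenticRunError:
    """Translate an SDK exception into an error carrying a remediation hint."""
    text = str(exc).lower()
    # Most specific type first, so a subclass is not caught by its parent.
    if isinstance(exc, CLINotFoundError):
        hint = "install Claude Code; put it on PATH"
    elif isinstance(exc, ProcessError) and "401" in text:
        hint = "run `claude` once interactively"
    elif isinstance(exc, ProcessError) and "key" in text:
        hint = "set ANTHROPIC_API_KEY"
    elif isinstance(exc, ProcessError) and "429" in text:
        hint = "rate / credit limit hit"
    elif isinstance(exc, CLIConnectionError):
        hint = "could not connect to Claude CLI"
    else:
        hint = "unexpected SDK error; see the original traceback"
    return AgenticRunError(f"{hint} ({type(exc).__name__}: {exc})")
```

A caller would typically write `raise classify_sdk_error(exc) from exc` so the original traceback stays attached.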

The orchestrator, prompts, and Lookup utility used to live under _src/optimization/ and _src/datageneration/, residue from v1 when AgentOptimizer subclassed Optimizer. In v2 the orchestrator is AgenticRun (no Optimizer relation), so the placement was just co-locating agentic code with unrelated native f3dasm modules.

This commit moves:

    _src/optimization/agent_runtime.py   → _src/agentic/agent_runtime.py
    _src/optimization/agent_prompts.py   → _src/agentic/agent_prompts.py
    _src/datageneration/lookup.py        → _src/agentic/lookup.py
    tests/optimization/test_agent_*.py   → tests/agentic/test_agent_*.py
    tests/datageneration/test_lookup.py  → tests/agentic/test_lookup.py

All renames via git mv (history preserved). Public import paths (`from f3dasm.agentic import ...`) unchanged; only the implementation imports in src/f3dasm/agentic/__init__.py, src/f3dasm/agentic/__main__.py, and the test files were rewritten. The Implementer's f3dasm_primer was updated to recommend the public `from f3dasm.agentic import LookupDataGenerator` path.

Native f3dasm directories (_src/optimization/ and _src/datageneration/) now contain zero agentic code. 138/138 tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The Implementer's workspace is persistent across runs by design, but it is per-user output: one collaborator's workspace contents should not be committed to the fork. Add the same gitignore shape used for runs/ so workspaces stay local.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The canonical file name is now PROBLEM_STATEMENT.md, matching how a researcher actually thinks about the artefact (it is the problem statement, not a briefing in the journalistic sense). The conceptual term "briefing-clarification ritual" is preserved in prompts and docs; only the filename changes.

Touched 8 files:

- src/f3dasm/_src/agentic/agent_runtime.py (6 occurrences)
- src/f3dasm/_src/agentic/agent_prompts.py (4 occurrences)
- src/f3dasm/agentic/__main__.py (2 occurrences)
- src/f3dasm/agentic/__init__.py (1 occurrence)
- tests/agentic/test_agent_runtime.py (15 occurrences)
- README-agentic.md (6 occurrences)
- studies/modular_resonance/PROBLEM_STATEMENT.md (git mv)
- studies/project_euler_078/PROBLEM_STATEMENT.md (git mv)

138/138 tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Updates the Repository Layout section to match the actual code layout after the move out of _src/optimization/ and _src/datageneration/:

    _src/agentic/agent_runtime.py
    _src/agentic/agent_prompts.py
    _src/agentic/lookup.py
    tests/agentic/...

Adds two paths the layout previously omitted: runs/<ts>/run.log (progress log mirror) and docs/agentic/ (presentation artefacts). Notes that workspace/ and runs/<ts>/ are gitignored.

Drops the "developed in collaboration with Claude" attribution line from the Authorship section.

Tightens one accuracy nit: the Implementer's workspace is prompt-enforced today, not OS-sandboxed; the proper sandbox is opt-in (sandbox-exec / Docker) and documented as such.

Three small refactors that turn the implicit-Claude-default-but-injectable shape into a registered-backends shape. No behaviour change for existing callers.

New: src/f3dasm/_src/agentic/backends/

- __init__.py: empty subpackage marker
- base.py: Backend frozen dataclass + StrategizerSession, ImplementerSession Protocols (moved from agent_runtime.py so backends can implement against them without importing the runtime)
- claude.py: _ClaudeStrategizer, _ClaudeImplementer, _classify_sdk_error, _preflight_claude_cli, _strategizer_factory, _implementer_factory, plus CLAUDE_BACKEND = Backend(name="claude", default_model="claude-haiku-4-5-20251001", …)

agent_runtime.py:

- Imports StrategizerSession/ImplementerSession from backends.base
- Imports CLAUDE_BACKEND from backends.claude
- AgenticRun.__init__ gains a `backend: Backend | None = None` kwarg, defaulting to CLAUDE_BACKEND. The existing strategizer_factory / implementer_factory kwargs remain as test-injection overrides; when not supplied they fall through to backend.strategizer_factory / .implementer_factory.
- `model` kwarg default changes from MVP_DEFAULT_MODEL to None; resolves to backend.default_model when not given.
- _preflight_check_claude_cli replaced by a one-liner calling self._backend.preflight() (skipped when stub factories are injected; existing skip-on-test behaviour preserved).
- Removed: the _ClaudeStrategizer/Implementer classes, the _classify_sdk_error helper, the old _default_*_factory pair, and an unused `import asyncio`.
- MVP_DEFAULT_MODEL still importable from agent_runtime for backward-compat (= CLAUDE_BACKEND.default_model).

Public surface: `from f3dasm.agentic import Backend, CLAUDE_BACKEND` now works alongside the existing exports.

Tests: 138 prior pass; one new (test_custom_backend_drops_in_via_kwarg) constructs a stub Backend and verifies the kwarg path threads through correctly. 139 total. Ruff clean. Test-import path for _classify_sdk_error updated to its new home.

Sized for swap-ability: adding an OpenAI or Gemini backend now requires only a new file under backends/, providing matching strategizer_factory / implementer_factory / preflight / error classification, plus a registered Backend instance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
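To make the swap-ability claim concrete, here is a minimal sketch of a frozen Backend record and a stub that drops in without touching the runtime. The field names follow the commit message, but the factory signatures (a single model string), the `Session` protocol, `EchoSession`, `STUB_BACKEND`, and `start_sessions` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Tuple


class Session(Protocol):
    """Assumed minimal session interface: send a message, get a reply."""

    def send(self, message: str) -> str: ...


@dataclass(frozen=True)
class Backend:
    name: str
    default_model: str
    strategizer_factory: Callable[[str], Session]
    implementer_factory: Callable[[str], Session]
    preflight: Callable[[], None]


class EchoSession:
    """Stub session for tests: echoes the message with its model name."""

    def __init__(self, model: str) -> None:
        self.model = model

    def send(self, message: str) -> str:
        return f"[{self.model}] {message}"


STUB_BACKEND = Backend(
    name="stub",
    default_model="stub-model",
    strategizer_factory=EchoSession,
    implementer_factory=EchoSession,
    preflight=lambda: None,  # nothing to check for an in-process stub
)


def start_sessions(
    backend: Backend, model: Optional[str] = None
) -> Tuple[Session, Session]:
    """Run the backend's preflight, then build both sessions."""
    backend.preflight()
    resolved = model or backend.default_model  # None falls back to default
    return (
        backend.strategizer_factory(resolved),
        backend.implementer_factory(resolved),
    )
```

Because the record is frozen, a backend cannot be mutated after registration, which keeps the model default and the factories consistent for the lifetime of a run.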


…s, searchable orchestrator

- Generalize vision from 2-agent to N-agent peer topology
- Replace 'orchestrator is plumbing' with 'orchestrator is the system's scientific policy and is itself a subject of optimization'; ADAS-searchability as non-negotiable design goal, self-hosting as terminal target
- Rewrite Typed Contracts: one symmetric class per channel (Delegation); both parties enrich the same object; metadata dict for extensions; Task/Report retained as backward-compat aliases
- Add Typed Stores section: ExperimentData (epistemic center), AnalysisBase (parent+siblings+uncles), TaskRegistry (success_rate tracking)
- Add Firing Primitives section: parallel, retry, rounds
- Add Reference Topologies: PABLO-style and AgenticSciML-style pseudocode using f3dasm's typed contracts
- Update Folder Layout: add config.yaml, transcripts/, run.log (were missing)
- Update Public API: add StudyConfig, OLLAMA_BACKEND, Level 2 stubs

…stry, firing primitives

Delegation (agent_runtime.py):

- Single symmetric exchange class replacing the Task+Report duality
- Initiating agent fills request fields (intent, expected_report, remaining_time, budget); responding agent fills response fields (actions_taken, files_touched, conclusions, numbers, raw)
- metadata dict for channel-specific extensions without schema changes
- _format_delegation / _parse_delegation match Task/Report wire format
- Task and Report retained as backward-compat public symbols

stores.py (new):

- AnalysisBase: hierarchical solution-tree store; get(id) returns parent + siblings + uncles for bounded comparative context
- AnalysisNode, AnalysisSlice: typed node and query-result dataclasses
- TaskRegistry: capped operator registry (MAX_TASKS=20); tracks attempts, successes, success_rate; evicts lowest-rate non-default entry on overflow; defaults never pruned

primitives.py (new):

- parallel(agents, task_fn): ThreadPoolExecutor fan-out; failures captured as error Delegations, never silently dropped
- retry(agent, task_msg, is_success, max_fails): persistence loop; raises AgenticRunError after max_fails consecutive failures
- rounds(agent_a, agent_b, n, initial): fixed-N debate; returns agent_b's response after n complete A→B exchange rounds

Tests: 70 new tests across test_delegation.py, test_stores.py, test_primitives.py, all passing (827 total)
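The `parallel` and `retry` primitives described above can be sketched in a few lines. This is an illustration under stated assumptions: agents are modelled as plain callables, the `Delegation` dataclass is reduced to three fields, and the `error` field and the `"<failed>"` placeholder are hypothetical ways of recording a captured failure.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


class AgenticRunError(RuntimeError):
    """Stand-in for the runtime's error type."""


@dataclass
class Delegation:
    """Reduced stand-in for the exchange object (fields are assumed)."""

    intent: str
    conclusions: str = ""
    error: Optional[str] = None


def parallel(agents, task_fn):
    """Fan task_fn out over agents; a failure becomes an error Delegation."""

    def run_one(agent):
        try:
            return task_fn(agent)
        except Exception as exc:  # captured, never silently dropped
            return Delegation(
                intent="<failed>", error=f"{type(exc).__name__}: {exc}"
            )

    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        return list(pool.map(run_one, agents))  # keeps input order


def retry(agent, task_msg, is_success, max_fails=3):
    """Call agent until is_success(result); give up after max_fails."""
    for _ in range(max_fails):
        result = agent(task_msg)
        if is_success(result):
            return result
    raise AgenticRunError(f"gave up after {max_fails} consecutive failures")
```

Returning an error record per failed branch, rather than raising from the pool, lets the caller see which agents failed while still using the results of the others.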

…lice, TaskStats, debate

- Delegation becomes an envelope: task: Task, report: Report | None, is_complete
- RunContext protocol + topology= hook on AgenticRun for custom topologies
- StudyConfig.backend now resolved via _BACKEND_REGISTRY (was silently ignored)
- register_backend() public function for custom backend registration
- rounds() → debate(), returns list[str] transcript [a1, b1, ...] of length 2*n
- parallel/retry now accept Task objects (not pre-formatted strings)
- AnalysisSlice → ContextSlice (frozen); AnalysisBase.get() → context(include=...)
- RegistryEntry → TaskStats (frozen, no is_default); internal _RegistryEntry retains it
- AgenticOptimizer added (forward() stub pinning the f3dasm Optimizer interface)
- Removed from public API: MAX_TASKS, CHECKPOINT_EVERY, OLLAMA_BACKEND, rounds, AnalysisSlice, RegistryEntry
- Tests updated for all API changes (170 passing)
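The transcript shape returned by `debate()` (a list `[a1, b1, ...]` of length `2*n`) can be shown with a small sketch. Agents are modelled here as plain callables from string to string, which is an assumption; the real primitive works with session objects.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # assumed stand-in for an agent session


def debate(agent_a: Agent, agent_b: Agent, n: int, initial: str) -> List[str]:
    """Run n rounds of A then B; return the transcript [a1, b1, ..., an, bn]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    transcript: List[str] = []
    message = initial
    for _ in range(n):
        reply_a = agent_a(message)   # A answers the latest message
        reply_b = agent_b(reply_a)   # B answers A
        transcript.extend([reply_a, reply_b])
        message = reply_b            # B's reply seeds the next round
    return transcript


# Toy agents that append their own letter to whatever they receive.
transcript = debate(lambda m: m + "A", lambda m: m + "B", n=2, initial="")
```

Returning the whole transcript instead of only the final reply lets the caller inspect how positions shifted across rounds.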


…t > 0; fix debate docstring

- Promote _DEFAULT_RETRY_CORRECTIVE to module-level in agent_runtime; primitives imports it to avoid duplicate definition (M1)
- Add _parse_eval_budget helper that rejects eval_budget <= 0 with a clear AgenticRunError (M3); two new tests cover 0 and -1
- Fix primitives.debate docstring: report.conclusions is the canonical accessor, not report.raw (both are set, but conclusions is preferred) (M2)
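A minimal sketch of the eval_budget validation, assuming the value arrives from config.yaml as an integer or is absent. Treating `None` as "no budget" and rejecting non-integers are assumptions beyond what the commit states; only the rejection of values `<= 0` comes from the source.

```python
from typing import Optional


class AgenticRunError(ValueError):
    """Stand-in for the runtime's error type."""


def parse_eval_budget(raw: object) -> Optional[int]:
    """Return a positive evaluation budget, or None when unset."""
    if raw is None:
        return None  # assumption: a missing key means no evaluation cap
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise AgenticRunError(
            f"eval_budget must be an integer, got {raw!r}"
        )
    if raw <= 0:
        raise AgenticRunError(
            f"eval_budget must be > 0, got {raw}"
        )
    return raw
```

Failing at config-load time keeps a zero or negative budget from surfacing later as a run that can never evaluate anything.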

- Add Agent base class with system_prompt, tools, reset_on_checkpoint, description class attributes and model= constructor arg
- Replace frozen Graph(roles=) with mutable Graph(nodes=dict[str,Agent]) plus outgoing() helper; legacy roles= kwarg preserved for compat
- Add TestAgentClassAPI with 5 TDD tests covering defaults, subclassing, nodes dict routing, unknown edge target, and unknown entry validation
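The Agent/Graph shape described above might look like the sketch below. The class attributes and the `outgoing()` helper follow the commit message; the `edges` and `entry` field names, the edge representation as `(source, target)` pairs, and the `Implementer` subclass are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class Agent:
    """Base agent: behaviour is declared through class attributes."""

    system_prompt: str = ""
    tools: frozenset = frozenset()
    reset_on_checkpoint: bool = False
    description: str = ""

    def __init__(self, model: Optional[str] = None) -> None:
        self.model = model


class Implementer(Agent):
    """Example subclass overriding two of the defaults."""

    tools = frozenset({"Read", "Write", "Bash"})
    reset_on_checkpoint = True


@dataclass
class Graph:
    nodes: dict[str, Agent]
    edges: list[tuple[str, str]]  # assumed (source, target) pairs
    entry: str                    # assumed name of the entry node

    def __post_init__(self) -> None:
        if self.entry not in self.nodes:
            raise ValueError(f"unknown entry node: {self.entry!r}")
        for source, target in self.edges:
            for name in (source, target):
                if name not in self.nodes:
                    raise ValueError(f"edge references unknown node: {name!r}")

    def outgoing(self, name: str) -> list[str]:
        """Names of the nodes that `name` can delegate to."""
        return [target for source, target in self.edges if source == name]
```

Validating entry and edge targets at construction time mirrors the "unknown edge target" and "unknown entry" tests mentioned in the commit.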


… + _IMPLEMENTER_ALLOWED_TOOLS

Replace _ClaudeStrategizer, _ClaudeImplementer, _strategizer_factory, _implementer_factory, and _IMPLEMENTER_ALLOWED_TOOLS with a single _ClaudeAgentSession class and _session_factory matching the new Backend.session_factory signature (native_tools, closure_tools, study_dir).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… FollowUp detection

- Add NATIVE_TOOL_NAMES, PROTOCOL_CLOSURE_NAMES imports to agent_runtime
- Replace strategizer_factory/implementer_factory with session_factory in AgenticRun.__init__; keep deprecated kwargs as compat wrapper (discriminant: "Delegate" in closure_tools)
- Remove _strategizer property (keep _implementer for test compat)
- Rewrite _preflight to skip when session_factory is not the backend's
- Rewrite _instantiate_node with three-category tool system: native, protocol-closure, topology-injected
- Add _make_followup_tool and _build_topology_closures methods
- Add FollowUp detection in _tool_delegate (## FollowUp block returns FOLLOW_UP message)
- Default graph now declares tools on default agents
- Empty tools frozenset treated as unrestricted (backward compat)
- Add "Read" alias for "ReadNote" in protocol closures for backward compat
- Backend dataclass unfrozen; add compat __post_init__ for legacy strategizer_factory/implementer_factory kwargs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…; add FollowUp/Ask tests

- Backend is now frozen=True with required session_factory + preflight fields (no compat kwargs)
- AgenticRun.__init__ drops strategizer_factory/implementer_factory kwargs and compat wrapper
- Removes _implementer property getter/setter (use run._agents["implementer"] directly)
- Migrates all test call-sites to session_factory= with Delegate-discriminant unified factory
- Adds _make_session_factory helper; _make_factories kept as thin backward-compat wrapper
- Adds test_ask_injected_only_for_entry_node, test_followup_detection_in_tool_delegate, test_conservative_toolset_empty_frozenset (194 tests total, all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add encoding="utf-8" to all write_text calls (fixes Windows UnicodeEncodeError)
- Fix all ruff violations: remove unused imports, sort imports, remove quoted type annotations, wrap lines to 79 chars, convert inline comments to block comments
- Add specs/ pages to mkdocs.yml nav (fixes missing-from-nav warning)
- Add MANIFEST.in
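As a small illustration of the first bullet: `Path.write_text` without an explicit encoding uses the platform's locale encoding, which on Windows is often a legacy code page that cannot encode every character. The helper name `write_note` and the sample text are hypothetical.

```python
import tempfile
from pathlib import Path


def write_note(path: Path, text: str) -> None:
    """Write text with an explicit encoding instead of the locale default."""
    path.write_text(text, encoding="utf-8")


# Characters such as the arrow below are outside many legacy code pages.
note = Path(tempfile.mkdtemp()) / "solution.md"
write_note(note, "resonance ≈ 8685.089 → optimum")
round_trip = note.read_text(encoding="utf-8")
```

Reading back with the same explicit encoding keeps the round trip identical on every platform.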

elvis-aguero and others added 30 commits May 17, 2026 18:32

Add design spec: study config.yaml, time budget, and Ollama backend

d085118

Add implementation plan: config.yaml, budget enforcement, Ollama backend

2c38f33

feat(agentic): add StudyConfig dataclass and _load_study_config

9cc58aa

Adds StudyConfig, _parse_budget, _load_study_config, and _KNOWN_CONFIG_KEYS to agent_runtime.py, plus 5 TDD tests covering missing file, full config, partial config, unknown keys, and bad budget.

feat(agentic): wire StudyConfig into AgenticRun and CLI

135ddee

feat(agentic): implement _remaining() and pre-delegation budget check

1ea8a69

Add _remaining() to AgenticRun, set _start_time at execute() start, and guard _tool_delegate with an early-exit when wall-clock budget is exhausted. Covers three new TDD tests.
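The wall-clock guard can be sketched as a small clock object plus an early exit in the delegate path. `BudgetClock`, `delegate`, the injected `now` callable, and the `BUDGET_EXHAUSTED` message are hypothetical names chosen for illustration; the commit only states that `_remaining()` exists, that `_start_time` is set when `execute()` starts, and that `_tool_delegate` exits early once the budget is spent.

```python
import time
from typing import Callable, Optional


class BudgetClock:
    """Tracks a wall-clock budget in seconds; None means unlimited."""

    def __init__(
        self,
        budget_seconds: Optional[float],
        now: Callable[[], float] = time.monotonic,
    ) -> None:
        self._budget = budget_seconds
        self._now = now
        self._start_time: Optional[float] = None

    def start(self) -> None:
        """Record the start time (called once when execution begins)."""
        self._start_time = self._now()

    def remaining(self) -> Optional[float]:
        """Seconds left, clamped at zero; None when there is no budget."""
        if self._budget is None:
            return None
        if self._start_time is None:
            return float(self._budget)
        elapsed = self._now() - self._start_time
        return max(0.0, self._budget - elapsed)

    def exhausted(self) -> bool:
        left = self.remaining()
        return left is not None and left <= 0


def delegate(clock: BudgetClock, run_task: Callable[[str], str], intent: str) -> str:
    """Refuse new work once the budget is spent, before calling the agent."""
    if clock.exhausted():
        return "BUDGET_EXHAUSTED: wall-clock budget spent"
    return run_task(intent)
```

Injecting the clock function makes the guard testable without sleeping, and `time.monotonic` avoids surprises if the system clock is adjusted mid-run.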

feat(agentic): inject remaining_time into Task/Report; budget metadata in solution.md

bc9cdad

feat(agentic): add IMPLEMENTER_SYSTEM_PROMPT_OLLAMA

5e9c4aa

feat(agentic): add Ollama backend with bash-tool Implementer

2d90872

feat(agentic): export OLLAMA_BACKEND; add config.yaml to all agentic studies

1a99fa6

feat(agentic): add on_failure corrective to primitives.retry

a869da3

feat(agentic): debate returns list[Delegation]; uniform contract with parallel/retry

a6295d3

elvis-aguero and others added 25 commits May 21, 2026 15:42

feat(agentic): add ctx.parallel to RunContext; delegates through full run pipeline

1c19e3e

feat(agentic): add ctx.retry with on_failure to RunContext

3325d7f

feat(agentic): add ctx.debate returning list[Delegation] to RunContext

54af9bc

feat(agentic): add eval_budget to StudyConfig + config.yaml

18d71cf

feat(agentic): eval_count tracking + soft warning injection into Task

b0cbf85

fix(agentic): clamp eval_count_remaining >= 0; add multi-delegation accumulation test

9934033

fix(agentic): frozenset tools; _resolve_roles guard; Graph validation

180caa0

refactor(agentic): AgenticRun wired to Graph.nodes; infer planner/executor from edges

c4cc8ab

feat(agentic): Parallel, Debate, Retry as injected LLM tool closures

2fe358e

chore(agentic): export Agent; remove AgentRole, RunContext, topology from public API

7bf9bae

feat(agentic): tool vocabulary constants + Backend.session_factory + Graph.incoming

1745667

refactor(agentic): unified _OllamaAgentSession; delete split classes + document mapping

c61fc6b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(agentic): update ollama tests to use _OllamaAgentSession

960145d

updated agentic readme with current API

1140ad5

docs(agentic): add flow diagram to README-agentic

5c1fc35

docs(agentic): remove flow diagram

8592f2f

docs(agentic): clarify f3dasm integration boundary in README

215a7c5

docs(agentic): revert diagram, keep f3dasm integration paragraph

ce60fae

docs(agentic): restore f3dasm-framed diagram in method overview

64437d6

elvis-aguero wants to merge 55 commits into bessagroup:main from elvis-aguero:dev/agentic-mvp-v2
