Agentic f3dasm CRW June 2026 prototype (NOT TO BE MERGED WITH MAIN)#347
Open
elvis-aguero wants to merge 55 commits into
Strictly additive layer on top of f3dasm. No native modules edited. Public API: `from f3dasm.agentic import …`. Architecture: two persistent peer Claude Agent SDK sessions (Strategizer, Implementer) routed by a Python orchestrator. Strategizer emits Strategy payloads; Implementer dispatches them via `run_strategy` and may follow up with `ask_strategizer` (bounded). Provenance via `__turn` output column + TurnRecord log; runs end with a `write_deliverable()` folder (solution.md + replicate.py + dataset snapshot + turn_log.jsonl). MVP strategy registry: latin, sobol, random_uniform, grid, local_random. Lookup DataGenerator for pool-backed evaluation. Supercompressible study wired as first test of the generic MVP. 173 tests pass, ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v1 MVP (registry + AnalysisBase + fixed-iteration loop) was over-engineered and supercompressible-leaky. v2 will be a thin runtime over two persistent Claude SDK sessions with carefully engineered prompts and git-based provenance. Keep LookupDataGenerator as a utility the Implementer may import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two persistent Claude Agent SDK sessions (Strategizer + Implementer) routed by a thin Python runtime. Strategizer reads any file, writes .md notes only, and drives the run via Delegate/Ask/Done tools. Implementer receives Task payloads, executes freely (Read/Write/Edit/Bash/Glob/Grep restricted to the study dir), and returns structured Report blocks. Provenance via per-delegation git commits to an isolated run repo at runs/<timestamp>/.git/ (study-tree work-tree; any enclosing user repo is never touched). Checkpoint every 30 delegations: Strategizer summarises, user steers, Implementer session is reset. Prompts engineered with SOTA techniques: XML-tagged sections, named failure-mode mitigations (anchoring, confirmation, availability, role drift, sycophancy, premature convergence), a non-negotiable briefing-clarification ritual, and refusal of hypothesis-verification by the Implementer. User surface: studies/<study>/briefing.md is the only required file. CLI: python -m f3dasm.agentic <study-dir>. 32 tests pass, ruff clean. No native f3dasm edits. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… transcripts
Four bundled additions to the v2 runtime:
1. BORA-style hypothesis log (Cissé et al. 2025): the Strategizer maintains strategizer_notes/hypotheses.md as a sequence of named ## Comment blocks (statement / confidence / evidence / status / last_updated_delegation). Done() is gated on at least one supported AND one refuted comment, i.e. the falsification requirement expressed as a log invariant.
2. Reflexion-style error reflection (Shinn et al. 2023): failed delegations now return REFLECT: <diagnosis> instead of a static ERROR string. _classify_failed_implementer_response categorises the failure (short, capability-limit, missing-subsections, no-Report, default), and the Strategizer prompt forbids identical-intent re-delegation after REFLECT.
3. SelfAI 3-stage reasoning (Wu et al. 2025): the Implementer prompt now requires three labelled sections before ## Report: Task restatement, Workspace inventory, Execution plan. _parse_report already ignores pre-Report text, so no parsing change is needed.
4. Per-turn transcript capture for debuggability: JSONL events (user_message / tool_call / tool_result / assistant_text) are written under runs/<ts>/transcripts/ and copied to the deliverable folder. A read_transcript() helper supports post-run inspection.
Caveat: the SDK's query() stream does not expose Implementer-internal tool calls through the current code path; we capture net inputs and outputs per delegation, not the full chain.
46 tests pass (32 prior + 14 new), ruff clean. No native f3dasm edits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
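The Done() gate from item 1 can be sketched as a small check over the hypothesis log. This is a minimal sketch under assumptions: each comment is taken to open with a `## ` heading and to carry a `status:` line, which may not match the exact layout the runtime parses, and `comment_statuses` / `done_allowed` are hypothetical names.

```python
import re

# Assumed layout: every comment opens with "## <name>" and has a line such
# as "status: supported" or "- status: refuted" somewhere in its body.
_STATUS_RE = re.compile(r"^\s*-?\s*status:\s*(\w+)", re.IGNORECASE | re.MULTILINE)


def comment_statuses(markdown: str) -> list[str]:
    """Return the lower-cased status of every ## comment block, in order."""
    # Split on level-2 headings only ("### " does not match "^## ").
    blocks = re.split(r"^## ", markdown, flags=re.MULTILINE)[1:]
    statuses = []
    for block in blocks:
        match = _STATUS_RE.search(block)
        if match:
            statuses.append(match.group(1).lower())
    return statuses


def done_allowed(markdown: str) -> bool:
    """Gate Done(): require at least one supported AND one refuted comment."""
    return {"supported", "refuted"} <= set(comment_statuses(markdown))
```

Expressing the falsification requirement as a set-inclusion test keeps the gate independent of how many comments the log holds.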
1. WriteMarkdown path resolution is now lenient: bare filenames and relative paths under strategizer_notes/ are anchored to the canonical run directory; only path escapes and non-.md extensions are rejected. The Strategizer can write hypotheses.md directly without needing to construct the run timestamp.
2. The Implementer system prompt now gets a <workspace> preamble at session creation, injecting the absolute workspace path and explicitly forbidding /tmp. This replaces the Implementer's previous habit of scratching in /tmp (which left deliverables empty).
3. One-shot corrective retry on a missing ## Report block. When the Implementer's first reply doesn't parse, the runtime sends a focused corrective message restating the required structure; if the retry yields a valid Report, the delegation is counted. Two bad replies in a row still fall through to REFLECT and count as a delegation failure.
End-to-end verification: new study studies/modular_resonance/ with a PE 952-inspired two-parameter resonance optimization. The briefing explicitly forces f3dasm usage. Haiku ran 5 delegations across 4 search phases (2884 evaluations), populated the BORA-style hypothesis log with status transitions including a real falsification, and wrote 27 workspace artefacts and a runnable replicate.py. An independent brute-force search confirms that the agent's answer (k=6, m=99991, resonance=8685.089) is the true global optimum.
47/47 tests pass (added test_one_shot_retry_recovers_invalid_report and test_two_invalid_replies_fall_through_to_reflect; updated test_write_markdown_bad_paths for lenient path semantics). Ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
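The one-shot corrective retry from item 3 can be sketched as follows, assuming the Implementer session is exposed as a plain `send(message) -> reply` callable. `RETRY_PROMPT`, `has_report` and `delegate_with_retry` are hypothetical; the runtime's own corrective text lives in agent_prompts.py.

```python
from typing import Callable

REPORT_HEADING = "## Report"
# Hypothetical corrective text, not the runtime's actual prompt constant.
RETRY_PROMPT = (
    "Your reply had no '## Report' block. Resend it with the required "
    "structure."
)


def has_report(reply: str) -> bool:
    """True when some line of the reply is exactly the Report heading."""
    return any(line.strip() == REPORT_HEADING for line in reply.splitlines())


def delegate_with_retry(
    send: Callable[[str], str], task: str
) -> tuple[bool, str]:
    """Send a task; if the reply lacks a Report, retry exactly once.

    Returns (counted, text). counted is False only when both replies
    fail to parse, in which case text is a REFLECT-style diagnosis.
    """
    reply = send(task)
    if has_report(reply):
        return True, reply
    retry = send(RETRY_PROMPT)
    if has_report(retry):
        return True, retry
    return False, "REFLECT: two consecutive replies lacked a ## Report block"
```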
README-agentic.md sits at the repo root alongside the upstream README.md without disturbing it. It is the standalone landing document for the agentic-f3dasm layer and covers the six sections required by the BRG checklist (Summary, Statement of need, Authorship, Getting started, Community support, License) plus the four Part-1 presentation sections (Context, Statement of Need, Method overview, Code purpose/audience/goals). docs/agentic/class_diagram.dot + .svg is the class diagram required by Part 2 of the presentation rubric. It shows AgenticRun, the Strategizer/Implementer Protocol pair and their _Claude* implementations, the Task/Report payloads, the tool-closure surface, the LookupDataGenerator utility, and the native f3dasm core (drawn dashed to make explicit that the agentic layer does not edit it). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Neither langchain nor langgraph is imported anywhere under src/, tests/, studies/, or docs/, as verified by grep for `import langchain*` / `from langchain*` / `import langgraph` / `from langgraph` (zero hits). The only matches were in `src/f3dasm.egg-info/*` (autogenerated from pyproject.toml itself) and in `docs/specs/literature-map.md` (prose discussion of a paper that uses LangGraph). Removing them shrinks the base install for every f3dasm user without affecting the agentic layer or any other behaviour. 120/120 agentic tests still pass; `from f3dasm.agentic import ...` still works. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Honours the original "all prompts as module constants" agreement. Eight prompt strings that had been built inline in agent_runtime.py become named, docstring-attached constants in agent_prompts.py:
  RUN_PATHS_PREAMBLE_TEMPLATE (study_dir, notes_dir)
  WORKSPACE_PREAMBLE_TEMPLATE (workspace_dir)
  IMPLEMENTER_REPORT_RETRY_PROMPT (static)
  REFLECT_DIAGNOSIS_SHORT (static)
  REFLECT_DIAGNOSIS_CAPABILITY_LIMIT (static)
  REFLECT_DIAGNOSIS_MISSING_SUBSECTIONS_TEMPLATE (missing_subsections)
  REFLECT_DIAGNOSIS_NO_REPORT_HEADING (static)
  REFLECT_DIAGNOSIS_DEFAULT (static)
Runtime call sites updated:
  _compose_strategizer_prompt → RUN_PATHS_PREAMBLE_TEMPLATE.format(...)
  _compose_implementer_prompt → WORKSPACE_PREAMBLE_TEMPLATE.format(...)
  _tool_delegate → IMPLEMENTER_REPORT_RETRY_PROMPT
  _classify_failed_implementer_response → the five REFLECT constants
No behaviour change. 12 new prompt-sanity tests added (total 59 in the prompt+runtime suites; 132 across the agentic surface). Ruff clean. Prompt iteration is now a single-file concern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plain pip install silently falls back to the global Python when no venv is active, which is the wrong default for a project that pins claude-agent-sdk. Switch the docs to uv:
  uv venv
  uv pip install -e ".[agentic]"
  uv run python -m f3dasm.agentic <study-dir>
Document the python -m venv + pip fallback for users without uv, but explicitly warn against an unactivated pip install.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The CLI used to be silent while the SDK churned for minutes between delegations. Now AgenticRun configures a `f3dasm.agentic` logger with two handlers, a stderr StreamHandler and a FileHandler pointed at `<run_dir>/run.log`, both using the format `[HH:MM:SS] LEVEL message`.
INFO-level events mark the milestones the user actually cares about:
- run start (with run_dir, model, checkpoint cadence)
- briefing read (char count)
- sessions started
- each delegation start (with first 80 chars of intent)
- each delegation end (elapsed M:SS + first 100 chars of conclusions)
- Implementer session reset
- checkpoint firing (delegation count)
- Done received (preview of Strategizer summary)
- deliverable assembled (path)
A WARNING is logged when a delegation falls through to REFLECT after the corrective retry.
run.log is copied into deliverable/run.log at the end of the run so it ships with the rest of the audit trail.
Tests stay green (132/132); ruff clean. Two small helpers added module-side: _preview (single-line preview with ellipsis) and _format_elapsed (M:SS or H:MM:SS).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two failure modes the user just hit motivated this: (a) the CLI
binary not on PATH, (b) the CLI not yet authenticated. Both used to
surface as a 15-minute async-stream hang followed by an opaque
traceback.
Added:
- A _preflight_check_claude_cli step in AgenticRun._validate that
uses shutil.which("claude") to fail fast with a clear remediation
message. Skipped when tests inject non-default factories.
- A module-level _classify_sdk_error helper that translates
claude_agent_sdk._errors exceptions into AgenticRunError with
hints matching the most common operator failure modes:
* CLINotFoundError → "install Claude Code; put it on PATH"
* ProcessError + 401 → "run `claude` once interactively"
* ProcessError + key → "set ANTHROPIC_API_KEY (post-Jun-15)"
* ProcessError + 429 → "rate / credit limit hit"
* CLIConnectionError → "could not connect to Claude CLI"
* other ClaudeSDKError → generic fallback
- Try/except wrappers around _run_async_safe in both
_ClaudeStrategizer.send and _ClaudeImplementer.send that route
through _classify_sdk_error.
Tests: 6 new unit tests cover the preflight and each
classification branch (138/138 total). Ruff clean.
Known limitation: code still lives under _src/optimization/.
A follow-up commit moves it to _src/agentic/ where it belongs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
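The preflight and the classification table above can be sketched as follows. This is a minimal, self-contained sketch: the exception classes are local stand-ins that only mirror the names listed in the commit (the real ones come from the Claude Agent SDK), the substring matching on "401", "key" and "429" is an assumption about how ProcessError messages are inspected, and `preflight_claude_cli` / `classify_sdk_error` are hypothetical names.

```python
import shutil


class AgenticRunError(RuntimeError):
    """Stand-in for the runtime's user-facing error type."""


# Local stand-ins mirroring the SDK exception names from the commit.
class ClaudeSDKError(Exception):
    pass


class CLIConnectionError(ClaudeSDKError):
    pass


class CLINotFoundError(CLIConnectionError):
    pass


class ProcessError(ClaudeSDKError):
    pass


def preflight_claude_cli() -> None:
    """Fail fast, before any async stream starts, if `claude` is missing."""
    if shutil.which("claude") is None:
        raise AgenticRunError(
            "Claude CLI not found: install Claude Code and put it on PATH"
        )


def classify_sdk_error(exc: ClaudeSDKError) -> AgenticRunError:
    """Translate an SDK exception into an error with a remediation hint."""
    msg = str(exc)
    # Order matters: in this sketch CLINotFoundError subclasses
    # CLIConnectionError, so it must be tested first.
    if isinstance(exc, CLINotFoundError):
        hint = "install Claude Code; put it on PATH"
    elif isinstance(exc, ProcessError) and "401" in msg:
        hint = "run `claude` once interactively"
    elif isinstance(exc, ProcessError) and "key" in msg.lower():
        hint = "set ANTHROPIC_API_KEY"
    elif isinstance(exc, ProcessError) and "429" in msg:
        hint = "rate / credit limit hit"
    elif isinstance(exc, CLIConnectionError):
        hint = "could not connect to Claude CLI"
    else:
        hint = "unexpected Claude SDK error"
    return AgenticRunError(f"{hint} ({type(exc).__name__}: {msg})")
```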
The orchestrator, prompts, and Lookup utility used to live under _src/optimization/ and _src/datageneration/, residue from v1 when AgentOptimizer subclassed Optimizer. In v2 the orchestrator is AgenticRun (no Optimizer relation), so the placement was just co-locating agentic code with unrelated native f3dasm modules.
This commit moves:
  _src/optimization/agent_runtime.py → _src/agentic/agent_runtime.py
  _src/optimization/agent_prompts.py → _src/agentic/agent_prompts.py
  _src/datageneration/lookup.py → _src/agentic/lookup.py
  tests/optimization/test_agent_*.py → tests/agentic/test_agent_*.py
  tests/datageneration/test_lookup.py → tests/agentic/test_lookup.py
All renames via git mv (history preserved). Public import paths (`from f3dasm.agentic import ...`) are unchanged; only the implementation imports in src/f3dasm/agentic/__init__.py, src/f3dasm/agentic/__main__.py, and the test files were rewritten. The Implementer's f3dasm_primer was updated to recommend the public `from f3dasm.agentic import LookupDataGenerator` path. Native f3dasm directories (_src/optimization/ and _src/datageneration/) now contain zero agentic code.
138/138 tests pass; ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Implementer's workspace is persistent across runs by design, but it is per-user output — one collaborator's workspace contents should not be committed to the fork. Add the same gitignore shape used for runs/ so workspaces stay local. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The canonical file name is now PROBLEM_STATEMENT.md, matching how a researcher actually thinks about the artefact (it is the problem statement, not a briefing in the journalistic sense). The conceptual term "briefing-clarification ritual" is preserved in prompts and docs; only the filename changes.
Touched 8 files:
- src/f3dasm/_src/agentic/agent_runtime.py (6 occurrences)
- src/f3dasm/_src/agentic/agent_prompts.py (4 occurrences)
- src/f3dasm/agentic/__main__.py (2 occurrences)
- src/f3dasm/agentic/__init__.py (1 occurrence)
- tests/agentic/test_agent_runtime.py (15 occurrences)
- README-agentic.md (6 occurrences)
- studies/modular_resonance/PROBLEM_STATEMENT.md (git mv)
- studies/project_euler_078/PROBLEM_STATEMENT.md (git mv)
138/138 tests pass; ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates the Repository Layout section to match the actual code layout after the move out of _src/optimization/ and _src/datageneration/:
  _src/agentic/agent_runtime.py
  _src/agentic/agent_prompts.py
  _src/agentic/lookup.py
  tests/agentic/...
Adds two paths the layout previously omitted: runs/<ts>/run.log (progress log mirror) and docs/agentic/ (presentation artefacts). Notes that workspace/ and runs/<ts>/ are gitignored.
Drops the "developed in collaboration with Claude" attribution line from the Authorship section.
Tightens one accuracy nit: the Implementer's workspace is prompt-enforced today, not OS-sandboxed; the proper sandbox is opt-in (sandbox-exec / Docker) and documented as such.
Three small refactors that turn the implicit-Claude-default-but-
injectable shape into a registered-backends shape. No behaviour
change for existing callers.
New: src/f3dasm/_src/agentic/backends/
__init__.py — empty subpackage marker
base.py — Backend frozen dataclass + StrategizerSession,
ImplementerSession Protocols (moved from
agent_runtime.py so backends can implement
against them without importing the runtime)
claude.py — _ClaudeStrategizer, _ClaudeImplementer,
_classify_sdk_error, _preflight_claude_cli,
_strategizer_factory, _implementer_factory,
plus CLAUDE_BACKEND = Backend(name="claude",
default_model="claude-haiku-4-5-20251001", …)
agent_runtime.py:
- Imports StrategizerSession/ImplementerSession from backends.base
- Imports CLAUDE_BACKEND from backends.claude
- AgenticRun.__init__ gains a `backend: Backend | None = None`
kwarg, defaulting to CLAUDE_BACKEND. The existing
strategizer_factory / implementer_factory kwargs remain as
test-injection overrides; when not supplied they fall through
to backend.strategizer_factory / .implementer_factory.
- `model` kwarg default changes from MVP_DEFAULT_MODEL to
None; resolves to backend.default_model when not given.
- _preflight_check_claude_cli replaced by a one-liner calling
self._backend.preflight() (skipped when stub factories are
injected; existing skip-on-test behaviour preserved).
- Removed: the _ClaudeStrategizer/Implementer classes, the
_classify_sdk_error helper, the old _default_*_factory pair,
and an unused `import asyncio`.
- MVP_DEFAULT_MODEL still importable from agent_runtime for
backward-compat (= CLAUDE_BACKEND.default_model).
Public surface: `from f3dasm.agentic import Backend, CLAUDE_BACKEND`
now works alongside the existing exports.
Tests: 138 prior pass; one new (test_custom_backend_drops_in_via_kwarg)
constructs a stub Backend and verifies the kwarg path threads through
correctly. 139 total. Ruff clean. Test-import path for
_classify_sdk_error updated to its new home.
Sized for swap-ability: adding an OpenAI or Gemini backend now
requires only a new file under backends/, providing matching
strategizer_factory / implementer_factory / preflight / error
classification, plus a registered Backend instance.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds StudyConfig, _parse_budget, _load_study_config, and _KNOWN_CONFIG_KEYS to agent_runtime.py, plus 5 TDD tests covering missing file, full config, partial config, unknown keys, and bad budget.
Add _remaining() to AgenticRun, set _start_time at the start of execute(), and guard _tool_delegate with an early exit when the wall-clock budget is exhausted. Covered by three new TDD tests.
…s, searchable orchestrator
- Generalize vision from 2-agent to N-agent peer topology
- Replace 'orchestrator is plumbing' with 'orchestrator is the system's scientific policy and is itself a subject of optimization': ADAS-searchability as a non-negotiable design goal, self-hosting as terminal target
- Rewrite Typed Contracts: one symmetric class per channel (Delegation); both parties enrich the same object; metadata dict for extensions; Task/Report retained as backward-compat aliases
- Add Typed Stores section: ExperimentData (epistemic center), AnalysisBase (parent+siblings+uncles), TaskRegistry (success_rate tracking)
- Add Firing Primitives section: parallel, retry, rounds
- Add Reference Topologies: PABLO-style and AgenticSciML-style pseudocode using f3dasm's typed contracts
- Update Folder Layout: add config.yaml, transcripts/, run.log (were missing)
- Update Public API: add StudyConfig, OLLAMA_BACKEND, Level 2 stubs
…stry, firing primitives
Delegation (agent_runtime.py):
- Single symmetric exchange class replacing the Task+Report duality
- Initiating agent fills request fields (intent, expected_report, remaining_time, budget); responding agent fills response fields (actions_taken, files_touched, conclusions, numbers, raw)
- metadata dict for channel-specific extensions without schema changes
- _format_delegation / _parse_delegation match Task/Report wire format
- Task and Report retained as backward-compat public symbols
stores.py (new):
- AnalysisBase: hierarchical solution-tree store; get(id) returns parent + siblings + uncles for bounded comparative context
- AnalysisNode, AnalysisSlice: typed node and query-result dataclasses
- TaskRegistry: capped operator registry (MAX_TASKS=20); tracks attempts, successes, success_rate; evicts lowest-rate non-default entry on overflow; defaults never pruned
primitives.py (new):
- parallel(agents, task_fn): ThreadPoolExecutor fan-out; failures captured as error Delegations, never silently dropped
- retry(agent, task_msg, is_success, max_fails): persistence loop; raises AgenticRunError after max_fails consecutive failures
- rounds(agent_a, agent_b, n, initial): fixed-N debate; returns agent_b's response after n complete A→B exchange rounds
Tests: 70 new tests across test_delegation.py, test_stores.py, test_primitives.py, all passing (827 total)
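The TaskRegistry eviction policy can be sketched as follows. This is a minimal sketch, not the stores.py implementation: protecting the newcomer from immediate eviction (it has no attempts yet), scoring untried entries at 0.0, and breaking rate ties by name are design assumptions, and the class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class _Entry:
    attempts: int = 0
    successes: int = 0
    is_default: bool = False

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


class TaskRegistry:
    """Capped operator registry that prunes its weakest entries."""

    def __init__(
        self, defaults: tuple[str, ...] = (), max_tasks: int = 20
    ) -> None:
        self.max_tasks = max_tasks
        # Default operators are pinned and never evicted.
        self._entries: dict[str, _Entry] = {
            name: _Entry(is_default=True) for name in defaults
        }

    def __contains__(self, name: str) -> bool:
        return name in self._entries

    def success_rate(self, name: str) -> float:
        return self._entries[name].success_rate

    def record(self, name: str, success: bool) -> None:
        """Count one attempt; registering a new name may evict another."""
        entry = self._entries.get(name)
        if entry is None:
            entry = _Entry()
            self._entries[name] = entry
            self._evict_if_full(protect=name)
        entry.attempts += 1
        entry.successes += int(success)

    def _evict_if_full(self, protect: str) -> None:
        if len(self._entries) <= self.max_tasks:
            return
        # Candidates: non-default entries other than the newcomer.
        candidates = [
            (entry.success_rate, name)
            for name, entry in self._entries.items()
            if not entry.is_default and name != protect
        ]
        if candidates:
            _, victim = min(candidates)
            del self._entries[victim]
```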
…lice, TaskStats, debate
- Delegation becomes an envelope: task: Task, report: Report | None, is_complete
- RunContext protocol + topology= hook on AgenticRun for custom topologies
- StudyConfig.backend now resolved via _BACKEND_REGISTRY (was silently ignored)
- register_backend() public function for custom backend registration
- rounds() → debate(), returns list[str] transcript [a1, b1, ...] of length 2*n
- parallel/retry now accept Task objects (not pre-formatted strings)
- AnalysisSlice → ContextSlice (frozen); AnalysisBase.get() → context(include=...)
- RegistryEntry → TaskStats (frozen, no is_default); internal _RegistryEntry retains it
- AgenticOptimizer added (forward() stub pinning the f3dasm Optimizer interface)
- Removed from public API: MAX_TASKS, CHECKPOINT_EVERY, OLLAMA_BACKEND, rounds, AnalysisSlice, RegistryEntry
- Tests updated for all API changes (170 passing)
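The `debate()` transcript contract (`[a1, b1, ...]`, length 2*n) can be sketched with agents modelled as plain text-in/text-out callables. This is an assumption for illustration: the real primitive passes Task objects and reads `report.conclusions`, and `TextAgent` is a hypothetical alias.

```python
from typing import Callable

# Hypothetical alias: an agent here is just a text-in/text-out callable.
TextAgent = Callable[[str], str]


def debate(
    agent_a: TextAgent, agent_b: TextAgent, n: int, initial: str
) -> list[str]:
    """Run n A->B rounds and return the transcript [a1, b1, ..., an, bn]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    transcript: list[str] = []
    message = initial
    for _ in range(n):
        a_reply = agent_a(message)
        b_reply = agent_b(a_reply)
        transcript.extend([a_reply, b_reply])
        message = b_reply  # B's reply seeds A's next turn
    return transcript
```

Returning the whole transcript rather than only agent B's last reply (the old `rounds()` behaviour) lets callers inspect how the exchange converged.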
…t > 0; fix debate docstring
- Promote _DEFAULT_RETRY_CORRECTIVE to module level in agent_runtime; primitives imports it to avoid a duplicate definition (M1)
- Add _parse_eval_budget helper that rejects eval_budget <= 0 with a clear AgenticRunError (M3); two new tests cover 0 and -1
- Fix primitives.debate docstring: report.conclusions is the canonical accessor, not report.raw (both are set, but conclusions is preferred) (M2)
- Add Agent base class with system_prompt, tools, reset_on_checkpoint, description class attributes and model= constructor arg
- Replace frozen Graph(roles=) with mutable Graph(nodes=dict[str,Agent]) plus outgoing() helper; legacy roles= kwarg preserved for compat
- Add TestAgentClassAPI with 5 TDD tests covering defaults, subclassing, nodes dict routing, unknown edge target, and unknown entry validation
… + _IMPLEMENTER_ALLOWED_TOOLS Replace _ClaudeStrategizer, _ClaudeImplementer, _strategizer_factory, _implementer_factory, and _IMPLEMENTER_ALLOWED_TOOLS with a single _ClaudeAgentSession class and _session_factory matching the new Backend.session_factory signature (native_tools, closure_tools, study_dir). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… + document mapping Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… FollowUp detection
- Add NATIVE_TOOL_NAMES, PROTOCOL_CLOSURE_NAMES imports to agent_runtime
- Replace strategizer_factory/implementer_factory with session_factory in AgenticRun.__init__; keep deprecated kwargs as compat wrapper (discriminant: "Delegate" in closure_tools)
- Remove _strategizer property (keep _implementer for test compat)
- Rewrite _preflight to skip when session_factory is not the backend's
- Rewrite _instantiate_node with three-category tool system: native, protocol-closure, topology-injected
- Add _make_followup_tool and _build_topology_closures methods
- Add FollowUp detection in _tool_delegate (## FollowUp block returns FOLLOW_UP message)
- Default graph now declares tools on default agents
- Empty tools frozenset treated as unrestricted (backward compat)
- Add "Read" alias for "ReadNote" in protocol closures for backward compat
- Backend dataclass unfrozen; add compat __post_init__ for legacy strategizer_factory/implementer_factory kwargs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…; add FollowUp/Ask tests
- Backend is now frozen=True with required session_factory + preflight fields (no compat kwargs)
- AgenticRun.__init__ drops strategizer_factory/implementer_factory kwargs and compat wrapper
- Removes _implementer property getter/setter (use run._agents["implementer"] directly)
- Migrates all test call-sites to session_factory= with Delegate-discriminant unified factory
- Adds _make_session_factory helper; _make_factories kept as thin backward-compat wrapper
- Adds test_ask_injected_only_for_entry_node, test_followup_detection_in_tool_delegate, test_conservative_toolset_empty_frozenset (194 tests total, all pass)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add encoding="utf-8" to all write_text calls (fixes Windows UnicodeEncodeError)
- Fix all ruff violations: remove unused imports, sort imports, remove quoted type annotations, wrap lines to 79 chars, convert inline comments to block comments
- Add specs/ pages to mkdocs.yml nav (fixes missing-from-nav warning)
- Add MANIFEST.in
Code Release Week Checklist
This checklist is intended to help reviewers evaluate code repositories during Code Release Week. Reviewers should go through each section and mark the corresponding checkboxes based on their assessment of the repository.
Quality of GitHub repository
README.md file: Is the content properly structured and does it contain the minimum expected sections: Summary, Statement of need, Authorship, Getting started, Community Support, License?
Is the content grammatically well-written, clear, and easily understandable?
Is the high-level functionality and purpose of the project clear for a diverse, non-specialist audience?
Does it contain a clear statement of need that illustrates the purpose of the project?
Does it contain a set of key references for the user/developer (e.g., paper, documentation, installation, benchmarks, tutorials)?
Project Structure and Files
Does the project include a src/<project_name> folder where the whole code is contained?
Does the project include the pyproject.toml file?
Does the project include the LICENSE file (with contents of an OSI-approved license)?
Does the project include the MANIFEST.in file?
Does the project include a .gitignore file?
Does the project include a Makefile with common targets such as build, test, lint, and docs?
Does the project include a docs/ documentation folder containing the required files to generate HTML documentation?
Code Quality
Is the code compliant with the Coding Fundamentals of the BRG Python Coding Style?
Is the code compliant with the docstring format of the BRG Python Coding Style?
Does the code include docstrings for all modules, functions, and classes?
Does the code include comments explaining non-trivial code blocks?
Does the code pass the Ruff check (compliance with BRG Python Coding Style)?
Documentation Quality
Can the documentation be successfully built?
Is the content grammatically well-written, clear, and easily understandable?
Is the high-level functionality and purpose of the project clear for a diverse, non-specialist audience?
Does it contain a clear statement of need that illustrates the purpose of the project?
Does it contain clear installation instructions that follow standard procedures?
Does it include sufficient documentation to understand the core functionality of the code?
Does it include an automatically generated API that contains the complete documentation of the code (modules, classes, functions)?
Benchmarks, Testing, and Distribution
Can the code be successfully installed by following the provided instructions?
Does it include a simple example usage that illustrates how to use the code?
Does the project include a set of benchmarks that cover the core functionality of the code?
Does the project include a test suite?
Can the Python package distribution be successfully built?
(Optional if code is public) Can the Python package distribution be successfully installed from PyPI?