feat: add case-study seed report by Gradata · Pull Request #273 · Gradata/gradata

Gradata · 2026-06-08T13:59:37Z

Summary

Adds gradata report case-study-seed with Markdown output and --json support.
Generates evidence-first case-study seeds from local system.db corrections, RULE_GRADUATED rules, and injection/application events.
Omits raw prompt/draft/final content by default; emits summaries/counts/caveats for human permission workflow.
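
A minimal sketch of the redaction idea described above, not the PR's actual code: only summary fields are copied into the evidence, and a flag records that raw content was omitted. The field names `draft_summary` and `final_summary` are assumptions for illustration; `raw_prompt_content_included` is the key shown in the CLI smoke output.

```python
from __future__ import annotations

import json
from typing import Any

# Assumed field names; the real event schema is not shown on this page.
SUMMARY_FIELDS = ("draft_summary", "final_summary")


def privacy_safe_evidence(data: dict[str, Any]) -> dict[str, Any]:
    """Copy only summary fields and record that raw content was left out."""
    evidence: dict[str, Any] = {name: str(data.get(name, "")) for name in SUMMARY_FIELDS}
    evidence["raw_prompt_content_included"] = False
    return evidence


# Hypothetical event payload mixing raw text with summaries.
event_data = {
    "prompt": "full user prompt text",
    "draft": "raw draft text",
    "final": "raw final text",
    "draft_summary": "response cited fields the API does not have",
    "final_summary": "response limited to documented fields",
}
evidence = privacy_safe_evidence(event_data)
print(json.dumps(evidence, indent=2))
```

An allow-list of summary fields is used here rather than a deny-list of raw fields, so a newly added raw field would be excluded by default.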

Verification

env -u BRAIN_DIR -u GRADATA_BRAIN python3 -m pytest tests/test_case_study_seed_report.py tests/test_status_command.py tests/test_prove_command.py -q → 14 passed
/home/olive/.local/bin/uvx ruff check src/gradata/enhancements/case_study_seed.py tests/test_case_study_seed_report.py src/gradata/cli.py → All checks passed
CLI smoke: env -u BRAIN_DIR -u GRADATA_BRAIN PYTHONPATH=src python3 -m gradata.cli --brain-dir "$tmp" report case-study-seed --json → pattern Invented API fields, raw_prompt_content_included False

Paperclip: GRA-2411


coderabbitai · 2026-06-08T13:59:51Z

📝 Walkthrough

Added new CLI command gradata report case-study-seed to generate evidence-first case-study seeds from local database corrections and RULE_GRADUATED rules
Supports both Markdown (default) and JSON output via --json flag
New public API functions: generate_case_study_seed(db_path: str | Path) -> dict[str, Any] and render_case_study_markdown(seed: dict[str, Any]) -> str in Gradata/src/gradata/enhancements/case_study_seed.py
Privacy-first design: omits raw prompt/draft/final content by default, includes summaries and caveats for human permission workflow
Case-study seed includes correction/rule/injection event counts, top repeated mistake patterns, and redaction metadata
Updated CLI argument parsing to recognize case-study-seed report type and add --json flag support
All existing report types (CSV/metrics/rules/health) continue to work as before
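Test coverage: the verification run reports 14 tests passed across three test files, including Markdown rendering validation and CLI JSON output verification
Lint checks (ruff) passed on the three changed files; no separate type-check run is reported in the PR description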

Walkthrough

This PR introduces a new "case-study-seed" report type to the Gradata CLI. It loads events from a SQLite database, identifies the top repeated correction pattern, collects related rules and evidence, and renders the result as privacy-safe JSON or Markdown. Tests validate seed generation, evidence redaction, markdown output, and CLI integration.
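
The flow in this walkthrough can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `case_study_seed.py`: the `events` table layout (`type`, `data_json`), the `CORRECTION` event type name, and the `build_seed` function are all hypothetical. Only the `report` and `raw_prompt_content_included` keys and the "Invented API fields" pattern appear on this page.

```python
from __future__ import annotations

import json
import sqlite3
from collections import Counter
from typing import Any


def build_seed(conn: sqlite3.Connection) -> dict[str, Any]:
    """Count events and pick the most repeated (category, pattern) correction."""
    type_counts: Counter[str] = Counter()
    patterns: Counter[tuple[str, str]] = Counter()
    for event_type, raw in conn.execute("SELECT type, data_json FROM events"):
        type_counts[event_type] += 1
        if event_type != "CORRECTION":
            continue
        try:
            data = json.loads(raw or "{}")
        except json.JSONDecodeError:
            data = {}  # malformed payloads still count, under "unknown"
        if not isinstance(data, dict):
            data = {}
        key = (str(data.get("category", "unknown")), str(data.get("pattern", "unknown")))
        patterns[key] += 1
    top = patterns.most_common(1)
    (category, pattern), repeats = top[0] if top else (("unknown", "unknown"), 0)
    return {
        "report": "case-study-seed",
        "top_mistake": {"category": category, "pattern": pattern, "repeats": repeats},
        "counts": dict(type_counts),
        "raw_prompt_content_included": False,
    }


# Hypothetical in-memory database standing in for a local system.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT, data_json TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("CORRECTION", json.dumps({"category": "accuracy", "pattern": "Invented API fields"})),
        ("CORRECTION", json.dumps({"category": "accuracy", "pattern": "Invented API fields"})),
        ("CORRECTION", json.dumps({"category": "tone", "pattern": "Too formal"})),
        ("RULE_GRADUATED", json.dumps({"category": "accuracy"})),
    ],
)
seed = build_seed(conn)
print(json.dumps(seed, indent=2))
```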

Changes

Case-study seed report generation

Layer / File(s)	Summary
CLI argument parsing and command handler `Gradata/src/gradata/cli.py`	The report subcommand parser now includes `case-study-seed` in type choices and adds a `--json` flag. The `cmd_report` function branches on `case-study-seed` to generate and output a seed without loading the full brain.
Case-study seed data generation from events database `Gradata/src/gradata/enhancements/case_study_seed.py`	Loads SQLite events, parses JSON data, identifies the top repeated correction category/pattern, collects matching RULE events and lesson injections, extracts privacy-safe before/after evidence using summary fields, and assembles a structured seed with counts, metadata, and caveats.
Case-study seed markdown rendering `Gradata/src/gradata/enhancements/case_study_seed.py`	Transforms the case-study seed structure into Markdown sections covering top repeated mistake, associated rules, before/after evidence, counts, privacy notes, and review caveats.
Test helpers and case-study seed validation `Gradata/tests/test_case_study_seed_report.py`	Provides SQLite fixtures to seed event records and validates seed generation (correct top-mistake identification, raw-prompt redaction), markdown output format, and end-to-end CLI JSON output.
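
For orientation, the parser change summarized in the first row might look something like the sketch below. It is an assumption-laden illustration: the exact choice strings for the existing report types and the `build_parser` name are hypothetical, while `--brain-dir`, `report`, `case-study-seed`, and `--json` come from the CLI smoke command in the PR description.

```python
from __future__ import annotations

import argparse

# Assumed spellings of the existing report types; only "case-study-seed" is confirmed.
REPORT_TYPES = ["csv", "metrics", "rules", "health", "case-study-seed"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a `report` subcommand, a type choice, and a --json flag."""
    parser = argparse.ArgumentParser(prog="gradata")
    parser.add_argument("--brain-dir", default=None)
    sub = parser.add_subparsers(dest="command", required=True)
    report = sub.add_parser("report")
    report.add_argument("type", choices=REPORT_TYPES)
    report.add_argument("--json", action="store_true", help="emit JSON instead of Markdown")
    return parser


args = build_parser().parse_args(
    ["--brain-dir", "/tmp/brain", "report", "case-study-seed", "--json"]
)
print(args.command, args.type, args.json)
```

A handler could then branch on `args.type == "case-study-seed"` before any heavier initialization, which is the behavior the summary attributes to `cmd_report`.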

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

feature

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
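
As a rough illustration of what a docstring coverage figure measures, the sketch below counts functions and classes that carry a docstring using the standard `ast` module. How the actual check counts nodes (for example, whether modules or nested functions are included) is not stated here, so `docstring_coverage` is a hypothetical approximation. For scale, 1 documented definition out of 14 would give 7.14%.

```python
from __future__ import annotations

import ast

DEFINITION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def docstring_coverage(source: str) -> float:
    """Return the percentage of functions and classes that have a docstring."""
    tree = ast.parse(source)
    nodes = [node for node in ast.walk(tree) if isinstance(node, DEFINITION_NODES)]
    if not nodes:
        return 100.0  # nothing to document
    documented = sum(1 for node in nodes if ast.get_docstring(node))
    return 100.0 * documented / len(nodes)


SAMPLE = '''
def documented():
    """Has a docstring."""
    return 1


def undocumented():
    return 2
'''

print(f"{docstring_coverage(SAMPLE):.2f}%")
```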

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: add case-study seed report' clearly and concisely summarizes the primary change: adding a new case-study seed report feature to the CLI.
Description check	✅ Passed	The description is directly related to the changeset, detailing the new case-study seed feature, its functionality, and verification steps performed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.18][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/enhancements/case_study_seed.py`:
- Line 124: The seed currently sets "source_db": str(db) exposing full
filesystem paths; change this to a privacy-safe representation by replacing the
full path with a sanitized value (e.g., Path(db).name or os.path.basename(db) or
replace the user's home dir with "~") before assigning to source_db; update
imports if needed (pathlib.Path or os.path) and keep the variable names
source_db and db so the change is localized and easy to spot in
case_study_seed.py.
- Line 120: matching_injections currently only filters injection_events by
category using _category_pattern(e)[0]; narrow it to the chosen mistake pattern
as well by adding a second predicate that compares the event's pattern id/name
(from _category_pattern(e)[1]) to the selected pattern variable (e.g.,
selected_pattern or selected_mistake_pattern) so the comprehension becomes:
filter injection_events where _category_pattern(e)[0] == category AND
_category_pattern(e)[1] == selected_pattern (use the actual variable name used
in this scope).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bd8fc9f8-3d27-4d9d-9549-e21c8fc8f750

📥 Commits

Reviewing files that changed from the base of the PR and between 2b12800 and b6199ba.

📒 Files selected for processing (3)

Gradata/src/gradata/cli.py
Gradata/src/gradata/enhancements/case_study_seed.py
Gradata/tests/test_case_study_seed_report.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: pytest (py3.12)
GitHub Check: pytest (py3.11)
GitHub Check: pytest macos-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.11
GitHub Check: pytest windows-latest / py3.12
GitHub Check: pytest macos-latest / py3.12
GitHub Check: pytest windows-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.12

🧰 Additional context used

📓 Path-based instructions (2)

Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

Gradata/src/gradata/enhancements/case_study_seed.py
Gradata/src/gradata/cli.py
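
Two of the conventions above, sketched in isolation. This is not the repository's own helper: `atomic_write_json` and `bm25_available` are hypothetical names, and the write-to-temp-then-`os.replace` approach is one common way to get an atomic write, assumed here rather than taken from the codebase.

```python
from __future__ import annotations

import contextlib
import json
import logging
import os
import tempfile
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def atomic_write_json(path: Path | str, payload: dict[str, Any]) -> None:
    """Write JSON to a temp file in the same directory, then rename it into place."""
    target = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=target.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic when source and target share a filesystem
    except (OSError, TypeError, ValueError):
        # Typed exceptions plus a logged traceback, never a silent pass.
        logger.warning("atomic write failed for %s", target.name, exc_info=True)
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def bm25_available() -> bool:
    """Guard the optional import at the call site instead of at module level."""
    try:
        import bm25s  # noqa: F401
    except ImportError:
        logger.warning("bm25s is not installed; BM25 ranking unavailable", exc_info=True)
        return False
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "seed.json"
    atomic_write_json(out_path, {"report": "case-study-seed"})
    written = json.loads(out_path.read_text(encoding="utf-8"))
    leftovers = [p.name for p in Path(tmp_dir).iterdir() if p.name.endswith(".tmp")]
print(written, leftovers, bm25_available())
```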

Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

Gradata/tests/test_case_study_seed_report.py

🧠 Learnings (4)

📓 Common learnings

Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

📚 Learning: 2026-05-01T15:50:32.772Z

Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Add unit tests in `tests/test_*.py` for every CI push without LLM calls (deterministic); mark integration tests with `pytest.mark.integration` and skip them by default (they hit real LLM APIs)

Applied to files:

Gradata/tests/test_case_study_seed_report.py

📚 Learning: 2026-04-17T17:18:07.439Z

Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Applied to files:

Gradata/tests/test_case_study_seed_report.py

📚 Learning: 2026-05-01T15:50:32.772Z

Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Set `BRAIN_DIR` environment variable via `tmp_path` in conftest.py for test isolation — ensure `_paths.py` module cache refreshes when calling `Brain.init()` directly inside tests

Applied to files:

Gradata/tests/test_case_study_seed_report.py

🔇 Additional comments (2)

Gradata/src/gradata/cli.py (1)

668-681: LGTM!

Also applies to: 2121-2129

Gradata/tests/test_case_study_seed_report.py (1)

15-126: LGTM!

coderabbitai · 2026-06-08T14:04:48Z

+                }
+            )
+
+    matching_injections = [e for e in injection_events if _category_pattern(e)[0] == category]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow injection matching to the selected mistake pattern.

matching_injections currently filters only by category, so unrelated injections in the same category inflate the seed evidence for the top pattern.

Suggested fix

- matching_injections = [e for e in injection_events if _category_pattern(e)[0] == category]
+ matching_injections = [e for e in injection_events if _matches(e, category, pattern)]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/enhancements/case_study_seed.py` at line 120, matching_injections currently only filters injection_events by category using _category_pattern(e)[0]; narrow it to the chosen mistake pattern as well by adding a second predicate that compares the event's pattern id/name (from _category_pattern(e)[1]) to the selected pattern variable (e.g., selected_pattern or selected_mistake_pattern) so the comprehension becomes: filter injection_events where _category_pattern(e)[0] == category AND _category_pattern(e)[1] == selected_pattern (use the actual variable name used in this scope).

coderabbitai · 2026-06-08T14:04:48Z

+
+    return {
+        "report": "case-study-seed",
+        "source_db": str(db),


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid exposing full local DB paths in privacy-safe output.

Line 124 includes the full filesystem path in source_db, which can leak host and user information when the seed is shared externally.

Suggested fix

- "source_db": str(db),
+ "source_db": db.name,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/enhancements/case_study_seed.py` at line 124, The seed currently sets "source_db": str(db) exposing full filesystem paths; change this to a privacy-safe representation by replacing the full path with a sanitized value (e.g., Path(db).name or os.path.basename(db) or replace the user's home dir with "~") before assigning to source_db; update imports if needed (pathlib.Path or os.path) and keep the variable names source_db and db so the change is localized and easy to spot in case_study_seed.py.
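
To make the basename option concrete, here is a minimal sketch assuming `db` may be given as either a string or a `pathlib.Path`. The helper name `safe_db_label` and the sample path are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def safe_db_label(db: Path | str) -> str:
    """Return only the file name so directory and user names stay out of the seed."""
    return Path(db).name


# Hypothetical path used only for demonstration.
db = Path("/home/alice/brains/client-x/system.db")
seed_header = {"report": "case-study-seed", "source_db": safe_db_label(db)}
print(seed_header)
```

The trade-off is that two databases with the same file name become indistinguishable in the output, which may or may not matter for the review workflow.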

feat: add case-study seed report

b6199ba

greptile-apps Bot reviewed Jun 8, 2026


coderabbitai Bot added the feature label Jun 8, 2026

coderabbitai Bot requested changes Jun 8, 2026

Gradata wants to merge 1 commit into main from gra-2411-case-study-seed
