feat: add case-study seed report#273

Open
Gradata wants to merge 1 commit into main from gra-2411-case-study-seed

Conversation

@Gradata Gradata commented Jun 8, 2026

Owner

Summary

  • Adds gradata report case-study-seed with Markdown output and --json support.
  • Generates evidence-first case-study seeds from local system.db corrections, RULE_GRADUATED rules, and injection/application events.
  • Omits raw prompt/draft/final content by default; emits summaries/counts/caveats for human permission workflow.

Verification

  • env -u BRAIN_DIR -u GRADATA_BRAIN python3 -m pytest tests/test_case_study_seed_report.py tests/test_status_command.py tests/test_prove_command.py -q → 14 passed
  • /home/olive/.local/bin/uvx ruff check src/gradata/enhancements/case_study_seed.py tests/test_case_study_seed_report.py src/gradata/cli.py → All checks passed
  • CLI smoke: env -u BRAIN_DIR -u GRADATA_BRAIN PYTHONPATH=src python3 -m gradata.cli --brain-dir "$tmp" report case-study-seed --json → pattern Invented API fields, raw_prompt_content_included False

Paperclip: GRA-2411

coderabbitai Bot commented Jun 8, 2026


📝 Walkthrough
  • Added new CLI command gradata report case-study-seed to generate evidence-first case-study seeds from local database corrections and RULE_GRADUATED rules
  • Supports both Markdown (default) and JSON output via --json flag
  • New public API functions: generate_case_study_seed(db_path: str | Path) -> dict[str, Any] and render_case_study_markdown(seed: dict[str, Any]) -> str in Gradata/src/gradata/enhancements/case_study_seed.py
  • Privacy-first design: omits raw prompt/draft/final content by default, includes summaries and caveats for human permission workflow
  • Case-study seed includes correction/rule/injection event counts, top repeated mistake patterns, and redaction metadata
  • Updated CLI argument parsing to recognize case-study-seed report type and add --json flag support
  • All existing report types (CSV/metrics/rules/health) continue to work as before
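  • Tests: 14 passed across tests/test_case_study_seed_report.py, tests/test_status_command.py, and tests/test_prove_command.py, including markdown rendering validation and CLI JSON output verification
  • Lint: ruff check passed on the three changed files listed in the PR description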

Walkthrough

This PR introduces a new "case-study-seed" report type to the Gradata CLI. It loads events from a SQLite database, identifies the top repeated correction pattern, collects related rules and evidence, and renders the result as privacy-safe JSON or Markdown. Tests validate seed generation, evidence redaction, markdown output, and CLI integration.
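The "identify the top repeated correction pattern" step in the walkthrough can be sketched as follows. This is a minimal illustration, not the code in case_study_seed.py: the `events(type, data_json)` schema, the `CORRECTION` event type, the `category`/`pattern` payload keys, and the function name `top_repeated_pattern` are all assumptions made for the example.

```python
from __future__ import annotations

import json
import sqlite3
from collections import Counter


def top_repeated_pattern(conn: sqlite3.Connection) -> tuple[tuple[str, str], int] | None:
    """Return ((category, pattern), count) for the most frequent correction."""
    counts: Counter = Counter()
    # Assumed schema: events(type TEXT, data_json TEXT). The real table may differ.
    rows = conn.execute("SELECT data_json FROM events WHERE type = 'CORRECTION'")
    for (raw,) in rows:
        try:
            data = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            continue  # skip payloads that are not valid JSON
        if not isinstance(data, dict):
            continue
        key = (data.get("category", "unknown"), data.get("pattern", "unknown"))
        counts[key] += 1
    top = counts.most_common(1)
    return top[0] if top else None


# Demo against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT, data_json TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("CORRECTION", json.dumps({"category": "ACCURACY", "pattern": "Invented API fields"})),
        ("CORRECTION", json.dumps({"category": "ACCURACY", "pattern": "Invented API fields"})),
        ("CORRECTION", json.dumps({"category": "TONE", "pattern": "Too verbose"})),
        ("CORRECTION", "not json"),  # malformed payload, ignored
    ],
)
result = top_repeated_pattern(conn)
print(result)  # (('ACCURACY', 'Invented API fields'), 2)
```

Counting on the (category, pattern) pair rather than on category alone keeps the "top mistake" specific, which matters for the injection-matching comment later in this review.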

Changes

Case-study seed report generation

| Layer / File(s) | Summary |
| --- | --- |
| CLI argument parsing and command handler<br>`Gradata/src/gradata/cli.py` | The report subcommand parser now includes case-study-seed in type choices and adds a --json flag. The cmd_report function branches on case-study-seed to generate and output a seed without loading the full brain. |
| Case-study seed data generation from events database<br>`Gradata/src/gradata/enhancements/case_study_seed.py` | Loads SQLite events, parses JSON data, identifies the top repeated correction category/pattern, collects matching RULE events and lesson injections, extracts privacy-safe before/after evidence using summary fields, and assembles a structured seed with counts, metadata, and caveats. |
| Case-study seed markdown rendering<br>`Gradata/src/gradata/enhancements/case_study_seed.py` | Transforms the case-study seed structure into Markdown sections covering top repeated mistake, associated rules, before/after evidence, counts, privacy notes, and review caveats. |
| Test helpers and case-study seed validation<br>`Gradata/tests/test_case_study_seed_report.py` | Provides SQLite fixtures to seed event records and validates seed generation (correct top-mistake identification, raw-prompt redaction), markdown output format, and end-to-end CLI JSON output. |
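The "privacy-safe before/after evidence using summary fields" behaviour described above can be illustrated with an allow-list sketch. The key names `draft_summary` and `final_summary` and the helper `safe_evidence` are hypothetical; only the `raw_prompt_content_included` flag is taken from the CLI smoke output in the PR description.

```python
from __future__ import annotations

from typing import Any

# Hypothetical summary keys; the real event payload may use different names.
SUMMARY_FIELDS = ("draft_summary", "final_summary")


def safe_evidence(data: dict[str, Any]) -> dict[str, Any]:
    """Copy only allow-listed summary fields and record that raw content is omitted."""
    evidence = {key: data[key] for key in SUMMARY_FIELDS if key in data}
    evidence["raw_prompt_content_included"] = False
    return evidence


event_data = {
    "prompt": "full user prompt (must not leave the machine)",
    "draft": "full model draft",
    "final": "full corrected text",
    "draft_summary": "Draft referenced fields that do not exist in the API.",
    "final_summary": "Final text uses only documented fields.",
}
evidence = safe_evidence(event_data)
print(sorted(evidence))
```

An allow-list is used here instead of deleting known raw keys, so a newly added raw field would be excluded by default rather than leaked by default.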

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

feature

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add case-study seed report' clearly and concisely summarizes the primary change: adding a new case-study seed report feature to the CLI. |
| Description check | ✅ Passed | The description is directly related to the changeset, detailing the new case-study seed feature, its functionality, and verification steps performed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

 Loading rules from local config...
[00.18][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


@coderabbitai coderabbitai Bot added the feature label Jun 8, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/enhancements/case_study_seed.py`:
- Line 124: The seed currently sets "source_db": str(db) exposing full
filesystem paths; change this to a privacy-safe representation by replacing the
full path with a sanitized value (e.g., Path(db).name or os.path.basename(db) or
replace the user's home dir with "~") before assigning to source_db; update
imports if needed (pathlib.Path or os.path) and keep the variable names
source_db and db so the change is localized and easy to spot in
case_study_seed.py.
- Line 120: matching_injections currently only filters injection_events by
category using _category_pattern(e)[0]; narrow it to the chosen mistake pattern
as well by adding a second predicate that compares the event's pattern id/name
(from _category_pattern(e)[1]) to the selected pattern variable (e.g.,
selected_pattern or selected_mistake_pattern) so the comprehension becomes:
filter injection_events where _category_pattern(e)[0] == category AND
_category_pattern(e)[1] == selected_pattern (use the actual variable name used
in this scope).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bd8fc9f8-3d27-4d9d-9549-e21c8fc8f750

📥 Commits

Reviewing files that changed from the base of the PR and between 2b12800 and b6199ba.

📒 Files selected for processing (3)
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/enhancements/case_study_seed.py
  • Gradata/tests/test_case_study_seed_report.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest (py3.12)
  • GitHub Check: pytest (py3.11)
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/enhancements/case_study_seed.py
  • Gradata/src/gradata/cli.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_case_study_seed_report.py
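For the "atomic-write helper" instruction on Gradata/src/**/*.py above, one common shape is to write to a temporary file in the target directory and then rename it over the destination. The sketch below is a generic illustration under that assumption; `atomic_write_json` is a hypothetical name and is not the helper that exists in the repository.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to a temp file beside the target, then rename it into place."""
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one, never a partial write
    except BaseException:
        # Clean up the temp file on any failure, then re-raise the original error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "seed.json"
    atomic_write_json(target, {"report": "case-study-seed"})
    loaded = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in Path(tmp_dir).iterdir() if p.name != "seed.json"]
```

If the process dies before `os.replace`, the destination file is untouched and only a stray `.tmp` file remains.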
🧠 Learnings (4)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Add unit tests in `tests/test_*.py` for every CI push without LLM calls (deterministic); mark integration tests with `pytest.mark.integration` and skip them by default (they hit real LLM APIs)

Applied to files:

  • Gradata/tests/test_case_study_seed_report.py
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Applied to files:

  • Gradata/tests/test_case_study_seed_report.py
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Set `BRAIN_DIR` environment variable via `tmp_path` in conftest.py for test isolation — ensure `_paths.py` module cache refreshes when calling `Brain.init()` directly inside tests

Applied to files:

  • Gradata/tests/test_case_study_seed_report.py
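The env-first brain resolution order recorded in the PR #102 learning above (GRADATA_BRAIN > --brain-dir > cwd) can be sketched as a small function. The name `resolve_brain_dir` and its parameters are assumptions for illustration and are not taken from cli.py.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_brain_dir(
    cli_brain_dir: str | None,
    env: Mapping[str, str] | None = None,
    cwd: str | None = None,
) -> Path:
    """Pick the brain directory: environment first, then the CLI flag, then cwd."""
    env = os.environ if env is None else env
    from_env = env.get("GRADATA_BRAIN")
    if from_env:
        return Path(from_env)
    if cli_brain_dir:
        return Path(cli_brain_dir)
    return Path(cwd) if cwd is not None else Path.cwd()


env_wins = resolve_brain_dir("/tmp/cli-brain", env={"GRADATA_BRAIN": "/tmp/env-brain"})
flag_wins = resolve_brain_dir("/tmp/cli-brain", env={})
cwd_wins = resolve_brain_dir(None, env={}, cwd="/tmp/work")
```

Under this ordering a set GRADATA_BRAIN would override `--brain-dir "$tmp"`, which may be why the verification commands in the PR description run with `env -u BRAIN_DIR -u GRADATA_BRAIN`.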
🔇 Additional comments (2)
Gradata/src/gradata/cli.py (1)

668-681: LGTM!

Also applies to: 2121-2129

Gradata/tests/test_case_study_seed_report.py (1)

15-126: LGTM!

}
)

matching_injections = [e for e in injection_events if _category_pattern(e)[0] == category]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow injection matching to the selected mistake pattern.

matching_injections currently filters only by category, so unrelated injections in the same category inflate the seed evidence for the top pattern.

Suggested fix
-    matching_injections = [e for e in injection_events if _category_pattern(e)[0] == category]
+    matching_injections = [e for e in injection_events if _matches(e, category, pattern)]
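The effect of the narrower filter can be seen in a small standalone sketch. The bodies of `_category_pattern` and `_matches` below are stand-ins written for this example (the review only shows their names), and the event shape is assumed.

```python
from __future__ import annotations

from typing import Any


def _category_pattern(event: dict[str, Any]) -> tuple[str, str]:
    # Stand-in: assumes each event carries category and pattern under "data".
    data = event.get("data", {})
    return data.get("category", ""), data.get("pattern", "")


def _matches(event: dict[str, Any], category: str, pattern: str) -> bool:
    # Stand-in: an event matches only if both category and pattern agree.
    return _category_pattern(event) == (category, pattern)


injection_events = [
    {"data": {"category": "ACCURACY", "pattern": "Invented API fields"}},
    {"data": {"category": "ACCURACY", "pattern": "Wrong units"}},
    {"data": {"category": "TONE", "pattern": "Too verbose"}},
]
category, pattern = "ACCURACY", "Invented API fields"

# Category-only filter: also picks up the unrelated "Wrong units" injection.
by_category = [e for e in injection_events if _category_pattern(e)[0] == category]
# Category-and-pattern filter: keeps only injections for the selected mistake.
by_both = [e for e in injection_events if _matches(e, category, pattern)]

print(len(by_category), len(by_both))  # 2 1
```

With the category-only filter the seed would report two supporting injections for "Invented API fields" when only one actually targets that pattern.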


return {
"report": "case-study-seed",
"source_db": str(db),

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid exposing full local DB paths in privacy-safe output.

Line [124] includes the full filesystem path in source_db, which can leak host/user info when this seed is shared externally.

Suggested fix
-        "source_db": str(db),
+        "source_db": db.name,
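Both sanitizing options mentioned in this comment (file name only, or home directory replaced with "~") can be sketched with pathlib. The function names are hypothetical, and the explicit `home` parameter exists only to make the example deterministic.

```python
from __future__ import annotations

from pathlib import Path


def source_db_name(db: str | Path) -> str:
    """Return only the file name, dropping every directory component."""
    return Path(db).name


def source_db_home_relative(db: str | Path, home: str | Path | None = None) -> str:
    """Replace the home directory prefix with '~'; fall back to the bare file name."""
    path = Path(db)
    home_path = Path(home) if home is not None else Path.home()
    try:
        return "~/" + path.relative_to(home_path).as_posix()
    except ValueError:
        # The database is outside the home directory, so expose the name only.
        return path.name


name_only = source_db_name("/home/user/brain/system.db")
tilde_form = source_db_home_relative("/home/user/brain/system.db", home="/home/user")
outside_home = source_db_home_relative("/var/data/system.db", home="/home/user")
print(name_only, tilde_form, outside_home)
```

The file-name form reveals the least. The "~" form keeps the directory layout below the home directory, which still says something about the host, so it is the weaker of the two options for output meant to be shared externally.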
