feat: add case-study seed report #273
📝 Walkthrough
This PR introduces a new "case-study-seed" report type to the Gradata CLI. It loads events from a SQLite database, identifies the top repeated correction pattern, collects related rules and evidence, and renders the result as privacy-safe JSON or Markdown. Tests validate seed generation, evidence redaction, Markdown output, and CLI integration.
Changes
Case-study seed report generation
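As a rough illustration of the first two steps the walkthrough describes (load events from SQLite, pick the most repeated correction pattern), here is a minimal Python sketch. The `events` table, its `type`/`data` columns, the `CORRECTION` event type, and the `category`/`pattern` payload keys are all assumptions for illustration; the PR does not show the real `system.db` schema or the helpers inside `case_study_seed.py`.

```python
from __future__ import annotations

import json
import sqlite3
from collections import Counter


def load_events(conn: sqlite3.Connection, event_type: str) -> list[dict]:
    """Return decoded JSON payloads for one event type.

    Assumes a hypothetical table events(type TEXT, data TEXT); the real
    system.db schema is not shown in this PR.
    """
    rows = conn.execute("SELECT data FROM events WHERE type = ?", (event_type,))
    return [json.loads(data) for (data,) in rows]


def top_correction_pattern(corrections: list[dict]) -> tuple[str, str, int] | None:
    """Return (category, pattern, count) for the most repeated correction, or None."""
    counts = Counter((c.get("category", ""), c.get("pattern", "")) for c in corrections)
    if not counts:
        return None
    (category, pattern), count = counts.most_common(1)[0]
    return category, pattern, count


if __name__ == "__main__":
    # Illustrative in-memory database with made-up sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (type TEXT, data TEXT)")
    sample = [
        {"category": "api", "pattern": "example-pattern-a"},
        {"category": "api", "pattern": "example-pattern-a"},
        {"category": "style", "pattern": "example-pattern-b"},
    ]
    conn.executemany(
        "INSERT INTO events (type, data) VALUES (?, ?)",
        [("CORRECTION", json.dumps(item)) for item in sample],
    )
    print(top_correction_pattern(load_events(conn, "CORRECTION")))
```

Keying the `Counter` on the `(category, pattern)` pair rather than on category alone keeps two different mistakes in the same category from being merged into one "top" pattern.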
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 OpenGrep (1.22.0)
OpenGrep fatal error (exit code 2): ✔ Opengrep OSS Loading rules from local config...
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/enhancements/case_study_seed.py`:
- Line 124: The seed currently sets "source_db": str(db) exposing full
filesystem paths; change this to a privacy-safe representation by replacing the
full path with a sanitized value (e.g., Path(db).name or os.path.basename(db) or
replace the user's home dir with "~") before assigning to source_db; update
imports if needed (pathlib.Path or os.path) and keep the variable names
source_db and db so the change is localized and easy to spot in
case_study_seed.py.
- Line 120: matching_injections currently only filters injection_events by
category using _category_pattern(e)[0]; narrow it to the chosen mistake pattern
as well by adding a second predicate that compares the event's pattern id/name
(from _category_pattern(e)[1]) to the selected pattern variable (e.g.,
selected_pattern or selected_mistake_pattern) so the comprehension becomes:
filter injection_events where _category_pattern(e)[0] == category AND
_category_pattern(e)[1] == selected_pattern (use the actual variable name used
in this scope).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bd8fc9f8-3d27-4d9d-9549-e21c8fc8f750
📒 Files selected for processing (3)
Gradata/src/gradata/cli.py
Gradata/src/gradata/enhancements/case_study_seed.py
Gradata/tests/test_case_study_seed_report.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest (py3.12)
- GitHub Check: pytest (py3.11)
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefer `sentence-transformers` for local embeddings, `google-genai` for Gemini embeddings, `cryptography` for AES-GCM encrypted `system.db`, `bm25s` for BM25 rule ranking, and `mem0ai` for external memory adapters; guard all optional dependency imports with `try / except ImportError` at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare `except: pass`; use typed exceptions or, at minimum, `logger.warning(...)` with `exc_info=True` to avoid silent failure in a memory product
Never import from out-of-scope sibling directories `../Sprites/` or `../Hausgem/` within `gradata/*` code; that is a layering bug
Never leak private-sibling paths into public docs/code: no references to `../Sprites/`, `../Hausgem/`, email addresses, OneDrive paths, or Sprites-specific examples from inside `gradata/*`
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/enhancements/case_study_seed.py
Gradata/src/gradata/cli.py
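Three of the source rules above (call-site guards for optional imports, no silent `except: pass`, atomic JSON writes) can be sketched together. This is a minimal Python illustration, not the repository's own atomic-write helper, whose name and signature are not shown here; `atomic_write_json` and `embed_texts_if_available` are hypothetical names, and the embedding model name is an arbitrary example rather than the project's choice.

```python
from __future__ import annotations

import json
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def atomic_write_json(path: str | os.PathLike[str], payload: dict) -> None:
    """Write JSON to a temp file in the target directory, then atomically swap it in."""
    target = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=f".{target.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes reach disk before the rename
        os.replace(tmp_name, target)  # atomic when source and target share a filesystem
    except BaseException:
        # Remove the partial temp file and re-raise; a failed cleanup is logged, not swallowed.
        try:
            os.unlink(tmp_name)
        except OSError:
            logger.warning("could not remove temp file %s", tmp_name, exc_info=True)
        raise


def embed_texts_if_available(texts: list[str]) -> list[list[float]] | None:
    """Guard the optional dependency at the call site, never at module level."""
    try:
        from sentence_transformers import SentenceTransformer
    except ImportError:
        logger.warning("sentence-transformers not installed; skipping local embeddings", exc_info=True)
        return None
    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model name only
    return model.encode(texts).tolist()
```

Creating the temp file in the same directory as the target is what keeps `os.replace` a same-filesystem rename; a temp file under the system temp directory could land on a different mount.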
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: Set `BRAIN_DIR` environment variable via `tmp_path` in conftest.py for test isolation; ensure the `_paths.py` module cache refreshes when calling `Brain.init()` directly inside tests
Add unit tests in `tests/test_*.py` for every CI push without LLM calls (deterministic); mark integration tests with `@pytest.mark.integration` and skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_case_study_seed_report.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Add unit tests in `tests/test_*.py` for every CI push without LLM calls (deterministic); mark integration tests with `pytest.mark.integration` and skip them by default (they hit real LLM APIs)
Applied to files:
Gradata/tests/test_case_study_seed_report.py
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
Gradata/tests/test_case_study_seed_report.py
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Set `BRAIN_DIR` environment variable via `tmp_path` in conftest.py for test isolation — ensure `_paths.py` module cache refreshes when calling `Brain.init()` directly inside tests
Applied to files:
Gradata/tests/test_case_study_seed_report.py
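The env-first brain resolution recorded in the PR `#102` learning above (GRADATA_BRAIN > --brain-dir > cwd) can be sketched as a pure function. The function name and signature are hypothetical; the real logic in `cli.py` is not shown here, and how `BRAIN_DIR` interacts with this order is not covered by the learning, so it is left out.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_brain_dir(
    cli_brain_dir: str | None,
    env: Mapping[str, str] | None = None,
    cwd: str | Path | None = None,
) -> Path:
    """Pick the brain directory: GRADATA_BRAIN env var, then --brain-dir, then cwd."""
    env = os.environ if env is None else env
    from_env = env.get("GRADATA_BRAIN")
    if from_env:  # an empty value is treated as unset
        return Path(from_env)
    if cli_brain_dir:
        return Path(cli_brain_dir)
    return Path(cwd) if cwd is not None else Path.cwd()
```

Under this precedence an inherited `GRADATA_BRAIN` would override `--brain-dir`, which is consistent with the PR's verification command unsetting it before passing `--brain-dir "$tmp"`.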
🔇 Additional comments (2)
Gradata/src/gradata/cli.py (1)
668-681: LGTM! Also applies to: 2121-2129
Gradata/tests/test_case_study_seed_report.py (1)
15-126: LGTM!
}
)

matching_injections = [e for e in injection_events if _category_pattern(e)[0] == category]
Narrow injection matching to the selected mistake pattern.
matching_injections currently filters only by category, so unrelated injections in the same category inflate the seed evidence for the top pattern.
Suggested fix
- matching_injections = [e for e in injection_events if _category_pattern(e)[0] == category]
+ matching_injections = [e for e in injection_events if _matches(e, category, pattern)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/case_study_seed.py` at line 120,
matching_injections currently only filters injection_events by category using
_category_pattern(e)[0]; narrow it to the chosen mistake pattern as well by
adding a second predicate that compares the event's pattern id/name (from
_category_pattern(e)[1]) to the selected pattern variable (e.g.,
selected_pattern or selected_mistake_pattern) so the comprehension becomes:
filter injection_events where _category_pattern(e)[0] == category AND
_category_pattern(e)[1] == selected_pattern (use the actual variable name used
in this scope).
return {
    "report": "case-study-seed",
    "source_db": str(db),
Avoid exposing full local DB paths in privacy-safe output.
Line 124 includes the full filesystem path in `source_db`, which can leak host/user info when this seed is shared externally.
Suggested fix
- "source_db": str(db),
+ "source_db": db.name,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/case_study_seed.py` at line 124, The seed
currently sets "source_db": str(db) exposing full filesystem paths; change this
to a privacy-safe representation by replacing the full path with a sanitized
value (e.g., Path(db).name or os.path.basename(db) or replace the user's home
dir with "~") before assigning to source_db; update imports if needed
(pathlib.Path or os.path) and keep the variable names source_db and db so the
change is localized and easy to spot in case_study_seed.py.
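One way to apply either suggestion while keeping the `db`/`source_db` names intact is a small helper. `privacy_safe_db_label` is a hypothetical name and this is only a sketch: the default matches the suggested `db.name` fix, and the optional `~`-relative form follows the prompt's alternative.

```python
from __future__ import annotations

from pathlib import Path


def privacy_safe_db_label(db: str | Path, home_relative: bool = False) -> str:
    """Return a label for source_db that does not expose the full local path.

    By default only the file name is kept. With home_relative=True, paths under
    the user's home directory become "~/..."; anything outside home still falls
    back to the bare file name, so an absolute path is never emitted.
    """
    path = Path(db)
    if home_relative:
        try:
            rel = path.expanduser().resolve().relative_to(Path.home().resolve())
        except ValueError:
            return path.name  # outside home: fall back to the file name
        return "~/" + rel.as_posix()
    return path.name
```

Usage would be `"source_db": privacy_safe_db_label(db),`. The bare file name is the safer default, since a `~/...` label still reveals the directory layout under the user's home.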
Summary
- `gradata report case-study-seed` with Markdown output and `--json` support.
- `system.db` corrections, RULE_GRADUATED rules, and injection/application events.

Verification
- `env -u BRAIN_DIR -u GRADATA_BRAIN python3 -m pytest tests/test_case_study_seed_report.py tests/test_status_command.py tests/test_prove_command.py -q` → 14 passed
- `/home/olive/.local/bin/uvx ruff check src/gradata/enhancements/case_study_seed.py tests/test_case_study_seed_report.py src/gradata/cli.py` → All checks passed
- `env -u BRAIN_DIR -u GRADATA_BRAIN PYTHONPATH=src python3 -m gradata.cli --brain-dir "$tmp" report case-study-seed --json` → pattern `Invented API fields`, `raw_prompt_content_included` `False`

Paperclip: GRA-2411
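To illustrate the two output modes described in the Summary (Markdown by default, JSON with `--json`), here is a minimal renderer sketch. Only `report`, `source_db`, `pattern`, and `raw_prompt_content_included` appear in this PR; the `count` and `rules` keys, the Markdown layout, and the name `render_seed` are assumptions, and evidence redaction is not shown.

```python
from __future__ import annotations

import json


def render_seed(seed: dict, as_json: bool = False) -> str:
    """Render a case-study seed dict as JSON (for --json) or as Markdown."""
    if as_json:
        # sort_keys gives stable output, which keeps deterministic tests simple.
        return json.dumps(seed, indent=2, sort_keys=True)
    lines = [
        f"# Case-study seed ({seed.get('report', 'case-study-seed')})",
        "",
        f"- Source DB: `{seed.get('source_db', '')}`",
        f"- Top pattern: {seed.get('pattern', 'n/a')} ({seed.get('count', 0)} corrections)",
        f"- Raw prompt content included: {seed.get('raw_prompt_content_included', False)}",
    ]
    rules = seed.get("rules", [])
    if rules:
        lines += ["", "## Related rules", ""]
        lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines) + "\n"
```

Rendering both formats from the same dict means the privacy-relevant fields, such as `raw_prompt_content_included`, are computed once and cannot drift between the JSON and Markdown outputs.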