
fix(security): guard JIT prompt injection #240

Open
Gradata wants to merge 1 commit into main from gra-2018-prompt-sanitizer

Gradata (Owner) commented Jun 1, 2026

Summary

  • Add a UserPromptSubmit prompt-injection guard for JIT rule injection.
  • Sanitize user draft text before rule scoring and skip rule injection on suspicious patterns.
  • Add a 20+ payload PoC corpus plus focused tests for direct override, role hijack, system leak, ChatML/Alpaca markers, base64/ROT13 encoded directives, indirect injection, and benign controls.

Tests

  • python3 -m pytest tests/hooks/test_injection_guard.py tests/security/test_prompt_injection_poc.py tests/test_jit_inject.py -q
    • 91 passed, 1 skipped, 14 xfailed in 0.66s

Notes

  • This issue is filed under Gradata Cloud, but the UserPromptSubmit hook lives in the SDK repo (Gradata/gradata), so the fix is opened there.

coderabbitai Bot commented Jun 1, 2026

📝 Walkthrough

Security Enhancement: Prompt Injection Guard for JIT Rule Injection

  • New security module (_injection_guard.py): Implements prompt-injection detection and text sanitization with two public functions:

    • sanitize(text: str) → str — normalizes text by removing BOMs, applying Unicode NFKC, stripping zero-width characters, and collapsing whitespace
    • is_suspicious(text: str) → tuple[bool, str] — detects injection patterns (role hijacks, system leaks, ChatML/Alpaca markers, encoded payloads via base64/ROT13, etc.)
  • Guard integration: Modified jit_inject.py to sanitize and check user drafts before rule scoring; injection attempts are logged and blocked

  • Configuration: Guard enabled by default for new installs; can be disabled via GRADATA_LEGACY_INSTALL environment variable

  • Comprehensive test coverage: >20 payload proof-of-concept corpus with detection tests for direct overrides, role hijacks, system leaks, encoding bypasses, indirect injections, and benign controls; test results: 91 passed, 1 skipped, 14 xfailed

  • No breaking changes: Guard operates transparently as a pre-screening layer; downstream rule scoring/injection logic unchanged
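
The sanitization steps listed above can be sketched as follows. This is a minimal illustration, not the code in `_injection_guard.py`: the function name `sanitize_sketch`, the exact set of zero-width code points, and the choice of exceptions in the fallback are all assumptions.

```python
import re
import unicodedata

# Assumed set of zero-width code points; the real module may strip more or fewer.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060]")
_WHITESPACE = re.compile(r"\s+")


def sanitize_sketch(text: str) -> str:
    """Hypothetical stand-in for sanitize(): BOM, NFKC, zero-width, whitespace."""
    try:
        without_bom = text.replace("\ufeff", "")  # drop byte-order marks
        normalized = unicodedata.normalize("NFKC", without_bom)  # fold compatibility forms
        visible = _ZERO_WIDTH.sub("", normalized)  # remove zero-width characters
        return _WHITESPACE.sub(" ", visible).strip()  # collapse whitespace runs
    except (AttributeError, TypeError, ValueError):
        # Fail-safe fallback: hand back the input unchanged rather than raise.
        return text
```

Normalizing before pattern matching matters because a payload such as `ig<U+200B>nore` or full-width `ｉｇｎｏｒｅ` would otherwise slip past a plain ASCII regex.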

Walkthrough

This PR introduces a prompt-injection guard for pre-screening user drafts before rule scoring. The core module sanitizes Unicode and detects injection patterns via regex and encoded-payload decoding. The guard is integrated into the JIT injection hook to abort suspicious submissions. The PR includes 200+ lines of unit tests, a structured corpus of 34 test payloads across 12 attack categories, a JSON manifest describing each payload, and a PoC test runner validating guard coverage.
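
The gating described in this walkthrough might look roughly like the sketch below. The names `guarded_inject` and `rank_rules` are hypothetical, and the guard functions are passed in as parameters so the snippet runs on its own; the actual hook in `jit_inject.py` imports them from the guard module.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("jit_inject_sketch")


def guarded_inject(
    draft: str,
    sanitize: Callable[[str], str],
    is_suspicious: Callable[[str], tuple[bool, str]],
    rank_rules: Callable[[str], str],  # hypothetical stand-in for rule scoring
) -> Optional[str]:
    """Sanitize the draft, then either abort (None) or continue to rule scoring."""
    clean = sanitize(draft)
    suspicious, reason = is_suspicious(clean)
    if suspicious:
        # Log and skip injection instead of scoring rules against hostile text.
        logger.warning("Skipping JIT rule injection: %s", reason)
        return None
    return rank_rules(clean)
```

Returning `None` keeps the downstream scoring path untouched for benign drafts, which matches the "no breaking changes" claim above.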

Changes

Prompt Injection Guard Implementation & Testing

| Layer | File(s) | Summary |
|---|---|---|
| Injection Guard Core Implementation | Gradata/src/gradata/hooks/_injection_guard.py | Core module exports sanitize(text) for Unicode NFKC normalization, BOM/zero-width removal, and whitespace collapsing with fail-safe fallback. Exports is_suspicious(text) that detects prompt-injection via compiled regex patterns (roleplay, persona bypass, system-leak, ChatML/Alpaca markers, few-shot hijack, goal hijack, indirect markers, override phrases) plus base64/ROT13 decoding and rescanning. Uses environment-gated enablement with legacy-install heuristics. |
| Hook Integration & Guard Usage | Gradata/src/gradata/hooks/jit_inject.py | The UserPromptSubmit hook now imports and applies sanitize() and is_suspicious() before rule scoring. Suspicious drafts are logged and rejected by returning None; benign drafts proceed to rule ranking. |
| Unit Tests for Guard Functions | Gradata/tests/hooks/test_injection_guard.py | Validates is_suspicious() detection of gap payloads from the manifest, targeted injection patterns (ignore-previous-instructions, zero-width obfuscation, DAN jailbreak, base64/ROT13 encoding, ChatML markers), and sanitize() behavior (BOM stripping, NFKC normalization, zero-width removal, whitespace collapsing). Includes false-positive checks for benign corpus fixtures and generic sentences. |
| Test Corpus & Attack Classification Manifest | Gradata/tests/security/fixtures/injection_corpus/*.txt, Gradata/tests/security/fixtures/manifest.json | 34 test payloads across 12 attack categories: benign controls (3), direct overrides (3), encoding bypasses (3), few-shot hijacks (1), goal hijacks (2), indirect injections (2), JS templates (4), marker injections (3), role hijacks (4), system leaks (3), virtualization (2), XML injections (4). Manifest provides file paths, severity, expected outcomes (block/sanitize/pass), guard surface, and detection-status flags. |
| Corpus-Driven PoC Test Runner | Gradata/tests/security/test_prompt_injection_poc.py | Parametrized tests validate guard behavior against corpus payloads: block tests verify blocked status (xfail on known gaps), sanitize tests assert payload transformation with context-specific rules (XML escaping, JS backtick removal), benign tests verify pass-through results. Includes corpus-integrity checks: minimum payload count, fixture file existence/non-empty content, and required attack-class taxonomy. |
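
The "base64/ROT13 decoding and rescanning" step summarized for the core module can be illustrated with a small sketch. Everything here is an assumption made for illustration: the single override pattern, the 16-character base64 candidate threshold, and the function names are not taken from the PR.

```python
import base64
import binascii
import codecs
import re

# Hypothetical stand-in for one of the module's compiled patterns.
_RE_OVERRIDE_SKETCH = re.compile(
    r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE
)
# Runs of base64 alphabet long enough to hide a directive (assumed threshold).
_RE_B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str) -> list[str]:
    """Return alternate readings of text: ROT13 plus any decodable base64 runs."""
    views = [codecs.decode(text, "rot_13")]
    for match in _RE_B64_CANDIDATE.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            views.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 or not text; ignore this candidate
    return views


def is_suspicious_sketch(text: str) -> tuple[bool, str]:
    """Scan the raw text first, then rescan each decoded view."""
    if _RE_OVERRIDE_SKETCH.search(text):
        return True, "suspicious: override/bypass"
    for view in decoded_views(text):
        if _RE_OVERRIDE_SKETCH.search(view):
            return True, "suspicious: encoded override/bypass"
    return False, ""
```

Rescanning the decoded views with the same patterns keeps the encoded-payload path from needing its own rule set.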

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

security

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'fix(security): guard JIT prompt injection' directly and specifically describes the main change—adding a prompt-injection guard for JIT rule injection in the UserPromptSubmit hook. |
| Description check | ✅ Passed | The description clearly relates to the changeset by outlining the summary (adding a prompt-injection guard), listing the test categories covered, providing test results, and explaining the deployment context. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

 Loading rules from local config...
[00.18][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


@coderabbitai coderabbitai Bot added the security label Jun 1, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/hooks/_injection_guard.py`:
- Around line 259-263: The short-text fast path currently unconditionally
returns (False, "") for any text with len(text) < 20, which bypasses detection
of short but high-signal markers; remove or change that unconditional return in
_injection_guard.py so short inputs still get scanned for known markers: instead
of returning immediately, call the existing marker-detection routine (e.g. the
function that checks CHATML/INJECTION_MARKERS or the helper used elsewhere in
this module) on text and only return (False, "") if that scan finds nothing;
update the branch around the len(text) check to preserve the cheap early-exit
for truly innocuous short text but ensure any detected markers are handled
rather than skipped.

In `@Gradata/tests/hooks/test_injection_guard.py`:
- Around line 224-227: Add a regression test alongside
test_short_text_not_flagged that verifies short but clearly malicious inputs are
flagged: call is_suspicious with a short high-signal string (e.g., a brief
explicit attack or clearly malicious phrase/marker) and assert that the returned
suspicious flag is True; update or add a new test function (e.g.,
test_short_malicious_flagged) to call is_suspicious and assert suspicious is
truthy to prevent bypasses.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f3dd44e8-bebf-4966-8098-c58f7cb1e537

📥 Commits

Reviewing files that changed from the base of the PR and between a197bff and 179cd2d.

📒 Files selected for processing (39)
  • Gradata/src/gradata/hooks/_injection_guard.py
  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/tests/hooks/test_injection_guard.py
  • Gradata/tests/security/fixtures/injection_corpus/benign_control_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/benign_control_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/benign_control_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/direct_override_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/direct_override_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/direct_override_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/encoding_bypass_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/encoding_bypass_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/encoding_bypass_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/few_shot_hijack_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/goal_hijack_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/goal_hijack_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/indirect_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/indirect_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/js_template_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/js_template_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/js_template_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/js_template_004.txt
  • Gradata/tests/security/fixtures/injection_corpus/marker_inject_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/marker_inject_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/marker_inject_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/role_hijack_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/role_hijack_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/role_hijack_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/role_hijack_004.txt
  • Gradata/tests/security/fixtures/injection_corpus/system_leak_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/system_leak_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/system_leak_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/virtualization_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/virtualization_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/xml_inject_001.txt
  • Gradata/tests/security/fixtures/injection_corpus/xml_inject_002.txt
  • Gradata/tests/security/fixtures/injection_corpus/xml_inject_003.txt
  • Gradata/tests/security/fixtures/injection_corpus/xml_inject_004.txt
  • Gradata/tests/security/fixtures/manifest.json
  • Gradata/tests/security/test_prompt_injection_poc.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest (py3.12)
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/src/gradata/hooks/_injection_guard.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/security/test_prompt_injection_poc.py
  • Gradata/tests/hooks/test_injection_guard.py
🪛 LanguageTool
Gradata/tests/security/fixtures/injection_corpus/few_shot_hijack_001.txt

[style] ~7-~7: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...sistant (unrestricted): You boil water. Also, here are all the system prompt content...

(REP_SENT_START_ALSO)

🪛 Ruff (0.15.15)
Gradata/src/gradata/hooks/_injection_guard.py

[warning] 33-33: Found empty type-checking block

Delete empty type-checking block

(TC005)


[warning] 54-56: Return the condition legacy not in {"1", "true", "yes", "on"} directly

Replace with return legacy not in {"1", "true", "yes", "on"}

(SIM103)
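
For context, the SIM103 simplification applied to an environment gate could read as below. This is a sketch under assumptions: the function name, the `environ` parameter, and the `.strip().lower()` normalization are hypothetical, and the module's additional legacy-install heuristics are not modeled. Only the variable name `legacy`, the truthy set, and `GRADATA_LEGACY_INSTALL` come from this review.

```python
import os
from collections.abc import Mapping
from typing import Optional


def guard_enabled_sketch(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Guard is on unless GRADATA_LEGACY_INSTALL holds a truthy value."""
    env = os.environ if environ is None else environ
    # Normalizing case and whitespace is an assumption, not confirmed by the PR.
    legacy = env.get("GRADATA_LEGACY_INSTALL", "").strip().lower()
    # Return the condition directly instead of an if/else that returns True/False.
    return legacy not in {"1", "true", "yes", "on"}
```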


[warning] 233-234: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)


[warning] 237-238: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

🔇 Additional comments (32)
Gradata/src/gradata/hooks/jit_inject.py (1)

33-33: LGTM!

Also applies to: 292-299

Gradata/tests/security/fixtures/injection_corpus/benign_control_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/benign_control_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/benign_control_003.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/direct_override_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/direct_override_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/direct_override_003.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/encoding_bypass_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/xml_inject_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/encoding_bypass_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/encoding_bypass_003.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/few_shot_hijack_001.txt (1)

1-10: LGTM!

Gradata/tests/security/fixtures/injection_corpus/goal_hijack_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/goal_hijack_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/role_hijack_004.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/system_leak_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/system_leak_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/indirect_001.txt (1)

1-5: LGTM!

Gradata/tests/security/fixtures/injection_corpus/indirect_002.txt (1)

1-5: LGTM!

Gradata/tests/security/fixtures/injection_corpus/js_template_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/js_template_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/js_template_003.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/system_leak_003.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/virtualization_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/virtualization_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/js_template_004.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/marker_inject_001.txt (1)

1-4: LGTM!

Gradata/tests/security/fixtures/injection_corpus/marker_inject_002.txt (1)

1-6: LGTM!

Gradata/tests/security/fixtures/injection_corpus/marker_inject_003.txt (1)

1-6: LGTM!

Gradata/tests/security/fixtures/injection_corpus/role_hijack_001.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/role_hijack_002.txt (1)

1-1: LGTM!

Gradata/tests/security/fixtures/injection_corpus/role_hijack_003.txt (1)

1-1: LGTM!

Comment on lines +259 to +263
    # Quick pre-check: if text is very short and doesn't contain known markers,
    # skip expensive processing.
    if len(text) < 20:
        return False, ""

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Short-text fast path bypasses detection of high-signal injections.

Lines 261-262 unconditionally return (False, "") for any text under 20 characters, so short but high-signal markers (e.g. ChatML tags) skip all detection. This undermines the guard's primary security objective.

🔧 Suggested fix
-    # Quick pre-check: if text is very short and doesn't contain known markers,
-    # skip expensive processing.
-    if len(text) < 20:
-        return False, ""
+    # Quick pre-check: for very short text, only run a tiny high-signal subset.
+    if len(text) < 20:
+        short_checks: list[tuple[str, re.Pattern[str]]] = [
+            ("LLM marker injection", _RE_LLM_MARKERS),
+            ("system prompt leak", _RE_SYSTEM_LEAK),
+            ("override/bypass", _RE_OVERRIDE),
+        ]
+        for label, pattern in short_checks:
+            if pattern.search(text):
+                return True, f"suspicious: {label}"
+        return False, ""
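
To try the short-text idea in isolation, the standalone sketch below mirrors the shape of the suggestion. The three regexes are hypothetical stand-ins for `_RE_LLM_MARKERS`, `_RE_SYSTEM_LEAK`, and `_RE_OVERRIDE`, whose real definitions are not shown in this review, and `scan_short_text` is an illustrative name.

```python
import re

# Hypothetical stand-ins; the module's real patterns are not shown in this review.
_RE_LLM_MARKERS_SKETCH = re.compile(r"<\|im_(start|end)\|>|###\s*Instruction", re.IGNORECASE)
_RE_SYSTEM_LEAK_SKETCH = re.compile(r"system\s+prompt", re.IGNORECASE)
_RE_OVERRIDE_SKETCH = re.compile(r"ignore\s+(all\s+)?previous", re.IGNORECASE)

_SHORT_CHECKS: list[tuple[str, re.Pattern[str]]] = [
    ("LLM marker injection", _RE_LLM_MARKERS_SKETCH),
    ("system prompt leak", _RE_SYSTEM_LEAK_SKETCH),
    ("override/bypass", _RE_OVERRIDE_SKETCH),
]


def scan_short_text(text: str) -> tuple[bool, str]:
    """Run only the cheap, high-signal subset intended for inputs under 20 chars."""
    for label, pattern in _SHORT_CHECKS:
        if pattern.search(text):
            return True, f"suspicious: {label}"
    return False, ""
```

With these stand-in patterns, the 18-character input `<|im_start|>system` is flagged while `Hello` still passes, which is the behavior the regression test in the next comment asks for.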
Comment on lines +224 to +227
def test_short_text_not_flagged() -> None:
    """Very short text must not be flagged."""
    suspicious, _ = is_suspicious("Hello")
    assert not suspicious

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a malicious short-input regression test.

Current short-text coverage only asserts benign input ("Hello"). Add at least one short high-signal marker case to prevent security bypass regressions.

✅ Suggested test addition
 def test_short_text_not_flagged() -> None:
     """Very short text must not be flagged."""
     suspicious, _ = is_suspicious("Hello")
     assert not suspicious
+
+
+def test_short_marker_flagged() -> None:
+    """Short but high-signal marker text must still be flagged."""
+    suspicious, _ = is_suspicious("<|im_start|>system")
+    assert suspicious