Skip to content

lambdasec/frame

Repository files navigation

Frame

Neuro-Symbolic AI SAST: Separation Logic + LLMs

Tests Python License


Frame is a neuro-symbolic AI SAST. Its core is a sound static-analysis engine: taint analysis plus separation-logic verification with Z3. On top of that core sits an optional LLM layer that detects vulnerabilities the symbolic engine misses and triages false positives. Frame supports 5 languages and scores 80%+ on the OWASP benchmarks, well ahead of Semgrep and Bandit. With the LLM layer on, it also finds real-world vulnerabilities that a symbolic engine and a mature pattern scanner both miss. The LLM layer works with any OpenAI-compatible endpoint and can run fully on-device. Its findings are labeled as a separate tier, so they are never mistaken for the sound symbolic results.

Highlights

Real-world security (Endor Labs corpus, 5 production apps, 193 pooled vulnerabilities):

Scanner Recall Precision F1
Frame (symbolic core) 0.37 0.45 0.41
Frame (full AI SAST) 0.67 0.51 0.58
Semgrep OSS 0.52 0.40 0.45

OWASP Score (True Positive Rate - False Positive Rate) on synthetic suites:

Benchmark Frame Semgrep Difference
Python (OWASP) 80.9% 4.5% +76.4 pts
Java (OWASP) 81.5% 15.7% +65.8 pts
JavaScript (SecBench.js) 43.0% 10.0% +33 pts
C/C++ (NIST Juliet) 54.4% -14.9% +69.3 pts
C# (IssueBlot.NET) 80.3% 14.2% +66.1 pts

Higher is better. See benchmarks/ for detailed methodology and results.

Installation

git clone https://github.com/lambdasec/frame.git
cd frame
pip install -e ".[scan]"

Quick Start

# Scan for vulnerabilities
frame scan app.py

# Scan a directory
frame scan src/ --pattern "**/*.py"

# AI-assisted scan: LLM detection + triage (needs an LLM endpoint, see below)
frame scan src/ --ai

# CI/CD integration (SARIF output)
frame scan src/ --format sarif -o results.sarif --fail-on high
More examples
# Check separation logic entailment
frame solve "x |-> 5 * y |-> 3 |- x |-> 5"

# Batch check formulas
frame check formulas.txt

# Interactive mode
frame repl

Supported Languages

Language Frameworks & Libraries
Python Flask, Django, FastAPI, SQLAlchemy, subprocess
Java Spring, JDBC, Hibernate, JNDI
JavaScript/TypeScript Express, Node.js, DOM APIs
C/C++ POSIX, Windows API, memory operations
C# ASP.NET, Entity Framework, ADO.NET

What Frame Detects

Injection & XSS

  • SQL Injection (CWE-89)
  • Cross-Site Scripting (CWE-79)
  • Command Injection (CWE-78)
  • LDAP/XPath Injection
  • Template Injection

Memory Safety

  • Buffer Overflow (CWE-121/122)
  • Use-After-Free (CWE-416)
  • Double Free (CWE-415)
  • Null Pointer Dereference
  • Integer Overflow

Data Exposure

  • Path Traversal (CWE-22)
  • SSRF (CWE-918)
  • Open Redirect (CWE-601)
  • Hardcoded Secrets
  • Log Injection

Cryptography

  • Weak Algorithms (CWE-327)
  • Insecure Random (CWE-330)
  • Weak Hashing (CWE-328)
  • Insecure Deserialization

How It Works

Frame combines taint analysis with separation logic verification:

Source Code
     |
     v
[Language Frontend] ---> SIL (Separation Intermediate Language)
     |                         |
     v                         v
[Taint Tracking]        [Symbolic Execution]
     |                         |
     v                         v
[Pattern Detection] <---> [Z3 Verification]
     |
     v
Vulnerability Report

Why this matters:

  • Taint analysis tracks untrusted data flow from sources (user input) to sinks (SQL queries)
  • Separation logic formally verifies memory safety properties
  • Z3 verification eliminates false positives by proving vulnerability reachability

AI-Assisted Detection & Triage (optional)

Frame's symbolic core is sound and precise. But structural analysis can't reach everything: context-dependent flows, unknown frameworks, business logic. Frame adds an optional layer driven by an LLM.

  • Detect (recall): find vulnerabilities the symbolic engine misses. It can explore across files, calling read_file/grep tools over your repo to trace a flow from one file into another.
  • Triage (precision): drop confident false positives from the findings.
  • Verify: each LLM finding is checked against Frame's own sink model. A finding grounded in a recognized sink, cross-file included, moves up to a higher-confidence tier (llm_verified). Symbolic results and LLM results are never conflated.

On the Endor Labs public AI-SAST corpus (5 real-world apps), Frame's full mode (detection + triage) reaches 0.67 recall at 0.51 precision, or 0.71 recall with detection alone. Semgrep gets 0.52 recall at 0.40 precision. The LLM layer recovers around 65 real vulnerabilities across Java, JS/TS, and C# that both Frame's symbolic engine and Semgrep miss. See the benchmark README for the full scoreboard and the honest caveats.

The layer works with any OpenAI-compatible endpoint, so you can point it at a frontier hosted model or a local one. Our results use a local model, for privacy and cost: mlx-optiq serving mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit on Apple Silicon. A stronger hosted model would likely do better. Both layers are off by default; without them you get the sound symbolic core.

# our local setup (Apple Silicon): serve the model, then point Frame at it
pip install mlx-optiq
optiq kv-cache mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit --target-bits 5.0 -o ./kv
optiq serve --model mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit \
  --kv-config ./kv/kv_config.json --port 47317 --mtp    # --mtp: ~1.4x faster decode

export FRAME_LLM_BASE_URL=http://localhost:47317/v1
export FRAME_LLM_API_KEY=                                  # empty for local servers
export FRAME_LLM_MODEL=mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit
export FRAME_LLM_REPO_ROOT=/path/to/repo                   # enables agentic cross-file tools

Then turn the layer on with one flag:

frame scan src/ --ai          # symbolic + LLM detection + triage

Or from the Python API:

from frame.sil import FrameScanner
# symbolic + LLM detection + triage (reads the FRAME_LLM_* env above)
scanner = FrameScanner(language="java", llm_detect=True, llm_triage=True)
result = scanner.scan_file("Controller.java")

CI/CD Integration

# GitHub Actions
- name: Install Frame
  run: pip install -e ".[scan]"

- name: Security Scan
  run: frame scan src/ --format sarif -o results.sarif --fail-on high

- name: Upload Results
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: results.sarif

Python API

from frame import EntailmentChecker
from frame.sil import FrameScanner

# Security scanning
scanner = FrameScanner()
result = scanner.scan_file("app.py")
for vuln in result.vulnerabilities:
    print(f"{vuln.cwe_id}: {vuln.description}")

# Separation logic verification
checker = EntailmentChecker()
result = checker.check_entailment("x |-> 5 * y |-> 3 |- x |-> 5")
print(result.valid)  # True

Separation Logic Solver

Frame includes a separation logic solver for verifying heap properties:

Syntax Meaning
x |-> v x points to value v
emp Empty heap
P * Q P and Q in separate memory
P -* Q Magic wand
P |- Q P entails Q

Built-in predicates: ls(x,y) (list segment), list(x), tree(x), dll(x,p,y,n)

frame solve "ls(x, y) * ls(y, z) |- ls(x, z)"  # List transitivity

Benchmarks

Frame is validated against industry-standard benchmark suites:

Benchmark Domain Tests Precision Recall
OWASP Python Web Security 500 95.3% 83.5%
OWASP Java Web Security 500 97.2% 84.8%
SecBench.js Node.js Security 300 82.0% 81.0%
NIST Juliet C/C++ Memory 1,000 89.9% 60.5%
IssueBlot.NET C# Security 171 84.7% 80.3%
SL-COMP Separation Logic 692 79.9% n/a¹
SMT-LIB QF_S String Theory 3,300 99.3% n/a¹
python -m benchmarks run --curated  # Run all benchmarks

¹ SL-COMP and QF_S are logic-solver suites: the percentage is solver accuracy, and recall does not apply.

Beyond the synthetic suites, Frame is scored on the Endor Labs public AI-SAST corpus: 5 production applications. With the LLM layer, Frame reaches 0.67 recall at 0.51 precision, against Semgrep's 0.52 and 0.40. It finds around 65 real vulnerabilities across Java, JS/TS, and C# that both a symbolic engine and Semgrep miss. The benchmark README records how the ground truth was built and the honest caveats.

See benchmarks/README.md for detailed results, methodology, and tool comparisons.