GitHub - lambdasec/frame: Neuro-Symbolic AI SAST: Separation Logic + LLMs

Frame is a neuro-symbolic AI SAST (static application security testing) tool. Its core is a sound static-analysis engine: taint analysis plus separation-logic verification with Z3. On top of that core sits an optional LLM layer that detects vulnerabilities the symbolic engine misses and triages false positives. Frame supports five languages and scores above 80% on the OWASP Python and Java benchmarks, well ahead of Semgrep and Bandit. With the LLM layer on, it also finds real-world vulnerabilities that both the symbolic engine and a mature pattern scanner miss. The LLM layer works with any OpenAI-compatible endpoint and can run fully on-device. Its findings are labeled as a separate tier, so they are never mistaken for the sound symbolic results.

Highlights

Real-world security (Endor Labs corpus, 5 production apps, 193 pooled vulnerabilities):

Scanner	Recall	Precision	F1
Frame (symbolic core)	0.37	0.45	0.41
Frame (full AI SAST)	0.67	0.51	0.58
Semgrep OSS	0.52	0.40	0.45
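The F1 column is the harmonic mean of precision and recall. As a quick sanity check, the short Python sketch below recomputes it from the recall and precision values in the table above; the function name `f1_score` is just an illustrative helper, not part of Frame.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs copied from the table above
scanners = {
    "Frame (symbolic core)": (0.37, 0.45),
    "Frame (full AI SAST)": (0.67, 0.51),
    "Semgrep OSS": (0.52, 0.40),
}

for name, (recall, precision) in scanners.items():
    print(f"{name}: F1 = {f1_score(recall, precision):.2f}")
```

Rounded to two decimals, the results match the F1 column.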

OWASP Score (True Positive Rate - False Positive Rate) on synthetic suites:

Benchmark	Frame	Semgrep	Difference
Python (OWASP)	80.9%	4.5%	+76.4 pts
Java (OWASP)	81.5%	15.7%	+65.8 pts
JavaScript (SecBench.js)	43.0%	10.0%	+33.0 pts
C/C++ (NIST Juliet)	54.4%	-14.9%	+69.3 pts
C# (IssueBlot.NET)	80.3%	14.2%	+66.1 pts

Higher is better. See benchmarks/ for detailed methodology and results.
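To make the metric concrete, here is a minimal sketch of the OWASP Score as defined above (true positive rate minus false positive rate). The confusion-matrix counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from any benchmark run.

```python
def owasp_score(tp: int, fn: int, fp: int, tn: int) -> float:
    """Return TPR - FPR, expressed in percentage points."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one vulnerable and one safe test case")
    true_positive_rate = tp / (tp + fn)    # share of real vulnerabilities flagged
    false_positive_rate = fp / (fp + tn)   # share of safe cases wrongly flagged
    return 100.0 * (true_positive_rate - false_positive_rate)


# Hypothetical counts for two imaginary scanners
accurate = owasp_score(tp=85, fn=15, fp=5, tn=95)   # 85% TPR, 5% FPR
noisy = owasp_score(tp=40, fn=60, fp=55, tn=45)     # 40% TPR, 55% FPR

print(f"accurate scanner: {accurate:+.1f} pts")
print(f"noisy scanner:    {noisy:+.1f} pts")
```

A negative score, such as the -14.9% in the C/C++ row, means the false positive rate exceeds the true positive rate on that suite.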

Installation

git clone https://github.com/lambdasec/frame.git
cd frame
pip install -e ".[scan]"

Quick Start

# Scan for vulnerabilities
frame scan app.py

# Scan a directory
frame scan src/ --pattern "**/*.py"

# AI-assisted scan: LLM detection + triage (needs an LLM endpoint, see below)
frame scan src/ --ai

# CI/CD integration (SARIF output)
frame scan src/ --format sarif -o results.sarif --fail-on high

More examples

# Check separation logic entailment
frame solve "x |-> 5 * y |-> 3 |- x |-> 5"

# Batch check formulas
frame check formulas.txt

# Interactive mode
frame repl

Supported Languages

Language	Frameworks & Libraries
Python	Flask, Django, FastAPI, SQLAlchemy, subprocess
Java	Spring, JDBC, Hibernate, JNDI
JavaScript/TypeScript	Express, Node.js, DOM APIs
C/C++	POSIX, Windows API, memory operations
C#	ASP.NET, Entity Framework, ADO.NET

What Frame Detects

Injection & XSS

SQL Injection (CWE-89)
Cross-Site Scripting (CWE-79)
Command Injection (CWE-78)
LDAP/XPath Injection
Template Injection

Memory Safety

Buffer Overflow (CWE-121/122)
Use-After-Free (CWE-416)
Double Free (CWE-415)
Null Pointer Dereference
Integer Overflow

Data Exposure

Path Traversal (CWE-22)
SSRF (CWE-918)
Open Redirect (CWE-601)
Hardcoded Secrets
Log Injection

Cryptography

Weak Algorithms (CWE-327)
Insecure Random (CWE-330)
Weak Hashing (CWE-328)
Insecure Deserialization

How It Works

Frame combines taint analysis with separation logic verification:

Source Code
     |
     v
[Language Frontend] ---> SIL (Separation Intermediate Language)
     |                         |
     v                         v
[Taint Tracking]        [Symbolic Execution]
     |                         |
     v                         v
[Pattern Detection] <---> [Z3 Verification]
     |
     v
Vulnerability Report

Why this matters:

Taint analysis tracks untrusted data flow from sources (user input) to sinks (SQL queries)
Separation logic formally verifies memory safety properties
Z3 verification cuts false positives by checking that a reported vulnerability is actually reachable
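The source-to-sink idea can be illustrated with a toy taint tracker over a straight-line program. This is a teaching sketch only, not Frame's engine: the statement format and the names `request_param`, `escape_sql`, `execute_sql`, and `concat` are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

SOURCES = {"request_param"}   # hypothetical: calls that return untrusted data
SANITIZERS = {"escape_sql"}   # hypothetical: calls that return cleaned data
SINKS = {"execute_sql"}       # hypothetical: calls that must not see taint

# (target variable or None, called function, argument variable names)
Statement = Tuple[Optional[str], str, Tuple[str, ...]]


def find_tainted_sinks(program: Iterable[Statement]) -> List[Tuple[int, str, str]]:
    """Return (statement index, sink, tainted argument) for each risky call."""
    tainted = set()
    findings = []
    for index, (target, func, args) in enumerate(program):
        tainted_args = [arg for arg in args if arg in tainted]
        if func in SINKS and tainted_args:
            findings.append((index, func, tainted_args[0]))
        if target is None:
            continue
        if func in SOURCES:
            tainted.add(target)
        elif func in SANITIZERS:
            tainted.discard(target)
        elif tainted_args:
            tainted.add(target)       # taint propagates through other calls
        else:
            tainted.discard(target)   # overwritten with clean data


    return findings


program = [
    ("user_id", "request_param", ()),
    ("query", "concat", ("prefix", "user_id")),
    (None, "execute_sql", ("query",)),            # tainted data reaches a sink
    ("safe_id", "escape_sql", ("user_id",)),
    ("safe_query", "concat", ("prefix", "safe_id")),
    (None, "execute_sql", ("safe_query",)),       # sanitized, so not reported
]

findings = find_tainted_sinks(program)
print(findings)
```

Only the first `execute_sql` call is reported, because the second one receives data that passed through the sanitizer. A real analysis also has to handle branches, loops, aliasing, and calls across functions, which this sketch ignores.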

AI-Assisted Detection & Triage (optional)

Frame's symbolic core is sound and precise. But structural analysis can't reach everything: context-dependent flows, unknown frameworks, business logic. Frame adds an optional layer driven by an LLM.

Detect (recall): find vulnerabilities the symbolic engine misses. It can explore across files, calling read_file/grep tools over your repo to trace a flow from one file into another.
Triage (precision): drop confident false positives from the findings.
Verify: each LLM finding is checked against Frame's own sink model. A finding grounded in a recognized sink, cross-file included, moves up to a higher-confidence tier (llm_verified). Symbolic results and LLM results are never conflated.
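The Verify step can be pictured as a tier promotion. The sketch below is an assumption about the general shape of such a check, not Frame's code: the `Finding` class, the `KNOWN_SINKS` set, and the tier names `symbolic` and `llm` are hypothetical, and only `llm_verified` comes from the description above.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet

# Hypothetical stand-in for a scanner's sink model
KNOWN_SINKS: FrozenSet[str] = frozenset({"Statement.executeQuery", "Runtime.exec"})


@dataclass(frozen=True)
class Finding:
    cwe_id: str
    sink: str
    tier: str  # hypothetical tiers: "symbolic", "llm", "llm_verified"


def verify(finding: Finding, known_sinks: FrozenSet[str] = KNOWN_SINKS) -> Finding:
    """Promote an LLM finding when it is grounded in a recognized sink."""
    if finding.tier == "llm" and finding.sink in known_sinks:
        return replace(finding, tier="llm_verified")
    return finding  # symbolic and ungrounded LLM findings keep their tier


grounded = verify(Finding("CWE-89", "Statement.executeQuery", "llm"))
ungrounded = verify(Finding("CWE-89", "MyDao.customQuery", "llm"))
symbolic = verify(Finding("CWE-78", "Runtime.exec", "symbolic"))

print(grounded.tier, ungrounded.tier, symbolic.tier)
```

Keeping the tier as an explicit field is one way to ensure that symbolic and LLM results stay distinguishable in any report built from the findings.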

On the Endor Labs public AI-SAST corpus (5 real-world apps), Frame's full mode (detection + triage) reaches 0.67 recall at 0.51 precision, or 0.71 recall with detection alone. Semgrep gets 0.52 recall at 0.40 precision. The LLM layer recovers around 65 real vulnerabilities across Java, JS/TS, and C# that both Frame's symbolic engine and Semgrep miss. See the benchmark README for the full scoreboard and the honest caveats.

The layer works with any OpenAI-compatible endpoint, so you can point it at a frontier hosted model or a local one. Our results use a local model, for privacy and cost: mlx-optiq serving mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit on Apple Silicon. A stronger hosted model would likely do better. LLM detection and triage are both off by default; without them you get the sound symbolic core.

# our local setup (Apple Silicon): serve the model, then point Frame at it
pip install mlx-optiq
optiq kv-cache mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit --target-bits 5.0 -o ./kv
optiq serve --model mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit \
  --kv-config ./kv/kv_config.json --port 47317 --mtp    # --mtp: ~1.4x faster decode

export FRAME_LLM_BASE_URL=http://localhost:47317/v1
export FRAME_LLM_API_KEY=                                  # empty for local servers
export FRAME_LLM_MODEL=mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit
export FRAME_LLM_REPO_ROOT=/path/to/repo                   # enables agentic cross-file tools

Then turn the layer on with one flag:

frame scan src/ --ai          # symbolic + LLM detection + triage

Or from the Python API:

from frame.sil import FrameScanner
# symbolic + LLM detection + triage (reads the FRAME_LLM_* env above)
scanner = FrameScanner(language="java", llm_detect=True, llm_triage=True)
result = scanner.scan_file("Controller.java")
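Before enabling the layer, it can help to confirm that the FRAME_LLM_* variables are set. The helper below is a hypothetical preflight check, not part of Frame: it assumes the base URL and model name are required, and that the server exposes the usual OpenAI-style `/chat/completions` route under the base URL.

```python
import os
from typing import Mapping, Optional


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the FRAME_LLM_* settings and fail early if required ones are missing."""
    env = os.environ if env is None else env
    base_url = env.get("FRAME_LLM_BASE_URL", "").rstrip("/")
    model = env.get("FRAME_LLM_MODEL", "")

    required = (("FRAME_LLM_BASE_URL", base_url), ("FRAME_LLM_MODEL", model))
    missing = [name for name, value in required if not value]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    return {
        "base_url": base_url,
        "chat_url": base_url + "/chat/completions",   # standard OpenAI-style route
        "api_key": env.get("FRAME_LLM_API_KEY", ""),  # may be empty for local servers
        "model": model,
        "repo_root": env.get("FRAME_LLM_REPO_ROOT"),  # None when cross-file tools are unused
    }


# Example with an explicit mapping instead of the real environment
config = load_llm_config({
    "FRAME_LLM_BASE_URL": "http://localhost:47317/v1/",
    "FRAME_LLM_MODEL": "mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit",
})
print(config["chat_url"])
```

Calling `load_llm_config()` with no argument reads the real environment, so it can run as a first step in a script that then constructs the scanner.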

CI/CD Integration

# GitHub Actions
- name: Install Frame
  run: pip install -e ".[scan]"

- name: Security Scan
  run: frame scan src/ --format sarif -o results.sarif --fail-on high

- name: Upload Results
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
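A SARIF file is plain JSON, so it can also be summarized in a few lines of Python, for example to print a count per level in a CI log. The sketch below relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`); the sample document is hand-written for illustration and is not actual Frame output, and how Frame maps its severities to SARIF levels is not assumed here.

```python
import json
from collections import Counter


def count_by_level(sarif: dict) -> Counter:
    """Count SARIF results per level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results") or []:
            # "warning" is SARIF's fallback when a result carries no level
            counts[result.get("level", "warning")] += 1
    return counts


# Hand-written sample in SARIF 2.1.0 shape; not actual Frame output
sample = json.loads("""
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "example-scanner"}},
      "results": [
        {"ruleId": "CWE-89", "level": "error", "message": {"text": "SQL injection"}},
        {"ruleId": "CWE-79", "level": "warning", "message": {"text": "Reflected XSS"}},
        {"ruleId": "CWE-330", "message": {"text": "Insecure random"}}
      ]
    }
  ]
}
""")

counts = count_by_level(sample)
print(dict(counts))
```

To summarize a real scan, load results.sarif with `json.load` and pass the resulting dictionary to `count_by_level`.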

Python API

from frame import EntailmentChecker
from frame.sil import FrameScanner

# Security scanning
scanner = FrameScanner()
result = scanner.scan_file("app.py")
for vuln in result.vulnerabilities:
    print(f"{vuln.cwe_id}: {vuln.description}")

# Separation logic verification
checker = EntailmentChecker()
result = checker.check_entailment("x |-> 5 * y |-> 3 |- x |-> 5")
print(result.valid)  # True

Separation Logic Solver

Frame includes a separation logic solver for verifying heap properties:

Syntax	Meaning
x |-> v	x points to value v
emp	Empty heap
P * Q	P and Q in separate memory
P -* Q	Magic wand
P |- Q	P entails Q

Built-in predicates: ls(x,y) (list segment), list(x), tree(x), dll(x,p,y,n)

frame solve "ls(x, y) * ls(y, z) |- ls(x, z)"  # List transitivity
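To give a feel for what an entailment check involves, here is a toy checker for the smallest fragment of the syntax: `emp` and points-to atoms joined by `*`. It is a sketch under stated assumptions, not Frame's solver. It matches addresses and values by name only, has no support for predicates or the magic wand, and allows leftover cells on the left-hand side, which is the reading consistent with the `True` result shown for the first example in the Python API section.

```python
import re
from typing import Dict, Optional


def parse_heap(formula: str) -> Optional[Dict[str, str]]:
    """Parse 'x |-> 5 * y |-> 3' into {'x': '5', 'y': '3'}; 'emp' gives {}.

    Returns None when an address appears twice: P * Q requires disjoint
    cells, so such a formula has no model.
    """
    heap: Dict[str, str] = {}
    for atom in formula.split("*"):
        atom = atom.strip()
        if atom in ("", "emp"):
            continue
        address, value = (part.strip() for part in atom.split("|->"))
        if address in heap:
            return None
        heap[address] = value
    return heap


def entails(sequent: str) -> bool:
    """Check 'P |- Q' for points-to formulas, allowing leftover cells in P."""
    # Split on the turnstile '|-' but not on the '|-' inside '|->'
    lhs_text, rhs_text = re.split(r"\|-(?!>)", sequent)
    lhs, rhs = parse_heap(lhs_text), parse_heap(rhs_text)
    if lhs is None:
        return True    # an unsatisfiable antecedent entails anything
    if rhs is None:
        return False   # a satisfiable antecedent cannot entail a contradiction
    return all(lhs.get(address) == value for address, value in rhs.items())


print(entails("x |-> 5 * y |-> 3 |- x |-> 5"))
print(entails("x |-> 5 |- x |-> 6"))
print(entails("x |-> 1 * x |-> 2 |- y |-> 3"))
```

The third sequent holds vacuously, because `x` cannot point to two separate cells at once. Inductive predicates such as `ls` need unfolding and solver support, which is what the real solver provides.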

Benchmarks

Frame is validated against industry-standard benchmark suites:

Benchmark	Domain	Tests	Precision	Recall
OWASP Python	Web Security	500	95.3%	83.5%
OWASP Java	Web Security	500	97.2%	84.8%
SecBench.js	Node.js Security	300	82.0%	81.0%
NIST Juliet	C/C++ Memory	1,000	89.9%	60.5%
IssueBlot.NET	C# Security	171	84.7%	80.3%
SL-COMP	Separation Logic	692	79.9%	n/a¹
SMT-LIB QF_S	String Theory	3,300	99.3%	n/a¹

python -m benchmarks run --curated  # Run all benchmarks

¹ SL-COMP and QF_S are logic-solver suites: the percentage is solver accuracy, and recall does not apply.

Beyond the synthetic suites, Frame is scored on the Endor Labs public AI-SAST corpus of 5 production applications. With the LLM layer, Frame reaches 0.67 recall at 0.51 precision, against Semgrep's 0.52 recall at 0.40 precision. It finds around 65 real vulnerabilities across Java, JS/TS, and C# that both Frame's symbolic engine and Semgrep miss. The benchmark README records how the ground truth was built and the honest caveats.

See benchmarks/README.md for detailed results, methodology, and tool comparisons.
