Neuro-Symbolic AI SAST: Separation Logic + LLMs
Frame is a neuro-symbolic AI SAST. Its core is a sound static-analysis engine: taint analysis plus separation-logic verification with Z3. On top of that core sits an optional LLM layer that detects vulnerabilities the symbolic engine misses and triages false positives. Frame supports 5 languages and scores 80%+ on the OWASP benchmarks, well ahead of Semgrep and Bandit. With the LLM layer on, it also finds real-world vulnerabilities that a symbolic engine and a mature pattern scanner both miss. The LLM layer works with any OpenAI-compatible endpoint and can run fully on-device. Its findings are labeled as a separate tier, so they are never mistaken for the sound symbolic results.
Real-world security (Endor Labs corpus, 5 production apps, 193 pooled vulnerabilities):
| Scanner | Recall | Precision | F1 |
|---|---|---|---|
| Frame (symbolic core) | 0.37 | 0.45 | 0.41 |
| Frame (full AI SAST) | 0.67 | 0.51 | 0.58 |
| Semgrep OSS | 0.52 | 0.40 | 0.45 |
OWASP Score (True Positive Rate - False Positive Rate) on synthetic suites:
| Benchmark | Frame | Semgrep | Difference |
|---|---|---|---|
| Python (OWASP) | 80.9% | 4.5% | +76.4 pts |
| Java (OWASP) | 81.5% | 15.7% | +65.8 pts |
| JavaScript (SecBench.js) | 43.0% | 10.0% | +33 pts |
| C/C++ (NIST Juliet) | 54.4% | -14.9% | +69.3 pts |
| C# (IssueBlot.NET) | 80.3% | 14.2% | +66.1 pts |
Higher is better. See benchmarks/ for detailed methodology and results.
git clone https://github.com/lambdasec/frame.git
cd frame
pip install -e ".[scan]"# Scan for vulnerabilities
frame scan app.py
# Scan a directory
frame scan src/ --pattern "**/*.py"
# AI-assisted scan: LLM detection + triage (needs an LLM endpoint, see below)
frame scan src/ --ai
# CI/CD integration (SARIF output)
frame scan src/ --format sarif -o results.sarif --fail-on highMore examples
# Check separation logic entailment
frame solve "x |-> 5 * y |-> 3 |- x |-> 5"
# Batch check formulas
frame check formulas.txt
# Interactive mode
frame repl| Language | Frameworks & Libraries |
|---|---|
| Python | Flask, Django, FastAPI, SQLAlchemy, subprocess |
| Java | Spring, JDBC, Hibernate, JNDI |
| JavaScript/TypeScript | Express, Node.js, DOM APIs |
| C/C++ | POSIX, Windows API, memory operations |
| C# | ASP.NET, Entity Framework, ADO.NET |
|
Injection & XSS
|
Memory Safety
|
|
Data Exposure
|
Cryptography
|
Frame combines taint analysis with separation logic verification:
Source Code
|
v
[Language Frontend] ---> SIL (Separation Intermediate Language)
| |
v v
[Taint Tracking] [Symbolic Execution]
| |
v v
[Pattern Detection] <---> [Z3 Verification]
|
v
Vulnerability Report
Why this matters:
- Taint analysis tracks untrusted data flow from sources (user input) to sinks (SQL queries)
- Separation logic formally verifies memory safety properties
- Z3 verification eliminates false positives by proving vulnerability reachability
Frame's symbolic core is sound and precise. But structural analysis can't reach everything: context-dependent flows, unknown frameworks, business logic. Frame adds an optional layer driven by an LLM.
- Detect (recall): find vulnerabilities the symbolic engine misses. It can explore across files, calling
read_file/greptools over your repo to trace a flow from one file into another. - Triage (precision): drop confident false positives from the findings.
- Verify: each LLM finding is checked against Frame's own sink model. A finding grounded in a recognized sink, cross-file included, moves up to a higher-confidence tier (
llm_verified). Symbolic results and LLM results are never conflated.
On the Endor Labs public AI-SAST corpus (5 real-world apps), Frame's full mode (detection + triage) reaches 0.67 recall at 0.51 precision, or 0.71 recall with detection alone. Semgrep gets 0.52 recall at 0.40 precision. The LLM layer recovers around 65 real vulnerabilities across Java, JS/TS, and C# that both Frame's symbolic engine and Semgrep miss. See the benchmark README for the full scoreboard and the honest caveats.
The layer works with any OpenAI-compatible endpoint, so you can point it at a frontier hosted model or a local one. Our results use a local model, for privacy and cost: mlx-optiq serving mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit on Apple Silicon. A stronger hosted model would likely do better. Both layers are off by default; without them you get the sound symbolic core.
# our local setup (Apple Silicon): serve the model, then point Frame at it
pip install mlx-optiq
optiq kv-cache mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit --target-bits 5.0 -o ./kv
optiq serve --model mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit \
--kv-config ./kv/kv_config.json --port 47317 --mtp # --mtp: ~1.4x faster decode
export FRAME_LLM_BASE_URL=http://localhost:47317/v1
export FRAME_LLM_API_KEY= # empty for local servers
export FRAME_LLM_MODEL=mlx-community/Qwen3.6-35B-A3B-OptiQ-4bit
export FRAME_LLM_REPO_ROOT=/path/to/repo # enables agentic cross-file toolsThen turn the layer on with one flag:
frame scan src/ --ai # symbolic + LLM detection + triageOr from the Python API:
from frame.sil import FrameScanner
# symbolic + LLM detection + triage (reads the FRAME_LLM_* env above)
scanner = FrameScanner(language="java", llm_detect=True, llm_triage=True)
result = scanner.scan_file("Controller.java")# GitHub Actions
- name: Install Frame
run: pip install -e ".[scan]"
- name: Security Scan
run: frame scan src/ --format sarif -o results.sarif --fail-on high
- name: Upload Results
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: results.sariffrom frame import EntailmentChecker
from frame.sil import FrameScanner
# Security scanning
scanner = FrameScanner()
result = scanner.scan_file("app.py")
for vuln in result.vulnerabilities:
print(f"{vuln.cwe_id}: {vuln.description}")
# Separation logic verification
checker = EntailmentChecker()
result = checker.check_entailment("x |-> 5 * y |-> 3 |- x |-> 5")
print(result.valid) # TrueFrame includes a separation logic solver for verifying heap properties:
| Syntax | Meaning |
|---|---|
x |-> v |
x points to value v |
emp |
Empty heap |
P * Q |
P and Q in separate memory |
P -* Q |
Magic wand |
P |- Q |
P entails Q |
Built-in predicates: ls(x,y) (list segment), list(x), tree(x), dll(x,p,y,n)
frame solve "ls(x, y) * ls(y, z) |- ls(x, z)" # List transitivityFrame is validated against industry-standard benchmark suites:
| Benchmark | Domain | Tests | Precision | Recall |
|---|---|---|---|---|
| OWASP Python | Web Security | 500 | 95.3% | 83.5% |
| OWASP Java | Web Security | 500 | 97.2% | 84.8% |
| SecBench.js | Node.js Security | 300 | 82.0% | 81.0% |
| NIST Juliet | C/C++ Memory | 1,000 | 89.9% | 60.5% |
| IssueBlot.NET | C# Security | 171 | 84.7% | 80.3% |
| SL-COMP | Separation Logic | 692 | 79.9% | n/a¹ |
| SMT-LIB QF_S | String Theory | 3,300 | 99.3% | n/a¹ |
python -m benchmarks run --curated # Run all benchmarks¹ SL-COMP and QF_S are logic-solver suites: the percentage is solver accuracy, and recall does not apply.
Beyond the synthetic suites, Frame is scored on the Endor Labs public AI-SAST corpus: 5 production applications. With the LLM layer, Frame reaches 0.67 recall at 0.51 precision, against Semgrep's 0.52 and 0.40. It finds around 65 real vulnerabilities across Java, JS/TS, and C# that both a symbolic engine and Semgrep miss. The benchmark README records how the ground truth was built and the honest caveats.
See benchmarks/README.md for detailed results, methodology, and tool comparisons.