Context Engine is an open-source context layer for startups and AI systems.
It is a self-hostable knowledge platform for startups. It solves a specific problem: company knowledge is scattered across Slack messages, Notion docs, Zoom meeting transcripts, and GitHub issues. When someone asks "what's the current pricing?" or "who decided to delay the launch?", the answer might be buried in a Slack thread from two weeks ago — and it might contradict what's written in Notion. Context Engine ingests all of that raw data, extracts structured facts from it using LLMs, tracks where each fact came from (provenance), flags conflicts, and serves source-backed answers through a query API and operator dashboard.
Who it's for:
- startups that want a source-backed internal context layer
- founders and operators who need trustworthy answers, not vague retrieval
- engineering and product teams that want decisions, blockers, and changes made explicit
- agent builders who want auditable context instead of generic RAG
Current connector surface includes:
- Slack
- Notion
- Zoom transcripts
- GitHub issues and pull requests
The backend stores:
- `source_documents`: raw ingested evidence
- `components`: extracted facts
- `relationships`: links between facts
- `component_sources`: provenance from fact back to source
- `review_items`: conflicts, low-confidence facts, superseded facts
- `sync_jobs`: background sync and reprocess tracking
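The provenance chain above — fact, link row, raw evidence — can be sketched in a few lines. This is a hypothetical illustration only: the table and field names are assumptions drawn from the list above, not the actual schema.

```python
# Hypothetical rows mirroring the storage layout described above.
source_documents = [
    {"id": "doc-1", "source": "slack", "content": "We delayed launch to May."},
]
components = [
    {"id": "fact-1", "kind": "decision", "statement": "Launch delayed to May"},
]
component_sources = [
    {"component_id": "fact-1", "source_document_id": "doc-1"},
]

def provenance(component_id):
    """Return the raw evidence documents backing a fact via component_sources."""
    doc_ids = {
        link["source_document_id"]
        for link in component_sources
        if link["component_id"] == component_id
    }
    return [d for d in source_documents if d["id"] in doc_ids]

print(provenance("fact-1")[0]["source"])  # slack
```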
The app currently includes:
- Founder Brief: summarize what changed, what is risky, and what needs attention
- Decision Register: view current and historical decisions with rationale and blockers
- What Changed: timeline across decision changes, reviews, ingests, and failures
- Launch Guard: check outbound copy against current truth, review state, and evidence
- Meetings: inspect transcript-backed decisions and blockers
- Engineering: inspect GitHub-backed engineering context
- Accuracy: review eval results, domains, cases, and benchmark queries
- Review Queue: resolve conflicts and low-confidence facts
- Connectors / Sources / Models / Query / Graph: operate the underlying system
Context Engine is intentionally opinionated:
- source-backed over similarity-only
- reviewable over opaque
- current truth by default
- historical truth when requested
- structured facts over free-form memory
- self-hostable by default
Tech stack:

Backend:
- FastAPI
- SQLAlchemy async ORM
- PostgreSQL + pgvector
- Redis
- Celery
- Alembic migrations

Frontend:
- React
- Vite
- React Query
- React Router
The current architecture supports:
- schema-constrained extraction with rule fallback
- structured fact storage in Postgres
- provenance-aware query responses
- temporal fact visibility
- hybrid lexical + semantic scoring groundwork
- eval summaries and case-level regressions
The OSS v1 release candidate has two primary rails:
- demo data for immediate time-to-value
- real local text import for your own notes, docs, and exports
Prerequisites:
- Docker Engine with Compose v2
- `python3`, `curl`, `npm` only if you plan to run the full release gate with `ctxe verify`
- PostgreSQL client tools (`dropdb`, `createdb`, `psql`) only if you plan to run the contract-test phase in `ctxe verify`
Install the CLI once in a local virtualenv:
git clone <this-repo> context-engine
cd context-engine
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Leave `LITELLM_API_KEY`, `EXTRACTION_MODEL`, and `EMBEDDING_MODEL` blank for a fully offline OSS run using the local deterministic embedder and rule-based extraction fallback.
Boot the stack, apply migrations, and seed the canonical demo workspace:
ctxe demo
ctxe query --workspace "Acme Accuracy Demo" "What is the Starter Plan?"

This rail uses the stable public contracts:

- `POST /api/seed-demo`
- `POST /api/query`
Boot the stack, import local files, then query the imported workspace:
ctxe up
ctxe ingest ./notes
ctxe query --workspace "Local Workspace" "What changed?"

Import semantics:

- `ctxe ingest <path>` uses `POST /api/imports`
- if no workspace exists, the CLI creates `Local Workspace`
- if exactly one workspace exists, the CLI uses it
- if multiple workspaces exist, pass `--workspace NAME_OR_UUID`
Run the OSS v1 release gate:
ctxe verify

`ctxe verify` is the primary maintainer command before release. It boots the stack, runs the backend smoke flow, executes the contract tests against a dedicated disposable test database, and runs the frontend test/build checks. Use `ctxe verify --skip-frontend` for a backend-only pass.
ctxe verify uses the demo rail internally. It validates boot, readiness, and a canonical POST /api/seed-demo before handing off to the broader smoke and test matrix. The CLI also creates .env from .env.example and generates ENCRYPTION_KEY automatically when they are missing, so the CLI boot path and shell bootstrap path behave the same on a fresh checkout.
In human-readable mode, ctxe verify prints the selected phases first, shows any skipped phases when you use --phase or --skip-frontend, then prints one line per completed phase. On failure it prints the failing phase, the phases that already passed, and the exact next command to run.
Maintainer flow:
- bootstrap a local stack with `ctxe demo` or `bash scripts/bootstrap.sh`
- confirm founder workflows with `bash scripts/smoke.sh`
- run the full release gate with `ctxe verify --json`
- release only when local `ctxe verify` and the PR `Release Gate` workflow are green
CI path:
- every PR and push to `main` runs `Release Gate`
- CI runs the same gate via `ctxe verify --json --test-database-url postgresql+asyncpg://postgres:postgres@localhost:5432/context_engine_verify`
- the workflow summary reports release status, selected phases, completed phases, and the failing `phase` plus `next_step` when the gate stops early
- the release story excludes compatibility-only routes such as `GET /api/query`, `POST /api/source-documents/upload`, and `POST /api/imports/trigger`
If you prefer shell-only flows, the lower-level wrappers are still available:
bash scripts/bootstrap.sh
bash scripts/smoke.sh

They exercise the same public HTTP contracts (`/api/seed-demo`, `/api/imports`, `/api/query`, `/api/founder-brief`, `/api/decisions`, `/api/source-documents`) but are wrapper/reference surfaces, not the primary OSS operator interface.
For the full self-hosting walkthrough (TLS, port security, backups, troubleshooting), see docs/self-hosting.md. For exact release-candidate steps, expected green checks, and rollback notes, see docs/release.md.
Once the API is up:
- API: `http://localhost:8000`
- Health: `http://localhost:8000/health`
- Readiness: `http://localhost:8000/health/ready`
- OpenAPI docs: `http://localhost:8000/docs`
To run the operator/admin UI against this backend, see Run the Frontend below.
ctxe is the canonical OSS operator entrypoint.
Core commands:
ctxe up
ctxe demo
ctxe ingest ./notes
ctxe query "What changed?"
ctxe verify

Contract and semantics:

- `ctxe up` builds the Docker services, applies migrations, and waits for `/health/ready`. It does not create or seed a workspace.
- `ctxe demo` uses `POST /api/seed-demo`, the same HTTP contract used by the frontend demo flow and the shell bootstrap/smoke scripts.
- `ctxe demo --workspace NAME_OR_UUID` seeds an existing workspace by passing `workspace_id` to `POST /api/seed-demo`.
- `ctxe ingest <path>` uses `POST /api/imports`. The API contract always requires `workspace_id`; the CLI resolves it from `--workspace`, a single existing workspace, or creates `Local Workspace` when none exists. It does not silently choose among multiple workspaces.
- `ctxe query "..."` uses `POST /api/query` and requires either `--workspace` or exactly one existing workspace.
- `ctxe verify` runs the release gate: boot, backend smoke, contract tests, and frontend test/build checks.
- `ctxe verify --phase ...` reruns only the selected slice of the release gate in canonical phase order.
- `ctxe verify --test-database-url ...` points the contract-tests phase at a disposable database; by default it uses `context_engine_verify` on local Postgres so the test reset does not collide with the live app database.
- Add `--json` to `ctxe demo`, `ctxe ingest`, `ctxe query`, or `ctxe verify` for machine-readable success and error payloads. `ctxe verify --json` includes the failing `phase`, actionable `next_step`, and `completed_steps` when the gate stops early.
Workspace selector rules are consistent across ctxe demo, ctxe ingest, and ctxe query:
- `--workspace UUID` targets that exact workspace or fails clearly if it does not exist
- `--workspace NAME` matches case-insensitively on exact name
- ambiguous names fail; they are never auto-resolved
- no selector means:
  - `ctxe demo` seeds the canonical demo workspace
  - `ctxe ingest` creates `Local Workspace` when none exists, uses the only workspace when exactly one exists, and fails when multiple exist
  - `ctxe query` uses the only workspace when exactly one exists and fails otherwise
- frontend founder workflows use the selected workspace from the workspace switcher, auto-resolve only when exactly one workspace exists, and otherwise require an explicit selection
ctxe verify is the primary "is this release candidate credible?" command:
ctxe verify

It proves:
| Step | What it checks |
|---|---|
| BOOT | Docker services are up and migrations apply |
| READINESS | GET /health returns ok and GET /health/ready returns ready |
| SEED | POST /api/seed-demo returns the canonical demo workspace before smoke runs |
| SMOKE | scripts/smoke.sh verifies seed, query, graph, models, brief, decisions, sources, and imports against the live backend (10 checks) |
| CONTRACT TESTS | CLI + founder + trust/review API regression tests stay green |
| FRONTEND TESTS | npm test passes |
| FRONTEND BUILD | npm run build passes |
For a backend-only check, run ctxe verify --skip-frontend.
If a phase fails, ctxe verify reports the exact failing phase and the next command to run for diagnosis.
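The fail-fast reporting described above can be sketched as a phase-ordered loop. The phase names come from the table above; the runner mapping and `next_step` format are illustrative assumptions, not the real implementation.

```python
# Canonical phase order from the verify table above.
PHASES = ["boot", "readiness", "seed", "smoke",
          "contract-tests", "frontend-tests", "frontend-build"]

def run_gate(runners):
    """Run phases in order; on first failure report the failing phase,
    the phases that already passed, and a rerun hint."""
    completed = []
    for phase in PHASES:
        passed = runners.get(phase, lambda: True)()  # missing phases pass
        if not passed:
            return {
                "ok": False,
                "phase": phase,
                "completed": completed,
                "next_step": f"ctxe verify --phase {phase}",
            }
        completed.append(phase)
    return {"ok": True, "completed": completed}

# Example: smoke fails, everything before it already passed.
result = run_gate({"smoke": lambda: False})
print(result["next_step"])  # ctxe verify --phase smoke
```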
To rerun only part of the gate, pass --phase one or more times:
ctxe verify --phase boot --phase readiness --phase seed --phase smoke --skip-frontend
ctxe verify --phase contract-tests
ctxe verify --phase frontend-tests --phase frontend-build

The GitHub Actions workflow `.github/workflows/release-gate.yml` runs the same core checks as `ctxe verify` on pull requests. It uploads the raw `release-gate.json` report as an artifact and renders the same result in the Actions step summary with the overall status, selected phases, completed phases, and the failing phase plus `next_step` when the gate fails.
bash scripts/smoke.sh remains the backend-only smoke path. It is useful for targeted debugging or post-deploy checks, but ctxe verify is the release gate maintainers should treat as canonical.
Stable now:
- `ctxe up`, `ctxe demo`, `ctxe ingest`, `ctxe query`, `ctxe verify`
- `GET /api/workspaces`, `POST /api/workspaces`, `GET /api/workspaces/{id}`
- `POST /api/seed-demo`
- `POST /api/imports`
- `GET /api/founder-brief`
- `POST /api/query`
- `GET /api/decisions`
- `GET /api/source-documents`
Compatibility-only:
- `GET /api/query` for older callers; founder workflows should use `POST /api/query`
- `POST /api/source-documents/upload`
- `POST /api/imports/trigger`
Not production-grade yet:
- internet-facing auth and access control
- enterprise auth/SSO
- broad connector breadth beyond the current OSS workflow set
These are the stable routes that founder-facing workflows should rely on:
| Workflow | Stable API contract | Frontend surface | CLI / smoke surface | Notes |
|---|---|---|---|---|
| Workspace bootstrap | `GET /api/workspaces`, `POST /api/workspaces` | `useWorkspaces`, `useCreateWorkspace` | `ctxe ingest`, `ctxe query` resolve workspaces before acting | Workspace selection is always explicit at the API layer. |
| Demo seed | `POST /api/seed-demo` | `useSeedDemoData` | `ctxe demo`, `scripts/bootstrap.sh`, `scripts/smoke.sh` | Omit `workspace_id` to seed the canonical Acme Accuracy Demo; include `workspace_id` to seed a specific existing workspace. |
| Local import | `POST /api/imports` | `useUploadSourceFile` | `ctxe ingest <path>` | This is the stable import path. The API contract requires `workspace_id` plus normalized `documents[]`. |
| Founder Brief | `GET /api/founder-brief` | `useFounderBrief` | direct API / browser | Reads structured facts + provenance for founder summary. |
| Query | `POST /api/query` | `useContextQuery` | `ctxe query "..."`, `scripts/smoke.sh` | Query answers are source-backed and workspace-scoped. |
| Decisions | `GET /api/decisions` | `useDecisionRegister` | direct API / browser | Decision history drilldown remains under `/api/decisions/{component_id}/history`. |
| Sources | `GET /api/source-documents` | `useSourceDocuments` | direct API / browser | Source visibility should reflect the same imported or seeded workspace. |
Compatibility-only routes may still exist for older admin flows, but founder workflows should not depend on POST /api/source-documents/upload or /api/imports/trigger as their primary contract.
Run this checklist for every release candidate:
- Run the canonical release gate: `ctxe verify`
- Confirm exactly one founder data rail for the release notes and docs:
  - Demo rail: `ctxe demo`
  - Real import rail: `ctxe ingest <path>`
- Confirm workspace semantics are explicit and non-ambiguous:
  - `ctxe demo --workspace NAME_OR_UUID` seeds the selected existing workspace.
  - `ctxe ingest --workspace NAME_OR_UUID` imports into the selected workspace.
  - `ctxe query --workspace NAME_OR_UUID "..."` queries that same workspace.
  - `ctxe ingest` and `ctxe query` do not silently choose among multiple workspaces.
- Spot-check the founder routes against the real backend:
  - `GET /api/workspaces`
  - `POST /api/seed-demo`
  - `POST /api/imports`
  - `GET /api/founder-brief?workspace_id=...`
  - `POST /api/query`
  - `GET /api/decisions?workspace_id=...`
  - `GET /api/source-documents?workspace_id=...`
- If you need to run the gate manually instead of `ctxe verify`, run:
  - `bash scripts/smoke.sh`
  - `TEST_DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/context_engine_verify python3 -m pytest tests/test_cli/test_main.py tests/test_cli/test_http.py tests/test_api/test_imports.py tests/test_api/test_admin.py::TestSeedDemoAPI tests/test_api/test_connectors_upload.py tests/test_api/test_trust.py tests/test_api/test_truth_regression.py tests/test_api/test_query.py tests/test_api/test_briefing.py -q`
  - `cd frontend && npm test`
  - `cd frontend && npm run build`
If Docker or Postgres access is sandbox-restricted, treat DB-backed backend suites and the live smoke script as environment-limited rather than product regressions, but keep the contract tests, frontend tests, and build green.
Context Engine runs comfortably on a small VPS. The resource envelope below assumes the default offline OSS path (local embedder + rule extractor, no provider LLM calls):
| Tier | vCPU | RAM | Disk | Suitable for |
|---|---|---|---|---|
| Minimum | 2 | 2 GB | 10 GB | Bootstrap, smoke, small demo workspace |
| Recommended | 2 | 4 GB | 20 GB | Everyday self-hosted use, a few thousand source documents |
| Comfortable | 4 | 8 GB | 40 GB | Many connectors + provider-backed extraction + denser embedding use |
Notes:
- Postgres with `pgvector` is the main memory consumer — embeddings and ANN indexes benefit from page cache.
- Switching to provider-backed extraction/embeddings (`LITELLM_API_KEY` + `EXTRACTION_MODEL` + `EMBEDDING_MODEL`) does not meaningfully increase host RAM — the model runs remotely.
- The Celery worker is light by default (`--concurrency=2`). Scale it by raising the concurrency or running additional worker containers.
- Disk usage grows with `source_documents` + embeddings; budget ~1 GB per 50k average-sized documents, then add headroom for pgvector indexes.
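The disk rule of thumb above is easy to turn into a back-of-envelope estimate. The 50% index headroom here is an assumed starting point, not a measured figure; tune it for your workload.

```python
def estimate_disk_gb(doc_count, index_headroom=0.5):
    """Rough disk budget: ~1 GB per 50k average-sized documents,
    plus proportional headroom for pgvector indexes (assumed 50%)."""
    base = doc_count / 50_000
    return base * (1 + index_headroom)

print(round(estimate_disk_gb(200_000), 1))  # 6.0
```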
All stateful data lives in two named Docker volumes declared in docker-compose.yml:
- `postgres_data` — Postgres database (source documents, components, relationships, review items, evals, everything)
- `redis_data` — Redis (Celery queue + transient caches)
docker compose down leaves these volumes intact; only docker compose down -v destroys them. Back them up before every upgrade.
The supported backup/restore path is two scripts, not hand-rolled pg_dump:
# Validated pg_dump with rotation. Writes to ./backups/ by default.
bash scripts/backup.sh
# Safe restore: validates the dump, snapshots the current DB first,
# stops api/worker, runs pg_restore, restarts, and probes the API
# to prove the restored data is actually queryable.
bash scripts/restore.sh backups/context_engine-YYYYMMDDTHHMMSSZ.dump --yes --safety-backup

`backup.sh` is cron-friendly (`--quiet --retention 30 --output /backups`), and `scripts/upgrade.sh` runs it automatically before every upgrade.
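For unattended nightly backups, a crontab entry along these lines works. The checkout path `/opt/context-engine` and the `/backups` mount are assumptions; the flags are the cron-friendly ones mentioned above.

```
# Nightly at 03:00: quiet backup, keep 30 dumps, write outside the Docker volumes
0 3 * * * cd /opt/context-engine && bash scripts/backup.sh --quiet --retention 30 --output /backups
```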
`scripts/diagnose.sh` is not a backup. It's a read-only triage snapshot (logs, health, redacted config) — it contains no application data and cannot be restored from.
For automated nightly cron, off-host copies, fresh-stack restore, the raw pg_dump/pg_restore fallback, and the full operations playbook (upgrade, rollback, queue backlog, worker health, schema drift, seed/import recovery, container failures, disk pressure), see docs/runbook.md.
For volume-level backups on a cheap VPS, snapshot the entire docker volume directory (typically under /var/lib/docker/volumes/) while the stack is stopped, or rely on your provider's block-storage snapshots.
Context Engine is designed to run on a single small VPS. A 2 vCPU / 4 GB RAM instance from any mainstream provider is enough for a real self-hosted deployment; the "minimum" tier above works for demos.
A reasonable shape for a cheap VPS deploy:
- Provision a Linux host (Ubuntu 24.04 LTS or Debian 12 are the least-friction picks) with at least 4 GB RAM and 20 GB disk.
- Install Docker Engine + Compose v2 (the official Docker convenience script or your distro's packages are both fine).
- `git clone` this repo and run `bash scripts/bootstrap.sh`.
- Put a TLS-terminating reverse proxy (Caddy, Traefik, or nginx) in front of `http://localhost:8000`. Block direct public access to the compose-published port if possible.
- Set `HOST_POSTGRES_PORT` and `HOST_REDIS_PORT` in `.env` to `127.0.0.1:5432` / `127.0.0.1:6379` bindings — or remove the `ports:` entries entirely — so Postgres and Redis are never exposed to the public internet.
- Enable automated snapshots on the host (provider-level) and a nightly `pg_dump` to object storage.
- Run `bash scripts/smoke.sh` after every deploy.
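The loopback-binding step above can look like this in `.env`. The variable names come from the step itself; the `HOST:PORT` value form assumes Compose's host-IP port-mapping syntax.

```
# Bind Postgres and Redis to loopback only so they are never publicly reachable
HOST_POSTGRES_PORT=127.0.0.1:5432
HOST_REDIS_PORT=127.0.0.1:6379
```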
Auth note: Context Engine does not yet ship a production-grade auth layer. For a real internet-facing deploy, restrict access at the reverse proxy (basic auth, an allowlist, Tailscale, or a Cloudflare Access tunnel) until proper auth lands.
cd frontend
npm install
npm run dev

The Vite dev server will proxy API requests to the backend at `http://localhost:8000`.
If you already have PostgreSQL and Redis available locally:
pip install -e ".[dev]"
alembic upgrade head
uvicorn app.main:app --reload

In a second terminal, seed the demo workspace through the public HTTP contract:
curl -X POST http://localhost:8000/api/seed-demo \
-H 'Content-Type: application/json' \
-d '{}'

Start the worker separately:
celery -A app.tasks.celery_app worker --loglevel=info --concurrency=2

Run the frontend:
cd frontend
npm install
npm run devRun the eval regression harness against a workspace:
context-engine-eval-regression --workspace-id REPLACE_WITH_WORKSPACE_ID --json

The script wrapper also works:

python scripts/run_eval_regression.py --workspace-id REPLACE_WITH_WORKSPACE_ID --json

The current v1 accuracy gate is built around:
- `25` gold-set cases
- `5` domains: `pricing`, `blocker`, `roadmap`, `decision`, `meeting`
- `>= 0.80` pass rate
- `>= 0.80` retrieval quality
- `>= 0.80` extracted fact correctness
- `>= 0.75` final answer correctness
- `<= 0.25` confidence calibration error
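Applying those thresholds to an eval summary is a simple comparison per metric. A minimal sketch follows; the metric key names are illustrative, not the harness's actual output schema.

```python
# Thresholds from the v1 accuracy gate above; keys are assumed names.
THRESHOLDS = {
    "pass_rate": (">=", 0.80),
    "retrieval_quality": (">=", 0.80),
    "fact_correctness": (">=", 0.80),
    "answer_correctness": (">=", 0.75),
    "calibration_error": ("<=", 0.25),
}

def gate_passes(summary):
    """True only if every metric clears its threshold in the right direction."""
    for metric, (op, bound) in THRESHOLDS.items():
        value = summary[metric]
        ok = value >= bound if op == ">=" else value <= bound
        if not ok:
            return False
    return True

print(gate_passes({
    "pass_rate": 0.84, "retrieval_quality": 0.88, "fact_correctness": 0.81,
    "answer_correctness": 0.76, "calibration_error": 0.19,
}))  # True
```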
If you want real extraction / embedding models instead of local fallbacks:
LITELLM_API_KEY=...
EXTRACTION_MODEL=openai/gpt-4.1-mini
EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_DIMENSIONS=1024

For Zoom OAuth + webhooks:
ZOOM_CLIENT_ID=...
ZOOM_CLIENT_SECRET=...
ZOOM_REDIRECT_URI=https://your-api.example.com/api/connectors/zoom/callback
ZOOM_WEBHOOK_SECRET=...

The Zoom connector is transcript-first. Manual-token Zoom remains polling-based; OAuth-installed Zoom can support webhook-driven sync.
GitHub does not require app-level env vars for the first pass. Connect via the backend API with a manual token and repository list.
Backend founder-workflow smoke test (boot + health + seed + query + graph + models + brief + decisions + sources + imports):
bash scripts/smoke.sh

Backend tests:
python -m pytest tests/ -x --tb=short

Frontend tests:
cd frontend
npm test

Frontend production build:
cd frontend
npm run build

Stable founder-workflow routes:
- `/api/workspaces`
- `/api/seed-demo`
- `/api/imports`
- `/api/founder-brief`
- `/api/query`
- `/api/decisions`
- `/api/source-documents`
Broader operator / system API groups:
- `/api/connectors`
- `/api/review-items`
- `/api/timeline`
- `/api/launch-guard`
- `/api/evals`
The router lives in app/api/router.py.
app/
api/ FastAPI route groups
connectors/ Connector implementations and strategy metadata
models/ SQLAlchemy models
processing/ Extraction, embeddings, reranking
services/ Core business logic
tasks/ Celery tasks
alembic/ Database migrations
frontend/ Operator/admin UI
scripts/ Preflight, seeding, smoke, eval entrypoints
tests/ Backend tests
Context Engine is in late OSS v1 / hardening territory:
- core product workflows exist
- the operator UI is largely built
- accuracy, provenance, review, and timeline surfaces are present
- the remaining work is mostly runtime verification, hardening, and connector/workflow refinement
Context Engine is not trying to be generic enterprise search.
It is a self-hostable, source-backed context system for fast-moving teams that need:
- current truth
- historical truth
- explicit review state
- source paths for every answer
- startup-relevant workflows on top of raw company context