The AI agent memory engine that deliberately stays small.
SQLite + sqlite-vec + Ollama · CJK-native · HTTP / CLI / Python embedded · Apache 2.0
Most AI agent memory tools want to become platforms. memory-hall refuses to.
It's three components (SQLite + sqlite-vec + Ollama), three entry points (HTTP / CLI / Python embedded), and one deliberate philosophy: the engine only stores and retrieves. Your agent stack decides the memory structure. No opinionated enrichment, no required MCP path, no mandatory auth, no replica. Just a fast, durable, CJK-aware store that runs on a single Mac mini.
AI agent memory in 2026 has no canonical implementation — OpenAI's is closed, Anthropic's is preview, and the OSS landscape split two ways:
- Mem0 / Zep / LangMem → SaaS or heavy. You rent memory, pay per volume, and inherit their opinions.
- engram-rs / robotmem / MemOS → local-first but growing features (decay, topic trees, spatial retrieval, OS abstraction). Great if you want structured memory, complex if you don't.
memory-hall sits in the same neighborhood as engram-rs/robotmem/MemOS but chose the opposite direction:
engram-rs, robotmem, MemOS → grow upward (more features, opinions, abstractions)
memory-hall → shrink downward (fewer features, zero opinions, one engine)
If you want a memory engine that just stores and retrieves — and leaves structure to your agent stack — this is it.
git clone https://github.com/MakiDevelop/memory-hall
cd memory-hall && docker compose up -d
curl -X POST http://localhost:9100/v1/memory/write \
-H "Content-Type: application/json" \
-d '{
"agent_id": "my-agent",
"namespace": "shared",
"type": "episode",
"content": "memory-hall works!"
}'

Search:
curl -X POST http://localhost:9100/v1/memory/search \
-H "Content-Type: application/json" \
-d '{"query": "memory-hall", "mode": "hybrid", "limit": 5}'

That's it. No auth, no account, no API key. With Docker Compose, your data lives in ./mh-data/memory-hall.sqlite3.
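For callers that prefer Python over curl, the same two requests can be sketched with the standard library alone. The endpoint paths and payload fields are copied from the curl examples above; the helper names (`build_request`, `post_json`, `quickstart`) are illustrative and not part of memory-hall, and the shape of the response body is not assumed beyond it being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9100"  # quick-start default from the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a memory-hall endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def quickstart() -> dict:
    """Write one entry, then search for it (requires a running server)."""
    post_json(
        "/v1/memory/write",
        {
            "agent_id": "my-agent",
            "namespace": "shared",
            "type": "episode",
            "content": "memory-hall works!",
        },
    )
    return post_json(
        "/v1/memory/search",
        {"query": "memory-hall", "mode": "hybrid", "limit": 5},
    )
```

With the Docker Compose stack running, `print(quickstart())` should show the search response for the entry just written.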
Pure-Chinese query "撞牆" (a common CJK phrase, meaning "hit a wall") against a hall containing Chinese content:
| Tokenizer | BM25 score | Verdict |
|---|---|---|
| unicode61 (default in most OSS) | 0 | Miss |
| jieba pre-tokenization (memory-hall) | 0.26 | Hit |
Why the gap? unicode61 treats a continuous stretch of Chinese characters as one token, so substring queries miss. memory-hall pre-tokenizes with jieba (with bigram fallback for proper nouns and novel compounds) before indexing in FTS5, both on write and on query.
70% of my own memory content is Chinese. If yours is too, this difference matters.
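The miss can be reproduced with Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5. This is a sketch of the effect, not memory-hall's indexing code: to stay dependency-free it uses character bigrams (the fallback mentioned above) as a stand-in for jieba, and the table name, helper names, and sample sentence are made up for illustration.

```python
import sqlite3


def bigrams(text: str) -> str:
    """Split text into space-separated character bigrams (stand-in for jieba)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return "".join(chars)
    return " ".join(a + b for a, b in zip(chars, chars[1:]))


def count_hits(content: str, query: str, pretokenize: bool) -> int:
    """Index one document in FTS5 (unicode61) and count matches for the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='unicode61')")
        indexed = bigrams(content) if pretokenize else content
        conn.execute("INSERT INTO docs (body) VALUES (?)", (indexed,))
        # Apply the same transformation to the query as to the stored text.
        match = bigrams(query) if pretokenize else query
        row = conn.execute(
            "SELECT count(*) FROM docs WHERE docs MATCH ?", (f'"{match}"',)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


content = "今天又撞牆了"  # hypothetical sample sentence
print(count_hits(content, "撞牆", pretokenize=False))  # raw unicode61
print(count_hits(content, "撞牆", pretokenize=True))   # pre-tokenized
```

Without pre-tokenization the whole sentence is indexed as a single token, so the two-character query finds nothing; with it, the query matches the stored bigram. The bigram helper only makes sense for pure-CJK text, which is why a real segmenter handles mixed content better.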
| | memory-hall | mem0 | engram-rs | robotmem | MemOS |
|---|---|---|---|---|---|
| Deployment | self-host | SaaS | self-host | self-host | self-host + cloud |
| Language | Python | Python | Rust | Python | Python |
| Storage | SQLite + sqlite-vec | Qdrant/pgvector | SQLite + FTS5 | SQLite + FTS5 + vec0 | multi-store |
| CJK first-class | ✅ jieba at storage layer | ❌ (via embedder) | ✅ (BM25 + jieba) | ✅ (jieba) | ❌ (via embedder) |
| MCP server | example wrapper only | — | ❌ | ✅ | — |
| Enrichment / decay | ❌ by choice | ✅ | ✅ 3-layer decay | — | ✅ scheduler |
| Authentication | optional Bearer shim | ✅ | — | — | ✅ |
| Deliberate scope ceiling | ✅ engine only | ❌ growing | — | — | ❌ "OS for memory" |
| License | Apache 2.0 | mixed | Apache 2.0 | — | Apache 2.0 |
Not "mine is best". memory-hall's bet is that engine ≠ platform. If you want an opinionated memory product, use mem0 or MemOS. If you want a minimal engine you can compose your own structure around, use this.
Most READMEs list what a project does. This is the list of what memory-hall deliberately doesn't — each one a design choice, not a TODO:
| Feature | Why not | When it'd change |
|---|---|---|
| MCP as the only path | Adds setup friction; protocol still evolving | Never. MCP can stay a wrapper; HTTP / CLI / embedded remain first-class. |
| Production-grade auth / ACL | Bad early identity picks are hard to undo | MH_API_TOKEN / MH_ADMIN_TOKEN are opt-in shims today; HMAC / ACL belongs in memory-gateway or a future hardened mode. |
| Replica / HA | SQLite's whole value is single-file simplicity; adding consensus violates that | At v2.0, via Postgres adapter swap |
| Enrichment worker (fact extraction, summarization) | Opinionated memory structure is what makes mem0 not fit my use case; I won't repeat that | Never in this repo. Build it on top. |
| Memory decay / topic tree | Same as above — memory shape is your agent stack's job | Never in this repo. |
| Knowledge graph | Same | Never in this repo. |
memory-hall's core promise is three steps:
docker compose up → curl POST /v1/memory/write → you have memory
Every feature above would break that promise. The promise is the product.
See docs/adr/0003-engine-library-vs-deployment-platform.md for the full engine-vs-platform rationale.
| Entry | Audience | Status |
|---|---|---|
| HTTP REST :9100/v1/* | any language, any tool | v0.2 |
| CLI mh | Codex, Gemini CLI, shell scripts | v0.2 |
| Python embedded (in-process) | sandboxed agents, tests, batch imports | v0.2 |
No entry is privileged — they all hit the same backend, so no single-point-of-failure path.
Agents reading this: see docs/agent-integration.md for a decision tree that picks the right surface based on your sandbox, plus the auth + install gotchas that have bitten real Codex / Gemini sessions.
Some agents run in sandboxes that block localhost sockets (Codex CLI, some Gemini setups, restricted containers). For those, skip HTTP entirely:
import asyncio

from memory_hall import Settings, build_runtime
from memory_hall.models import WriteMemoryRequest, SearchMemoryRequest


async def main():
    # Build the runtime in-process; no HTTP server or socket is involved.
    runtime = build_runtime(settings=Settings())
    await runtime.start()
    try:
        await runtime.write_entry(
            tenant_id="default",
            principal_id="my-agent",
            payload=WriteMemoryRequest(
                agent_id="my-agent",
                namespace="shared",
                type="note",
                content="hello from inside the process",
            ),
        )
        hits = await runtime.search_entries(
            tenant_id="default",
            payload=SearchMemoryRequest(query="hello", limit=5),
        )
        print(hits.total)
    finally:
        # Stop the runtime even if the write or search raises.
        await runtime.stop()


asyncio.run(main())

No network, no auth, same storage.
Run the engine on one machine, put the embedder on another. My home lab:
[Mac mini M4] memory-hall:0.2.0 ─── Tailscale ──→ [DGX Spark 128GB] Ollama bge-m3
│
└── rsync /5min ──→ [Mac mini #2] memory-hall:0.2.0 (cold standby)
Primary dies → manually docker start memory-hall on standby. Not true HA, but 80% of real HA value for personal / home setups with zero maintenance overhead.
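One caveat with copying a live database file: in WAL mode the main file and its WAL can be captured at different moments. A possible way around that is to snapshot the database with SQLite's online backup API first and rsync the snapshot instead. The sketch below uses Python's `sqlite3.Connection.backup`; the function name and paths are illustrative, and whether the setup above snapshots before syncing is not stated in this README.

```python
import sqlite3
from pathlib import Path


def snapshot(src: Path, dst: Path) -> None:
    """Copy a (possibly live) SQLite database to dst via the online backup API."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dst)
    try:
        # backup() copies a consistent view of the database page by page,
        # including any changes still sitting in the WAL.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

A cron job could call `snapshot(Path("mh-data/memory-hall.sqlite3"), Path("mh-data/snapshot.sqlite3"))` and then rsync only the snapshot file to the standby.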
If the Ollama instance that memory-hall embeds against is also serving LLM traffic (e.g. you share one DGX Spark Ollama between an agent chat stack and memory-hall), bge-m3 can get starved by the LLM queue and /v1/health will flap between ok and degraded. Hit that once (hard) in production — see docs/operations/incident-2026-04-20-embed-queue.md.
Workaround since 0.2.1: point memory-hall at a dedicated embed service (any service speaking POST /embed {"texts": [...]} → {"dense_vecs": [...]}) via:
environment:
  MH_EMBEDDER_KIND: http
  MH_EMBED_BASE_URL: http://<embed-host>:8790

Rationale in ADR 0006. The default MH_EMBEDDER_KIND=ollama is unchanged; existing deployments do nothing.
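To make the `POST /embed` contract concrete, here is a minimal stub service built on Python's `http.server`. Only the request and response shapes (`texts` in, `dense_vecs` out) come from the description above; everything else is an assumption for illustration. In particular, `fake_embed` returns hash-derived placeholder numbers rather than real embeddings, so this is useful for wiring tests, not for search quality.

```python
import hashlib
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DIM = 8  # placeholder size; real bge-m3 dense vectors have 1024 dimensions


def fake_embed(text: str) -> list[float]:
    """Deterministic placeholder vector derived from a hash (NOT a real embedding)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def embed_response(body: dict) -> dict:
    """Map {"texts": [...]} to {"dense_vecs": [...]}, one vector per text."""
    return {"dense_vecs": [fake_embed(t) for t in body.get("texts", [])]}


class EmbedHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/embed":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(embed_response(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args) -> None:
        pass  # keep the stub quiet


def make_server(host: str = "127.0.0.1", port: int = 8790) -> ThreadingHTTPServer:
    """Create the stub server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), EmbedHandler)
```

Running `make_server().serve_forever()` and pointing MH_EMBED_BASE_URL at it exercises the HTTP embedder path end to end; a real deployment would replace `fake_embed` with a call into an actual embedding model.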
Set MH_API_TOKEN to require Authorization: Bearer <token> on /v1/memory/* endpoints (/v1/health stays public). Set a different MH_ADMIN_TOKEN to require a separate token on /v1/admin/*; when it is unset, admin endpoints fall back to MH_API_TOKEN for backward compatibility. Leave both unset for local dev. Rationale in ADR 0007 and ADR 0009.
Production guard (fail-closed): if you bind to a non-loopback host (e.g. 0.0.0.0 or a public IP) without MH_API_TOKEN set, memory-hall refuses to start — otherwise the write API would be exposed unauthenticated. Bind to localhost, set MH_API_TOKEN, or set MH_ALLOW_INSECURE=1 to explicitly override. Local localhost dev without a token is unaffected.
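The token rules and the fail-closed guard can be summarized as two small decision functions. This is a sketch of the behavior described above, not the actual memory-hall implementation; the function names are hypothetical, and treating unresolved hostnames as non-loopback is an assumption.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for localhost and loopback IP addresses such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # assumption: any other hostname counts as non-loopback


def startup_allowed(host: str, api_token: Optional[str], allow_insecure: bool = False) -> bool:
    """Fail closed: a non-loopback bind needs a token or an explicit override."""
    if is_loopback(host) or api_token:
        return True
    return allow_insecure


def required_token(path: str, api_token: Optional[str], admin_token: Optional[str]) -> Optional[str]:
    """Return the Bearer token a request to `path` must present, or None if public."""
    if path == "/v1/health":
        return None  # health stays public
    if path.startswith("/v1/admin/"):
        return admin_token or api_token  # admin falls back to the API token
    if path.startswith("/v1/memory/"):
        return api_token
    return None
```

For example, `startup_allowed("0.0.0.0", None)` is False, while the same bind with a token set, or with the override flag standing in for MH_ALLOW_INSECURE=1, is allowed.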
What v0.2 is
- Write, search (hybrid / semantic / lexical), reindex. HTTP + CLI + Python embedded.
- CJK-aware via jieba at storage layer.
- Durable by default: SQLite WAL, atomic write → index → vector, content_hash dedup, graceful degradation (HTTP 202 + sync_status=pending + background reindex worker).
- Battle-tested under 50-way concurrent writes (zero data loss) and embedder outages (writes keep succeeding).
- Dogfooded by seven AI agents (Claude / Codex / Gemini / Max / Grok / gemma4 / the human) during development.
What v0.2 is not, yet
- Not a distributed database. One writer, one reader.
- Not production-scale for millions of entries. sqlite-vec is comfortable to ~100k on commodity hardware; beyond that, swap the vector adapter.
- No first-party MCP server in the core package; examples/claude_mcp/ is an integration sketch.
- No production-grade identity / ACL. Bearer + admin tokens are opt-in deployment shims, not per-agent auth.
- No multi-tenant validation at scale (schema is multi-tenant from day one per ADR-0002, but cross-tenant isolation at scale isn't stress-tested).
- No enrichment. What you write is what gets stored.
- v0.1 (2026-04-18) — engine shipped. Hit@3 hybrid=60% / lexical=60% / semantic=0% on 177-entry CJK corpus. Durability + concurrency verified. See results-2026-04-18.md.
- v0.2 (2026-04-19) — jieba CJK tokenizer (pure-CJK queries now lexically hit: BM25 0 → 0.26), latency metrics in benchmark, cursor-stream reindex, embed_batch for backlog throughput, Docker sqlite-vec upgraded to 0.1.9 (upstream #251 ARM64 ELF32 bug), build-time vec0 smoke test. See results-2026-04-19.md.
- v0.2.1 (2026-04-20, current) — HttpEmbedder backend (ADR 0006) for isolating the embed path from shared-Ollama LLM queues; health_embed_timeout_s separated from write-path timeout; docker-compose.yml default host port corrected to 9100. See CHANGELOG.
- v0.3 — harden the MCP wrapper story, decide whether Qdrant/Postgres adapters are worth the complexity, and narrow the auth boundary after more dogfood.
- v1.0 — public release, docs site, example integrations.
- v2.0 — Postgres adapter for replica/HA, more embedder/store adapters.
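For readers unfamiliar with the Hit@3 figures quoted for v0.1, the metric is the share of queries whose top 3 results contain at least one relevant entry. A minimal sketch of that calculation follows; the function names and sample IDs are illustrative, and the project's actual benchmark script may compute it differently.

```python
from typing import Iterable, Sequence


def hit_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int = 3) -> bool:
    """True if any relevant ID appears among the top-k ranked results."""
    relevant = set(relevant_ids)
    return any(entry_id in relevant for entry_id in ranked_ids[:k])


def hit_rate(results: Sequence[tuple[Sequence[str], Iterable[str]]], k: int = 3) -> float:
    """Fraction of queries with at least one relevant result in the top k."""
    if not results:
        return 0.0
    hits = sum(hit_at_k(ranked, relevant, k) for ranked, relevant in results)
    return hits / len(results)


# Hypothetical example: two queries, one of which finds its target in the top 3.
sample = [
    (["e7", "e2", "e9", "e1"], ["e2"]),  # hit at rank 2
    (["e4", "e5", "e6", "e3"], ["e3"]),  # relevant entry only at rank 4: miss
]
print(hit_rate(sample, k=3))  # 0.5
```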
Architecture decisions: docs/adr/, including why we dropped mem0.
Why not just use mem0? mem0's trade-offs (SaaS-first, English-leaning tokenization, opinionated enrichment, version evolution you can't pin) don't fit if you're running multiple agents locally with CJK-heavy content. See the full 6-reason breakdown.
Why not use engram-rs or robotmem? They're excellent and doing different things: engram-rs adds temporal memory decay and topic trees (Rust), robotmem adds MCP + spatial retrieval (Python, AI robots). memory-hall deliberately skips those layers. If you want opinionated memory structure, pick them. If you want a minimal raw engine, pick this. They're not competitors, they're neighbors.
Why jieba specifically?
It's the established CJK segmentation library (≥14 years old, pure Python, no native deps), works well enough for 95% of Chinese content, and fails gracefully (bigram fallback for proper nouns). The jieba decision can be swapped later if a better option appears — but that's not today's problem.
Why SQLite?
Single-file deployment, no server to run, WAL for concurrent reads, 64-bit ARM wheels for sqlite-vec make it actually fast on Apple Silicon, and mv memory-hall.sqlite3 /new/path is your migration plan. The moment you need replica/HA, swap to Postgres — but most personal / home / small-team uses never cross that line.
Can I use this in production? For internal/personal deployments, sure — I dogfood it on my home AI lab and it's shipped multiple v0.x releases without data loss. For a product you're selling, wait for v1.0 or a security review. API stability: v0.x can break; v1.x stable.
How do I contribute? Open an issue (bug reports from real usage are the most valuable — see Max's 30/60/90 day rubric if curious), submit a PR, share what broke for you. The project is built in the open on purpose.
繁體中文
memory-hall 是給多 AI agent(Claude / Codex / Gemini / 本地 LLM / 人類 / 機器人)共用的本地記憶引擎。用 SQLite + sqlite-vec + Ollama 一台 Mac mini 就能跑,CJK 原生(jieba 預切詞),Apache 2.0。
故意保持小——沒有 decay、沒有 topic tree、沒有強制 MCP 路徑、沒有強制 auth、沒有 enrichment worker。agent memory 最容易 bloat 成「另一個平台」,memory-hall 的賭注是:engine 只管儲存,agent stack 主人決定記憶結構。
歡迎一起來玩。開 issue、送 PR、回報你踩到的坑。完整論據見 blog。
简体中文
memory-hall 是给多 AI agent(Claude / Codex / Gemini / 本地 LLM / 人类 / 机器人)共用的本地记忆引擎。用 SQLite + sqlite-vec + Ollama 一台 Mac mini 就能跑,CJK 原生(jieba 预切词),Apache 2.0。
故意保持小——没有 decay、没有 topic tree、没有强制 MCP 路径、没有强制 auth、没有 enrichment worker。memory-hall 的赌注是:engine 只管存储,agent stack 主人决定记忆结构。
日本語
memory-hall は、複数の AI エージェント(Claude / Codex / Gemini / ローカル LLM / 人間 / ボット)が共有できるセルフホスト型メモリエンジンです。SQLite + sqlite-vec + Ollama で Mac mini 一台で動きます。CJK ネイティブ(jieba 分かち書き)、Apache 2.0。
意図的に小さく保つ——decay なし、topic tree なし、必須 MCP パスなし、必須 auth なし、enrichment worker なし。memory-hall の賭けは:エンジンは保存と検索だけ、メモリ構造の決定はエージェントスタックの持ち主に任せる。
English
memory-hall is a self-hostable memory engine for multiple AI agents (Claude, Codex, Gemini, local LLMs, humans, bots). SQLite + sqlite-vec + Ollama runs on a single Mac mini. CJK-native via jieba tokenization. Apache 2.0.
Deliberately small — no decay, no topic tree, no required MCP path, no mandatory auth, no enrichment worker. memory-hall's bet: the engine only stores and retrieves; memory structure is your agent stack's decision.
Deutsch
memory-hall ist eine selbst-hostbare Memory-Engine für mehrere KI-Agenten (Claude, Codex, Gemini, lokale LLMs, Menschen, Bots). SQLite + sqlite-vec + Ollama — läuft auf einem Mac mini. CJK-nativ via jieba-Tokenisierung. Apache 2.0.
Absichtlich klein gehalten — kein Decay, kein Topic Tree, kein verpflichtender MCP-Pfad, keine verpflichtende Auth, kein Enrichment-Worker. Die Engine speichert und ruft ab; die Memory-Struktur entscheidet dein Agent-Stack.
Français
memory-hall est un moteur mémoire auto-hébergeable pour plusieurs agents IA (Claude, Codex, Gemini, LLM locaux, humains, bots). SQLite + sqlite-vec + Ollama tournent sur un seul Mac mini. CJK natif via tokenisation jieba. Apache 2.0.
Volontairement petit — pas de decay, pas de topic tree, pas de chemin MCP obligatoire, pas d'auth obligatoire, pas de worker d'enrichissement. Le moteur stocke et récupère ; la structure de la mémoire, c'est à votre agent stack de la décider.
Italiano
memory-hall è un motore di memoria self-hosted per più agenti AI (Claude, Codex, Gemini, LLM locali, umani, bot). SQLite + sqlite-vec + Ollama girano su un singolo Mac mini. CJK nativo tramite tokenizzazione jieba. Apache 2.0.
Volutamente piccolo — niente decay, niente topic tree, nessun percorso MCP obbligatorio, nessuna auth obbligatoria, niente enrichment worker. Il motore salva e recupera; la struttura della memoria la decide il tuo agent stack.
한국어
memory-hall 은 여러 AI 에이전트(Claude / Codex / Gemini / 로컬 LLM / 사람 / 봇)가 함께 쓰는 셀프 호스트형 메모리 엔진입니다. SQLite + sqlite-vec + Ollama 로 Mac mini 한 대에서 돌아갑니다. CJK 네이티브(jieba 토큰화), Apache 2.0.
의도적으로 작게 유지 — decay 없음, topic tree 없음, 필수 MCP 경로 없음, 필수 auth 없음, enrichment worker 없음. memory-hall의 베팅: 엔진은 저장과 검색만, 메모리 구조 결정은 당신의 에이전트 스택이.