feat (graphdb) Neo4j backend — E2E green ✅#47
Open
hourdays wants to merge 18 commits into
Adds Neo4j (Bolt / Cypher) as a selectable graph DB engine alongside
Lakebase Postgres. PR 1 ships the integration shape + flat-triple CRUD.
PR 2 will add the 16 Cypher named-query implementations + a
SWRLFlatCypherTranslator for reasoning.
Changes:
- src/back/core/graphdb/neo4j/ — new package, copied from the
_starter_kit template and filled in per docs/graphdb-integration.md.
- Neo4jStore extends GraphDBBackend; flat triples persisted as
(:Triple:<label> {subject, predicate, object}) nodes with a SPO
uniqueness constraint per logical store.
- CRUD: create_table, drop_table, insert_triples (batched via UNWIND
+ MERGE), delete_triples, query_triples, count_triples,
table_exists, get_status.
- Capability flags: supports_cypher=True, supports_graph_model=False
(flat triples in v1), query_dialect="cypher".
- engine_config keys: uri, database, auth_method (basic |
databricks_secret), credentials, encrypted.
- Named-query overrides stubbed with safe defaults + TODO(PR2)
markers — the app degrades gracefully on Neo4j until PR 2 lands.
- execute_query raises NotImplementedError on purpose: no raw
Cypher entry point; all writes go through the build pipeline
after ontology validation (C2 safeguard).
- sync_to_remote / sync_from_remote / local_path are no-ops —
Neo4j Aura is remote-only.
- src/back/core/graphdb/GraphDBFactory.py — registers _create_neo4j
dispatch, NEO4J_AVAILABLE guarded import.
- src/back/objects/session/GlobalConfigService.py — adds "neo4j" to
ALLOWED_GRAPH_ENGINES so the Settings dropdown can persist it.
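The engine_config keys listed above could be validated at construction time roughly as follows. This is a minimal sketch, not the code in Neo4jStore: the names `Neo4jEngineConfig` and `parse_engine_config` are hypothetical, and the defaults (database "neo4j", encrypted on) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping

# The two auth methods named in the commit message.
ALLOWED_AUTH_METHODS = ("basic", "databricks_secret")


@dataclass(frozen=True)
class Neo4jEngineConfig:
    """Hypothetical container for the engine_config keys."""
    uri: str
    database: str = "neo4j"        # assumed default
    auth_method: str = "basic"     # assumed default
    credentials: Mapping[str, Any] = field(default_factory=dict)
    encrypted: bool = True         # assumed default


def parse_engine_config(raw: Mapping[str, Any]) -> Neo4jEngineConfig:
    """Reject a missing URI or an unknown auth_method, then fill defaults."""
    uri = str(raw.get("uri") or "").strip()
    if not uri:
        raise ValueError("engine_config.uri is required")
    auth_method = raw.get("auth_method") or "basic"
    if auth_method not in ALLOWED_AUTH_METHODS:
        raise ValueError(f"unsupported auth_method: {auth_method!r}")
    return Neo4jEngineConfig(
        uri=uri,
        database=raw.get("database") or "neo4j",
        auth_method=auth_method,
        credentials=dict(raw.get("credentials") or {}),
        encrypted=bool(raw.get("encrypted", True)),
    )
```

Failing fast here keeps a bad Settings payload from surfacing later as an opaque driver error.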
Not yet in this commit (next commits on this branch):
- Settings UI: left-menu "Neo4j" entry under TRIPLE STORE + dropdown
option in #graphEngineSelect + Neo4j-specific config page.
- pyproject.toml optional dependency "neo4j>=5.0".
- tests/units/graphdb/test_neo4j_store.py.
- changelogs/v0.5.0/hourdays_2026-06-09.log.
Adds the Neo4j surfaces in Settings so users can select and configure
the engine. JS wiring for load/save comes in the next commit.
- src/front/config/menu_config.json: new "Neo4j" item under TRIPLE
STORE group (icon bi-bezier2), mirroring the Lakebase entry.
- src/front/templates/settings.html:
- Dropdown: <option value="neo4j">Neo4j (Bolt)</option> in
#graphEngineSelect (Triple store > Global page).
- New #neo4j-section sidebar-section with the config form: URI
(Bolt), database, auth_method (basic | databricks_secret),
credentials, encrypted toggle. Test-connection button slot
(handler comes in the next commit).
- Architecture note explains the C2 safeguard (no raw Cypher).
Replaces the safe-default stubs on Neo4jStore with native Cypher
implementations of the 16 named-query methods defined on
TripleStoreBackend. The app's Knowledge Graph view, Inference page,
Graph Chat, GraphQL endpoint, and entity-detail pages now work when
Neo4j is the active engine (subject to SWRLFlatCypherTranslator, which
lands in the next commit).
Implementations cover:
- Statistics: get_aggregate_stats, get_type_distribution,
get_predicate_distribution.
- Entity lookup: find_subjects_by_type (with optional value filter via
toLower CONTAINS), resolve_subject_by_id, get_entity_metadata,
get_triples_for_subjects, get_predicates_for_type.
- Pagination: paginated_triples + paginated_count. Note: SQL
WHERE-fragment conditions are not translated; callers that need
filtered pagination should switch to find_subjects_by_type or
find_seed_subjects. The unfiltered case is logged.
- Traversal: bfs_traversal (iterative expansion for depth > 1),
find_seed_subjects (entity_type × value with field=label|id|any and
match_type=contains|exact|starts|ends), find_subjects_by_patterns
(LIKE patterns → Cypher regex via =~), expand_entity_neighbors (1-hop
outgoing+incoming, filtered to typed entities).
- Reasoning: transitive_closure (chained MATCH up to max_depth=20),
symmetric_expand, shortest_path (BFS-based iterative reconstruction
given the flat-triple model; a typed-relationship model would let us
use native shortestPath).
- Cohorts: delete_cohort_triples (DETACH DELETE with safety limit).
All implementations use parameterised Cypher to avoid injection. Graph
traversal joins Triple nodes by property equality because the
flat-triple model has no typed relationships between entities; a typed
graph model is a future PR.
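The "LIKE patterns → Cypher regex via =~" step in find_subjects_by_patterns can be illustrated with a small converter. This is a sketch under assumptions: `like_to_cypher_regex` and `build_pattern_query` are hypothetical names, the label is assumed to be sanitised already, and LIKE escape clauses are not handled.

```python
import re
from typing import Dict, List, Tuple


def like_to_cypher_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex for Cypher's =~.

    '%' becomes '.*', '_' becomes '.', and every other character is
    escaped so it matches literally. Cypher's =~ matches the whole
    string, which lines up with LIKE semantics.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def build_pattern_query(label: str, patterns: List[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Return (cypher, params); the regexes travel as a bound parameter."""
    cypher = (
        f"MATCH (t:`{label}`) "                      # label assumed sanitised
        "WHERE any(rx IN $regexes WHERE t.subject =~ rx) "
        "RETURN DISTINCT t.subject AS subject"
    )
    return cypher, {"regexes": [like_to_cypher_regex(p) for p in patterns]}
```

Only the label is interpolated into the statement; the patterns themselves stay in `$regexes`, in keeping with the parameterised-Cypher rule stated above.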
Remaining TODO(PR2) markers (3):
- Databricks-secret auth resolution path (file line 166).
- SWRLFlatCypherTranslator wiring in get_query_translator (line 218),
next commit.
- The stale docstring claim about "TODO(PR2) markers throughout"
(line 11), to be swept in the polish pass.
Adds the Cypher counterpart of SWRLSQLTranslator so the reasoning
architecture is in place when Neo4j is the active engine. Methods are
scaffolded (return None + warn) rather than fully translating SWRL to
Cypher — that translation is its own substantial piece of work (the
SQL counterpart is ~730 lines of careful logic for builtins, negation,
variable bindings, etc.) and deserves a dedicated PR with its own test
suite. Returning None makes the reasoning engine treat each rule as
"no work to do", so the UI surfaces zero violations / zero inferences
cleanly instead of crashing.
- src/back/core/reasoning/SWRLFlatCypherTranslator.py: NEW. Same
public interface as SWRLSQLTranslator (build_violation_sql,
build_antecedent_count_sql, build_materialization_sql,
build_inference_sql) plus matching *_cypher aliases. The class
docstring documents the scaffolded status and the path to full
implementation.
- src/back/core/graphdb/neo4j/Neo4jStore.py:
- get_query_translator() returns SWRLFlatCypherTranslator (was a
super() pass-through to the SQL default).
- Module docstring refreshed: no longer mentions "TODO(PR2) markers
throughout" since the named-query stubs have been replaced with
native Cypher.
Known limitation (mirrored in PR description + changelog):
Reasoning on Neo4j reports 0 violations / 0 inferences until the
dedicated SWRLFlatCypherTranslator translation PR lands. All other
Neo4j surfaces (CRUD, KG view, Inference UI navigation, Graph Chat,
GraphQL) work normally.
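The "return None + warn" scaffold and the way a caller treats None as "no work to do" might look like the sketch below. The class name `ScaffoldedCypherTranslator`, the single `rule` argument, and the `count_violations` helper are assumptions for illustration; only the method names come from the commit message.

```python
import logging
from typing import Any, Callable, Iterable, Optional

logger = logging.getLogger(__name__)


class ScaffoldedCypherTranslator:
    """Hypothetical stand-in: same method names, no translation yet."""

    def _not_yet(self, method: str, rule: Any) -> Optional[str]:
        logger.warning("%s: SWRL to Cypher not implemented, skipping %r", method, rule)
        return None

    def build_violation_sql(self, rule: Any) -> Optional[str]:
        return self._not_yet("build_violation_sql", rule)

    def build_antecedent_count_sql(self, rule: Any) -> Optional[str]:
        return self._not_yet("build_antecedent_count_sql", rule)

    def build_materialization_sql(self, rule: Any) -> Optional[str]:
        return self._not_yet("build_materialization_sql", rule)

    def build_inference_sql(self, rule: Any) -> Optional[str]:
        return self._not_yet("build_inference_sql", rule)

    # *_cypher aliases point at the same scaffolded methods.
    build_violation_cypher = build_violation_sql
    build_antecedent_count_cypher = build_antecedent_count_sql
    build_materialization_cypher = build_materialization_sql
    build_inference_cypher = build_inference_sql


def count_violations(translator: Any, rules: Iterable[Any],
                     run_query: Callable[[str], int]) -> int:
    """Assumed caller shape: a None query means the rule is skipped."""
    total = 0
    for rule in rules:
        query = translator.build_violation_sql(rule)
        if query is None:
            continue  # nothing to execute for this rule
        total += run_query(query)
    return total
```

With this shape the reasoning engine never executes anything for a scaffolded rule, which is why the UI reports zero violations rather than an error.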
- pyproject.toml: add optional-dependency `neo4j = ["neo4j>=5.0"]`.
Installed via `uv sync --extra neo4j` or `pip install .[neo4j]`.
- tests/units/graphdb/test_neo4j_store.py: NEW. Driver-mocked unit
tests covering capability flags, construction validation (missing
URI, bad auth_method, defaults), schema sanitisation, CRUD Cypher
emission shapes, named-query dispatch, factory routing, and reasoning
translator wiring. Skips cleanly when neo4j is not installed.
- changelogs/v0.5.0/hourdays_2026-06-09.log: entry per .cursorrules
format (user prefix [hourdays] + today's date). The changelog also
documents the known limitations on this branch (reasoning no-op,
settings.js wiring, Build page labels, paginated SQL conditions,
databricks_secret auth resolution).
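Because the driver is an optional extra, both the factory and the tests have to cope with it being absent. One way to express that guard is sketched here; `select_backend` is a hypothetical helper and the error text is an assumption, not the message the factory actually raises.

```python
# Guarded import: the flag records whether the optional driver is installed.
try:
    import neo4j  # noqa: F401  (only probing for availability)
    NEO4J_AVAILABLE = True
except ImportError:
    NEO4J_AVAILABLE = False


def select_backend(engine: str, neo4j_available: bool = NEO4J_AVAILABLE) -> str:
    """Hypothetical dispatch check: refuse Neo4j when the extra is missing."""
    if engine == "neo4j" and not neo4j_available:
        raise RuntimeError(
            "Neo4j engine selected but the driver is not installed; "
            "install the optional extra, e.g. `pip install .[neo4j]`"
        )
    return engine
```

In a test module, `pytest.importorskip("neo4j")` gives the same skip-when-missing behaviour without a hand-written flag.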
Mirrors the Lakebase pattern: when the active engine is "neo4j",
saveGraphDbSettings dispatches to mergeNeo4jPanelIntoConfigTextarea(),
which reads the Neo4j form fields from #neo4j-section and serialises
them into the shared #graphEngineConfig textarea. The existing save
path then POSTs the JSON to /settings/graph-engine-config.
- src/front/static/config/js/settings.js:
- saveGraphDbSettings: add neo4j branch alongside lakebase.
- mergeNeo4jPanelIntoConfigTextarea(): NEW — reads uri, database,
auth_method, encrypted, and either (username, password) or
(secret_scope, secret_key) depending on auth_method; writes JSON
to #graphEngineConfig.
- applyNeo4jAuthMethodVisibility(): NEW — toggles .neo4j-auth-basic
vs .neo4j-auth-databricks-secret field groups based on the auth
method dropdown. Runs on load + on each change.
- Live field listeners (input/change) on the 8 form fields keep the
textarea in sync as the user edits — same UX as Lakebase.
- Test-connection button: surface a friendly "deferred to follow-up"
message for now so the button isn't silently broken.
End-to-end save now works: select Neo4j from the dropdown, fill the
Neo4j section form, click Save — engine_config persists via the same
endpoint Lakebase uses.
5-step procedure to validate the Neo4j engine end-to-end against a live Aura instance — switch engine, configure connection, run build, verify triples landed in Neo4j Browser, confirm Inference no-ops gracefully. Captures the screenshot artefacts expected in briefs/2026-06-09/1/ and the rollback path (just flip the dropdown back to Lakebase). Run this once before marking PR #47 ready-for-review.
Bug caught by the live E2E smoke test against the Ryan-provisioned
Aura instance: Neo4j 5+ CREATE CONSTRAINT only accepts single-label
patterns (FOR (n:Label)), so the original :Triple:<store_name> compound
label raised CypherSyntaxError on create_table.
Fix: switch every per-store triple node from `:Triple:<store>` to
`:`<store>`` (single backtick-quoted label per logical store). The
SPO uniqueness constraint, MERGE writes, MATCH reads, and the Show-
constraints existence check all work against this simpler schema.
Verified end-to-end against neo4j+s://b4810af7.databases.neo4j.io:
✓ create_table → constraint installed
✓ table_exists → True
✓ insert_triples(n=11) → 11 nodes written via UNWIND/MERGE
✓ count_triples → 11
✓ query_triples → returns all 11 with subject/predicate/object
✓ find_subjects_by_type → returns both customers
✓ get_aggregate_stats → total=11, distinct_subjects=5,
distinct_predicates=4,
type_assertion_count=5,
label_count=3
✓ get_entity_metadata → {type, label} for each customer
✓ expand_entity_neighbors → typed neighbors of C1
Also adds the runnable smoke test as a committed artifact so future
contributors can replay the verification:
tests/integration/neo4j_e2e_smoke.py
Reads credentials from
~/Documents/CODE/ontobricks/briefs/2026-05M-12/5/neo4j_connection_details.txt
(gitignored).
Docstring comments updated to mention the single-label scheme. No
other callers reference the old :Triple supertype.
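The single-label scheme can be sketched as three small helpers that only build Cypher strings. The helper names, the constraint name, the sanitisation rule, and the batch size are assumptions; the statement shapes follow the description above (one backtick-quoted label, SPO uniqueness, UNWIND + MERGE).

```python
import re
from typing import Iterable, Iterator, List, Sequence


def quote_label(store_name: str) -> str:
    """Sanitise a store name and wrap it as one backtick-quoted label."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", store_name)
    if not cleaned:
        raise ValueError("store name is empty")
    return f"`{cleaned}`"


def constraint_cypher(store_name: str) -> str:
    """SPO uniqueness constraint on a single label (no :Triple supertype)."""
    label = quote_label(store_name)
    name = "spo_unique_" + label.strip("`")  # hypothetical constraint name
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (t:{label}) "
        "REQUIRE (t.subject, t.predicate, t.object) IS UNIQUE"
    )


def insert_cypher(store_name: str) -> str:
    """Batched upsert: rows arrive as the bound parameter $rows."""
    label = quote_label(store_name)
    return (
        "UNWIND $rows AS row "
        f"MERGE (t:{label} {{subject: row.subject, "
        "predicate: row.predicate, object: row.object})"
    )


def batches(rows: Sequence[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Split rows into chunks so each UNWIND stays a bounded transaction."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])
```

A caller would run `constraint_cypher(...)` once per store, then `insert_cypher(...)` once per batch with `rows=batch` as the parameter.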
The Build / Digital Twin Information page's "Graph DB" card was
hardcoded to show "Graph DB (Lakebase)" regardless of the active
engine. It now reads dt.graph_engine and maps to the matching label:
lakebase → "Graph DB (Lakebase)"
neo4j → "Graph DB (Neo4j)"
other → "Graph DB (<engine>)" / "Graph DB Digital Twin" fallback
Updated:
- src/front/static/domain/js/domain-validation.js (line 456): domain
validation card.
- src/front/static/query/js/query-sync.js (line 156): Digital Twin
sync page.
The template default text "Graph DB (Lakebase)" stays for the
pre-hydration frame; JS overrides it on first render based on the
configured engine.
app.yaml.template's uv run command only included `--extra lakebase`,
so the deployed app didn't install the optional `neo4j` driver group.
At runtime that left `NEO4J_AVAILABLE = False` and any graph-facing
route (Knowledge Graph view, Inference, GraphQL, Graph Chat) raised
``InfrastructureError("Graph backend is not configured")`` even when
the admin had selected Neo4j and saved the engine config.
Add `--extra neo4j` alongside `--extra lakebase` so both engines
are available in the deployed app regardless of which one is active
at the time of deploy. Mirrors the Lakebase pattern (admin can flip
without redeploying). ~5MB extra deploy footprint when Neo4j is
unused.
`dt.graph_engine` is only set after a domain is built. Pre-Build it is empty, and the existing `|| 'lakebase'` fallback mislabels the card on Neo4j workspaces. Async-fetch `/settings/graph-engine` and re-apply the title + Lakebase-details visibility once the global engine is known.
`graph_engine = _raw if _raw == "lakebase" else "lakebase"` always evaluates to "lakebase": both branches of the conditional yield the same value, so any non-Lakebase engine was discarded before it reached the template and __TRIPLESTORE_CONFIG.graph_engine was always 'lakebase' regardless of the global setting. Pass _raw through directly; the ALLOWED_GRAPH_ENGINES gate validates upstream.
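The difference between the old expression and the pass-through is easy to see side by side. In this sketch the function names are hypothetical and the contents of ALLOWED_GRAPH_ENGINES are assumed to be the two engines discussed in this PR.

```python
# Assumed contents; the real tuple lives in GlobalConfigService.
ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j")


def old_page_context_engine(raw: str) -> str:
    """The removed expression: both branches end up as 'lakebase'."""
    return raw if raw == "lakebase" else "lakebase"


def validate_engine(raw: str) -> str:
    """Stand-in for the upstream gate that rejects unknown engines."""
    if raw not in ALLOWED_GRAPH_ENGINES:
        raise ValueError(f"unsupported graph engine: {raw!r}")
    return raw


def page_context_engine(raw: str) -> str:
    """The fix: pass the already-validated value straight through."""
    return raw
```

For example, `old_page_context_engine("neo4j")` yields "lakebase", while `page_context_engine(validate_engine("neo4j"))` yields "neo4j".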
When the server-rendered __TRIPLESTORE_CONFIG.graph_engine is stale (e.g. defaulted to 'lakebase' before the global setting was switched to Neo4j), the JS now always re-fetches the authoritative value from /settings/graph-engine and re-applies the title pre-Build.
dt.graph_engine can be stale even after a build — it reflects the engine recorded on the domain at build-time, not necessarily the active global engine. Drop the "only when empty" guard and reconcile unconditionally against /settings/graph-engine on every render.
Previous patch hid the entire dtLakebaseDetails block when engine = neo4j, which removed both the (Lakebase-specific) Sync card AND the Graph DB card from the Build page. New _renderEngineUi() helper keeps the container visible and toggles only the Lakebase-specific children: Sync card, build note, Lakebase icon. On Neo4j the Graph DB card shows "Graph DB (Neo4j)" with the graph name + "Bolt" label.
Restore visual symmetry with Lakebase by adding a middle "Bolt" card between Triple Store and Graph DB (Neo4j). Lakebase shows the Lakeflow UC-synced table (persistent); Neo4j shows the Cypher UNWIND/MERGE batch (transient, build-time). Same 3-card pipeline, two different bridge mechanisms.
Mirror the Build page change in the Cockpit's Digital Twin section: Triple Store → Bolt (Neo4j) / Sync (Lakebase) → Graph DB. Cockpit now visibly shows the active engine, where before it was entirely engine-agnostic and you had to navigate to Settings to check.
Adds docs/v0.6-neo4j-demo/ with the proof artefacts for PR #47:
- OntoBricks-PR47-Neo4j.pdf (21 slides, 4.9 MB)
- deck.html (single-file HTML deck, same content)
- screenshots/ (13 PNGs referenced by the deck)
- README.md with the demo numbers and reproduction steps
Captured live on fevm-mjolnir on 2026-06-12 with the PFAS research
paper ontology: 32 classes, 303 triples written to Neo4j over Bolt,
99 OWL 2 RL inferred, 92.3% SHACL Consistency pass in Graph mode.
TL;DR
Adds Neo4j (Bolt / Cypher) as a fully-functional graph DB engine alongside Lakebase. Opt-in via Settings → Triple Store → Global → Neo4j (Bolt). Lakebase remains the default; existing deployments are unaffected.
End-to-end demo on fevm-mjolnir using a real PFAS research-paper ontology:
fevm-mjolnir only
📚 Deck + screenshots (committed in this PR)
Full deck (21 slides) and screenshots live under docs/v0.6-neo4j-demo/:
- OntoBricks-PR47-Neo4j.pdf: 4.9 MB, 21 slides
- deck.html: same content, single-file HTML
- screenshots/: 13 source PNGs
Key proof screenshots
Settings → Triple Store → Global · engine switched to Neo4j
Build success · 3-card arch: Triple Store → Bolt (UNWIND·MERGE) → Graph DB (Neo4j) · 303 triples
Cockpit · same 3-card arch · Digital Twin Active
Neo4j Browser · 303 nodes under :WaterTreatment_V1 label
Inference · T-Box OWL 2 RL: 99 inferred in 0.102 s
GraphQL Playground · real query against the Neo4j-backed graph
SHACL Data Quality · Graph mode against Neo4j · 92.3 % Consistency pass · 1 rule with 12 violations
What this PR ships
When a user picks Neo4j in Settings → Triple store → Global and configures URI/database/auth in Settings → Triple store → Neo4j, the entire OntoBricks stack works against the Neo4j backend:
- Bolt with UNWIND + MERGE over :store-labelled nodes
- SWRLFlatCypherTranslator: currently scaffolded (returns None + warns), full translation in a follow-up PR. T-Box OWL 2 RL still runs via RDFLib upstream of the store, which produced 99 inferred triples on the demo.
- Neo4j sub-page with URI/database/auth/credentials form
- Triple Store → Bolt (UNWIND·MERGE) → Graph DB (Neo4j), mirroring the Lakeflow Sync card on Lakebase
Lakebase remains the default; existing Lakebase deployments are unaffected.
Architecture decisions
- (:sanitised_table_name) {subject, predicate, object} nodes. The original idea of a :Triple:<store> compound label was abandoned because Neo4j 5+ rejects compound labels in CREATE CONSTRAINT.
- execute_query raises NotImplementedError. All writes go through the ontology-validated build pipeline, which preserves the C2 safeguard ("entry goes through the ontology", Benoit 20/05).
- sync_to_remote / sync_from_remote / local_path are no-ops.
- engine_config keys: uri, database, auth_method (basic / databricks_secret), credentials, encrypted.
- Flat triples in v1 (supports_graph_model=False). The native property-graph mode (Neo4jGraphStore, supports_graph_model=True) is the natural v2 backend; the typed-node graph model is deferred. This is why the Neo4j Browser shows 303 nodes and 0 relationships: by design, not a bug.
Behind the scenes: what the GraphQL resolver emits
For { pfascompounds { id label } } the resolver calls two named methods on Neo4jStore, each emitting a parameterised Cypher statement (no string interpolation).
The only way Neo4j gets touched is through these 16 named methods; execute_query raises NotImplementedError. C2 is enforced in code, not just in docs. Zero injection surface (all values bound).
Bugs found and fixed in the same PR
- triplestore_page_context tautology: _raw if _raw == "lakebase" else "lakebase" silently coerced every non-Lakebase engine to lakebase, which left the Graph DB (...) label stuck on Lakebase. Replaced with a direct pass-through.
- CREATE CONSTRAINT: Neo4j 5+ rejects compound :Triple:<store> labels. Switched to a single backtick-quoted label.
- app.yaml.template's uv run lacked --extra neo4j. Added.
Open questions for @benoitcayladbx
- execute_query → NotImplementedError. Aligned with the "entry goes through the ontology" rule from 20/05?
Test plan: all green ✅
- python3 -m py_compile on every changed .py: OK
- node --check on settings.js / query-sync.js / domain-validation.js: OK
- make bundle-validate on dev-lakebase target: clean
- make deploy to fevm-mjolnir: exit 0, apps RUNNING
- tests/integration/neo4j_e2e_smoke.py: 9 / 9 assertions pass
- GET /settings/graph-engine returns neo4j, config has uri/db/auth/password
- (pfascompounds + facilities + treatmentprocesses) returns the expected entities from Neo4j
Smoke-test artefact (committed):
tests/integration/neo4j_e2e_smoke.py, runnable by any contributor with neo4j>=5.0 and the Aura creds file.
cc @benoitcayladbx: branch is ready-for-review on the code AND on visible proof.
Effort estimate
For benchmarking future v0.x backend slots. Honest, triangulated from commit timestamps + session memory — not stopwatch.
- Neo4jStore.py ~580 LoC + factory dispatch + reasoning scaffold + smoke test): ~8 – 11 h
- --extra neo4j, triplestore_page_context tautology + 4 JS reconciliation fixes): ~2 – 3 h
- GraphDBBackend abstraction already in develop, and (b) heavy use of the Databricks-native agent loop for context-switching, deploy-waiting, and live UI verification.