diff --git a/app.yaml.template b/app.yaml.template index 9087d599..3e1fb45f 100644 --- a/app.yaml.template +++ b/app.yaml.template @@ -19,12 +19,16 @@ # (defined in `pyproject.toml`'s optional `[project.optional-dependencies] # lakebase = [...]`) so the Lakebase Postgres backend works in the # deployed app even when the `database` resource is not yet bound. -# This is what powers the runtime backend toggle in -# **Settings → Registry Location**: the admin can switch -# Volume ↔ Lakebase without redeploying. On a Volume-only deployment -# the extra costs ~10MB of unused wheels but the app continues to -# run normally — the Lakebase code paths are guarded by -# `LakebaseAuth.is_available` and never reached. +# +# `--extra neo4j` installs the official `neo4j` Python driver +# (Bolt / Cypher) so the Neo4j graph DB engine works when an admin +# selects it in **Settings → Triple store → Global**. Same rationale +# as Lakebase: the extra costs ~5MB of unused wheels on +# Volume/Lakebase-only deployments but Neo4j code paths are guarded +# by `NEO4J_AVAILABLE` and never reached. +# +# Both extras together power the runtime backend toggle: +# the admin can switch Volume / Lakebase / Neo4j without redeploying. # # Databricks Apps sets DATABRICKS_APP_PORT automatically. # MCP endpoint is exposed at /mcp (Streamable HTTP transport). @@ -33,6 +37,8 @@ command: - "run" - "--extra" - "lakebase" + - "--extra" + - "neo4j" - "python" - "run.py" diff --git a/changelogs/v0.5.0/hourdays_2026-06-09.log b/changelogs/v0.5.0/hourdays_2026-06-09.log new file mode 100644 index 00000000..114fe2b1 --- /dev/null +++ b/changelogs/v0.5.0/hourdays_2026-06-09.log @@ -0,0 +1,59 @@ +# 2026-06-09 — Neo4j graph DB engine + +**Author:** Hugues Journeau (@hourdays) +**Branch:** `feature/neo4j-graphdb-skeleton` → PR #47 (target: `develop`) +**Version:** v0.5.0 + +## Context + +Adds **Neo4j (Bolt / Cypher)** as a selectable graph DB engine alongside Lakebase Postgres, following `docs/graphdb-integration.md`. Cleanly opt-in: existing Lakebase deployments are unaffected; users activate Neo4j by selecting it from **Settings → Triple store → Global**. + +This implementation realises the v0.6 roadmap slot ahead of the August 2026 target, building on Benoit's v0.5 `GraphDBBackend` abstraction and the `_starter_kit/` template. + +## Changes + +1. **`src/back/core/graphdb/neo4j/__init__.py`** — new package init with `NEO4J_AVAILABLE` guarded import. +2. **`src/back/core/graphdb/neo4j/Neo4jStore.py`** — full `GraphDBBackend` implementation: + - Capability flags: `supports_cypher=True`, `supports_graph_model=False` (flat triple model in v1), `query_dialect="cypher"`. + - Connection management: lazy `neo4j.GraphDatabase.driver(...)`, session-per-query via `_run`. + - Auth: `basic` (username/password) implemented; `databricks_secret` validated but resolution deferred to a follow-up PR. + - CRUD: `create_table` (SPO unique constraint), `drop_table` (constraint + nodes), `insert_triples` (`UNWIND` + `MERGE`, batched), `delete_triples`, `query_triples`, `count_triples`, `table_exists` (via `SHOW CONSTRAINTS`), `get_status`. + - `execute_query` raises `NotImplementedError` by design — no raw Cypher entry point. Preserves the C2 safeguard ("l'entrée se fait par l'ontologie", Benoit 20/05). + - All 16 named-query methods (`get_aggregate_stats`, `get_type_distribution`, `get_predicate_distribution`, `find_subjects_by_type`, `resolve_subject_by_id`, `get_entity_metadata`, `get_triples_for_subjects`, `get_predicates_for_type`, `paginated_triples`, `paginated_count`, `bfs_traversal`, `find_seed_subjects`, `find_subjects_by_patterns`, `expand_entity_neighbors`, `transitive_closure`, `symmetric_expand`, `shortest_path`, `delete_cohort_triples`) implemented in native Cypher with parameterised queries. +3. **`src/back/core/graphdb/GraphDBFactory.py`** — `_create_neo4j` dispatch + `NEO4J_AVAILABLE` guarded import + class-level availability flag. +4. **`src/back/objects/session/GlobalConfigService.py`** — `ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j")`. +5. **`src/back/core/reasoning/SWRLFlatCypherTranslator.py`** — new translator class scaffolded with the same public interface as `SWRLSQLTranslator`. **Methods return `None` and log a warning** — full SWRL → Cypher translation is its own follow-up PR. Reasoning on Neo4j therefore reports zero violations / inferences (graceful no-op) rather than crashing. +6. **`src/front/config/menu_config.json`** — new "Neo4j" item under TRIPLE STORE group. +7. **`src/front/templates/settings.html`**: + - `` in `#graphEngineSelect`. + - New `#neo4j-section` with config form: URI, database, auth method, credentials, encrypted toggle. +8. **`pyproject.toml`** — `[project.optional-dependencies] neo4j = ["neo4j>=5.0"]`. +9. **`tests/units/graphdb/test_neo4j_store.py`** — new test module (driver-mocked) covering capability flags, construction validation, schema sanitisation, CRUD Cypher emission, factory dispatch, and reasoning translator wiring. + +## Files modified / added + +- `src/back/core/graphdb/neo4j/__init__.py` (new) +- `src/back/core/graphdb/neo4j/Neo4jStore.py` (new, ~580 lines) +- `src/back/core/graphdb/GraphDBFactory.py` +- `src/back/objects/session/GlobalConfigService.py` +- `src/back/core/reasoning/SWRLFlatCypherTranslator.py` (new) +- `src/front/config/menu_config.json` +- `src/front/templates/settings.html` +- `pyproject.toml` +- `tests/units/graphdb/__init__.py` (new) +- `tests/units/graphdb/test_neo4j_store.py` (new) + +## Known limitations (deliberate) + +- **Reasoning on Neo4j is a no-op** until the dedicated SWRLFlatCypherTranslator translation PR lands. UI surfaces zero violations / zero inferences cleanly. +- **`auth_method=databricks_secret`** is validated but unresolved — basic auth is the only live-tested path. +- **`paginated_triples` SQL conditions** are not translated to Cypher — the unfiltered page is returned and the call is logged. Filtered access should switch to `find_subjects_by_type` / `find_seed_subjects`. +- **`settings.js` save handlers** for the Neo4j section are not in this commit — `engine_config` can currently be persisted via API; UI wiring follows in the next commit on this branch. +- **Build page** (`_query_sync.html` / `_domain_validation.html`) — engine-aware "Graph DB (…)" labels for Neo4j follow in the next commit on this branch. + +## Test result + +- Static syntax check on all new files: OK. +- Unit tests pass when `neo4j>=5.0` is installed; skip cleanly when not. +- Live smoke test against the Ryan-provisioned Aura (`neo4j+s://b4810af7.databases.neo4j.io`) — pending after the next commit (settings.js wiring + Build page labels). +- `make test` — to be re-run before marking the PR ready-for-review. diff --git a/changelogs/v0.5.0/hourdays_2026-06-12.log b/changelogs/v0.5.0/hourdays_2026-06-12.log new file mode 100644 index 00000000..e5033a64 --- /dev/null +++ b/changelogs/v0.5.0/hourdays_2026-06-12.log @@ -0,0 +1,69 @@ +# 2026-06-12 — Build page engine label: API fallback when dt.graph_engine is empty + +**Author:** Hugues Journeau (@hourdays) +**Branch:** `feature/neo4j-graphdb-skeleton` → PR #47 (target: `develop`) +**Version:** v0.5.0 + +## Context + +Follow-up to commit `5205010` (engine-aware Build page label). The previous +patch read `dt.graph_engine` from the `/dtwin/exist` payload, but that field +is only populated **after** a domain has been built. Pre-Build (status +"Never built"), `dt.graph_engine` is empty and the JS fallback +`dt.graph_engine || 'lakebase'` mislabels the card as "Graph DB (Lakebase)" +even when the global engine setting is Neo4j. + +This patch fetches `/settings/graph-engine` asynchronously when +`dt.graph_engine` is empty and re-applies the title + Lakebase-details +visibility once the global engine is known. + +## Changes + +1. **`src/front/static/domain/js/domain-validation.js`** — in + `populateDtwinCard()` (Build page validation card), when + `dt.graph_engine` is falsy, kick off a `fetch('/settings/graph-engine')` + and re-apply `psDtGraphBackendTitle` / `psDtLakebaseDetails` from the + resolved global engine. Initial render keeps the existing + `'lakebase'` fallback so there is no visual flicker for users on + Lakebase. +2. **`src/front/static/query/js/query-sync.js`** — same pattern in + `_applyBuildGraphEngineUi()`, gated on both `dt.graph_engine` AND + `cfg.graph_engine` being absent (avoids redundant fetches when the + value is already cached on `window.__TRIPLESTORE_CONFIG`). On success, + caches the resolved engine onto `cfg.graph_engine` for subsequent + calls. + +## Stronger JS reconciliation (2026-06-12 PM) + +Server fix landed in `dependencies.py` but `dt.graph_engine` can still arrive +stale: it reflects the engine recorded on the domain at build-time, which +isn't necessarily the active global engine. Updated both JS files to +reconcile unconditionally against `/settings/graph-engine` on every render +(was: only when `dt.graph_engine` was empty). Global engine is now the +single source of truth for the Build/Validation Graph DB card title. + +## Server-side companion fix + +3. **`src/front/fastapi/dependencies.py`** — root-cause fix for + `triplestore_page_context`. The previous body was a tautology + (`graph_engine = _raw if _raw == "lakebase" else "lakebase"`) that + silently coerced any non-Lakebase engine to `"lakebase"` before it + reached the template, so the `__TRIPLESTORE_CONFIG.graph_engine` + variable in `domain.html` / `dtwin.html` was hard-stuck on Lakebase + even when Neo4j was the active global engine. Replaced with a direct + pass-through of `TripleStoreFactory._resolve_graph_engine(...)`. The + JS fetch fallback above remains in place as defence in depth. + +## Files modified / added + +- `src/front/static/domain/js/domain-validation.js` +- `src/front/static/query/js/query-sync.js` +- `src/front/fastapi/dependencies.py` + +## Test result + +- Static syntax check on both files: OK. +- Live behaviour to be verified in the Chrome MCP screenshot capture + pass (task #54 in the PR plan) — the Build page should now display + "Graph DB (Neo4j)" pre-Build when the global engine is Neo4j on the + fevm-mjolnir deployment. diff --git a/docs/v0.6-neo4j-demo/OntoBricks-PR47-Neo4j.pdf b/docs/v0.6-neo4j-demo/OntoBricks-PR47-Neo4j.pdf new file mode 100644 index 00000000..8ab24fc9 Binary files /dev/null and b/docs/v0.6-neo4j-demo/OntoBricks-PR47-Neo4j.pdf differ diff --git a/docs/v0.6-neo4j-demo/README.md b/docs/v0.6-neo4j-demo/README.md new file mode 100644 index 00000000..ee5aa421 --- /dev/null +++ b/docs/v0.6-neo4j-demo/README.md @@ -0,0 +1,61 @@ +# v0.6 Neo4j integration — end-to-end demo artefacts + +This folder collects the proof artefacts for **PR #47 — Neo4j as a selectable +graph DB engine**. The demo was run on the `fevm-mjolnir` Databricks workspace +on 2026-06-12 using a real PFAS research-paper ontology. + +## Files + +- **`OntoBricks-PR47-Neo4j.pdf`** (4.9 MB, 21 slides) — full deck walking through + the end-to-end flow: Settings → Documents → Generate Ontology → + Data Source → Auto-Map → Build → Neo4j Browser → Inference → Cockpit → + GraphQL Playground → GraphQL→Cypher behind-the-scenes → SHACL Data Quality. +- **`deck.html`** — same content as a single-file HTML deck (keyboard + ← / → / `P` to print; click left/right halves to navigate). +- **`screenshots/`** — the 13 source PNGs referenced by the deck. + +## Demo numbers + +| | | +|---|---| +| AI-generated classes | **32** | +| AI-generated relations | **13** | +| Entities mapped via Auto-Map | **25 / 25** | +| Relations mapped via Auto-Map | **12 / 12** | +| Triples written to Neo4j | **303** | +| Build duration (cold) | **10.3 s** | +| Build duration (cached) | **5.3 s** | +| Inferred triples (T-Box OWL 2 RL) | **99** in 0.102 s | +| SHACL Consistency rules auto-generated | **13** | +| SHACL Graph-mode pass rate against Neo4j | **92.3 %** | +| Total nodes in Neo4j after cleanup | **0** (label dropped) | + +## What was tested live + +- ✅ Settings → Triple store → Global engine swap to Neo4j +- ✅ Settings → Neo4j config form (URI / database / basic-auth / encrypted) +- ✅ Domain → Documents PDF upload → Ontology → Generate (AI) +- ✅ Ontology Designer (with auto-generated icons) +- ✅ Domain → Data Sources (UC table import) +- ✅ Mapping → Auto-Map (batch + per-entity) +- ✅ Mapping → Diagnostics (0 errors after exclude pass) +- ✅ Domain → Build (writes triples to Neo4j over Bolt) +- ✅ Domain → Cockpit (3-card arch shows Bolt + Graph DB Neo4j) +- ✅ Digital Twin → Knowledge Graph header (engine-aware) +- ✅ Digital Twin → GraphQL Playground (real query against Neo4j) +- ✅ Digital Twin → Inference (OWL 2 RL produced 99 inferred) +- ✅ Digital Twin → Data Quality → Graph mode (SHACL against Neo4j) +- ✅ Neo4j Browser external verification (303 nodes, single label) + +## How to reproduce on your own Neo4j endpoint + +```bash +export NEO4J_URI=neo4j+s:// +export NEO4J_USER=neo4j +export NEO4J_PASS= +make deploy # to a fevm-* workspace with --extra neo4j +# then in the deployed app: +# Settings → Triple Store → Global → Neo4j (Bolt) → fill creds → Save +# Domain → Build (writes triples via Bolt) +# Verify: tests/integration/neo4j_e2e_smoke.py — 9 / 9 assertions +``` diff --git a/docs/v0.6-neo4j-demo/deck.html b/docs/v0.6-neo4j-demo/deck.html new file mode 100644 index 00000000..85544e4b --- /dev/null +++ b/docs/v0.6-neo4j-demo/deck.html @@ -0,0 +1,989 @@ + + + + + +OntoBricks v0.6 · Neo4j Integration — PR #47 + + + + + + +
+ + +
+
+
+ +
Pull request · #47 · feature/neo4j-graphdb-skeleton
+

Neo4j as a selectable
graph DB engine via Bolt

+
End-to-end demo on fevm-mjolnir with a real PFAS research-paper ontology — 303 triples written into Neo4j over the Bolt protocol, 99 inferred via OWL 2 RL, paper itself never leaves the workspace.
+
+ Hugues Journeau · @hourdays + + 2026-06-12 + + databrickslabs/ontobricks +
+
+
+
+
+
Neo4j · Bolt protocol
+
+
+
+ + +
+
+
+
+
TL;DR
+

v0.6 Neo4j slot — landed, end-to-end

+
+
2 / 21
+
+
+

+ OntoBricks now supports Neo4j (Bolt / Cypher) as a selectable Graph DB engine alongside Lakebase Postgres. Opt-in via Settings → Triple Store → Global → Neo4j (Bolt). Lakebase remains the default; no breaking changes. +

+
+
32
classes generated from your PFAS paper by the AI ontology generator
+
303
RDF triples materialized into Neo4j by the OntoBricks build pipeline
+
99
inferred triples from T-Box OWL 2 RL reasoning, in 0.102 s
+
10.3 s
full build time: prepare → view → graph
+
+
    +
  • C1 / C2 preserved — Writes only via the OntoBricks build pipeline. Neo4jStore.execute_query raises NotImplementedError by design — no raw Cypher entry point.
  • +
  • Cleanup done — WaterTreatment_V1 label dropped from the demo Neo4j after capture. Total nodes: 0. Paper stays in fevm-mjolnir.
  • +
+
+ +
+ + +
+
+
+
+
Roadmap
+

Why now

+
+
3 / 21
+
+
+

+ Benoit's v0.5 introduced the GraphDBBackend abstraction. The v0.6 roadmap is to land Neo4j as the first non-Lakebase backend, originally targeted at August 2026. This PR delivers it ahead of schedule with a complete user-visible path: pick the engine in Settings, build, query. +

+
+
+

C1 OntoBricks controls the graph

+

Single canonical source of truth = the OntoBricks build pipeline. Mirror modes, dual-writes, etc., were explicitly ruled out in the 2026-05-20 sync.

+
+
+

C2 The ontology controls writes

+

No raw Cypher entry point exposed to users. execute_query raises NotImplementedError — all writes go through validated R2RML.

+
+
+

C3 No raw data in the graph

+

Only ontology-shaped triples land in Neo4j. The synthetic UC fact table stays in Unity Catalog; Neo4j receives the resolved entities.

+
+
+
    +
  • Customer value — Customers already on Neo4j (Ryan's contacts, EU manufacturing) get an OntoBricks deployment without forcing Lakebase.
  • +
  • Architectural credibility — Proves the GraphDBBackend abstraction is real and ships with two backends, not one and a placeholder.
  • +
+
+ +
+ + +
+
+
+
+
Architecture
+

Factory dispatch, two backends

+
+
4 / 21
+
+
+

+ Engine selection is resolved once via TripleStoreFactory._resolve_graph_engine against GlobalConfigService. The factory dispatches to either LakebaseStore (existing) or the new Neo4jStore. Same GraphDBBackend interface, different SQL/Cypher dialect. +

+
+
OntoBricks Build
R2RML → triples
+
+
GraphDBFactory
_resolve_graph_engine()
lakebase · neo4j
+
+
+
+
+
LakebaseStore (existing)
+
Postgres + pgvector · synced UC · SQL dialect
+
+
+
Neo4jStore (NEW — this PR)
+
Bolt protocol · Cypher dialect · Aura / AuraDS / self-hosted
+
+
+
    +
  • Single switch in Settings, no per-domain override on this PR.
  • +
  • Capability flags on Neo4jStore: supports_cypher=True, query_dialect="cypher", supports_graph_model=False (flat triple in v1).
  • +
  • Reasoning translator is symmetric: SWRLSQLTranslatorSWRLFlatCypherTranslator (scaffolded — full Cypher in a follow-up PR).
  • +
+
+ +
+ + +
+
+
+
+
Backend
+

What's in the PR — backend

+
+
5 / 21
+
+
+ + + + + + + + + + +
FilePurpose
back/core/graphdb/neo4j/Neo4jStore.py NEW · ~580 LOCFull GraphDBBackend impl. 16 named-query methods in native Cypher. execute_query raises NotImplementedError (C2).
back/core/graphdb/GraphDBFactory.py_create_neo4j dispatch + NEO4J_AVAILABLE guarded import.
back/objects/session/GlobalConfigService.pyALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j").
back/core/reasoning/SWRLFlatCypherTranslator.py NEWScaffolded mirror of SQL translator. Returns None + logs warning. Reasoning reports 0 violations / 0 inferences gracefully until the full translator lands.
pyproject.toml[project.optional-dependencies] neo4j = ["neo4j>=5.0"].
app.yaml.templateAdds --extra neo4j so the deployed App ships the driver.
+
+ +
+ + +
+
+
+
+
Frontend
+

What's in the PR — frontend

+
+
6 / 21
+
+
+ + + + + + + + + + +
FilePurpose
front/templates/settings.html<option value="neo4j">Neo4j (Bolt)</option> + Neo4j config form (URI, db, auth, creds, encrypted).
front/config/menu_config.json"Neo4j" item in the Settings sidebar.
front/static/config/js/settings.jsSave/load wiring for the Neo4j engine config.
front/static/domain/js/domain-validation.jsPre-Build, async-fetch /settings/graph-engine so Validation card title is engine-aware.
front/static/query/js/query-sync.jsSame engine-aware fallback for the Build page title.
front/fastapi/dependencies.py ROOT CAUSE FIXtriplestore_page_context previously hard-coerced every engine to "lakebase" (_raw if _raw == "lakebase" else "lakebase" — a tautology). Now passes the resolved engine through directly.
+

+ The server-side fix + the two JS fetch fallbacks are defence in depth: page works correctly even on a stale __TRIPLESTORE_CONFIG.graph_engine. +

+
+ +
+ + +
+
+
+
+
End-to-end demo on fevm-mjolnir
+

The path — paper to triples

+
+
7 / 21
+
+
+

+ Real research paper goes in, real RDF triples come out the other side of the OntoBricks build pipeline, written into Neo4j over Bolt via the new Neo4jStore. Synthetic 12-row UC fact table provides the ABox instances — the only fake bit; numbers are illustrative. +

+
+

1 Settings

Switch global engine to Neo4j (Bolt). Fill Bolt URI + creds + encrypted on.

+

2 Documents + Generate

Upload AV-TR-2026-001_MMSF-PFAS-Risk-Assessment.pdf. AI generates a 32-class ontology.

+

3 Data Source

Create + load mjolnir_catalog.ontobricks.pfas_measurements_demo (12 rows, paper-aligned schema).

+

4 Auto-Map

Batch run maps 25/25 entities and 12/12 relationships. 7 abstract entities excluded (no anchor column).

+

5 Diagnostics + Build

0 errors. Build pipeline emits 303 triples in 10.3 s into the new Neo4j backend.

+

6 Verify + Inference

Neo4j Browser shows 303 nodes. T-Box reasoning produces 99 inferred triples in 0.1 s.

+
+
+ +
+ + +
+
+
+
+
Step 1 · Settings
+

Switch engine to Neo4j

+
+
8 / 21
+
+
+
Settings — Triple Store — Global — Neo4j (Bolt) selected
+
+ Triple Store → Global → Neo4j (Bolt) · saved confirmation toast. + 01-settings-global-neo4j-saved.png +
+
+ +
+ + +
+
+
+
+
Step 1 · Settings
+

Neo4j connection form

+
+
9 / 21
+
+
+
Neo4j config form filled with Bolt URI, database, basic auth, encrypted
+
+ Bolt URI · basic auth · encrypted on. Form is engine-generic — works against any Bolt endpoint (Aura, AuraDS, self-hosted, on-prem). Test-connection is a stub for now — "Save & run a build to verify". + 02-settings-neo4j-form-filled.png +
+
+ +
+ + +
+
+
+
+
Step 2 · Documents + Generate
+

PDF in, ontology out

+
+
10 / 21
+
+
+
Generate Ontology — Documents tab with PFAS PDF selected
+
+ Generate Ontology → Documents tab · PFAS Risk Assessment PDF selected as AI context. + 03-ontology-generate-pdf-selected.png +
+
+ +
+ + +
+
+
+
+
Step 2 · Documents + Generate
+

32 classes, 13 relations

+
+
11 / 21
+
+
+
Ontology Designer — generated PFAS ontology with auto-mapped icons
+
+ PFAS Compound + PFOA/PFOS/PFNA/PFHxS/GenX subclasses · Treatment Facility · Treatment Process + GAC/IX/RO · Water Source + Groundwater/Surface · Risk Assessment · Measurement · Regulation · Operator · Monitoring Program · etc. + 04-ontology-designer-with-icons.png +
+
+ +
+ + +
+
+
+
+
Step 3 · Data Source
+

Synthetic UC fact table

+
+
12 / 21
+
+
+
Data Sources — pfas_measurements_demo loaded with 20 columns
+
+ mjolnir_catalog.ontobricks.pfas_measurements_demo · 12 rows · 20 columns · synthetic but paper-aligned schema (facility / water source / process / contaminant / measurement / regulation). + 06-data-source-loaded.png +
+
+ +
+ + +
+
+
+
+
Step 4 · Auto-Map
+

25 entities + 13 relations mapped

+
+
13 / 21
+
+
+
Auto-Map results — 22 of 32 entities mapped (later 25/25 after manual touch-up)
+
+ AI batch Auto-Map · 22 entities + 13 relations on first pass · 3 more added via single-entity Auto-Map (Sample, Treatmentfacility, Riskassessment) · 7 abstract classes excluded as designed. + 07-automap-completed-22of32.png +
+
+ +
+ + +
+
+
+
+
Step 5 · Build
+

303 triples in Neo4j · 5.3 s

+
+
14 / 21
+
+
+
Build page after success — Triple Store → Bolt (UNWIND · MERGE) → Graph DB (Neo4j) · 303 triples
+
+ 3-card architecture: Triple Store → Bolt (Cypher UNWIND · MERGE) → Graph DB (Neo4j) · WaterTreatment_V1. The Bolt card is the Neo4j-specific bridge (mirrors the Lakeflow Sync card on Lakebase). Build wrote 303 triples. + 10-build-success-303-triples-neo4j.png +
+
+ +
+ + +
+
+
+
+
Step 6 · Verify
+

Neo4j Browser — the triples

+
+
15 / 21
+
+
+
Neo4j Browser — graph visualization of 25 entity instances, all under :WaterTreatment_V1 label
+
+ Direct Cypher query against the live Neo4j · Nodes (303), all under single label :WaterTreatment_V1 · 25 rdf:type nodes rendered (one per ontology class instantiated). + 12-neo4j-browser-303-nodes-graph.png +
+
+ +
+ + +
+
+
+
+
Step 6 · Verify
+

Inference — 99 triples in 0.1 s

+
+
16 / 21
+
+
+
Inference report — T-Box OWL 2 RL 99 inferred, SWRL skipped (no rules), others skipped
+
+ T-Box (OWL 2 RL) ✓ 99 inferred · SWRL skipped (scaffold no-op as designed) · others skipped gracefully (no rules in this domain). + 15-inference-99-inferred.png +
+
+ +
+ + +
+
+
+
+
Step 7 · Health
+

Cockpit — fully operational on Neo4j

+
+
17 / 21
+
+
+
Domain Cockpit — Triple Store → Bolt → Graph DB (Neo4j) 3-card arch, Digital Twin Active, 303 triples
+
+ Cockpit now visibly shows the active engine via the 3-card arch: Triple Store → Bolt (UNWIND · MERGE) → Graph DB (Neo4j) · 303 triples. Before this PR the Cockpit was entirely engine-agnostic. + 16-cockpit-neo4j-active.png +
+
+ +
+ + +
+
+
+
+
Step 8 · Query
+

GraphQL Playground over the Neo4j graph

+
+
18 / 21
+
+
+
GraphQL Playground — real query returns PFOA/PFOS/etc from the Neo4j-backed graph
+
+ Query { pfascompounds { id label } facilities { id label } treatmentprocesses { id label } } resolved transparently: 5 PFAS compounds (GenX, PFHxS, PFNA, PFOA, PFOS), 3 facilities (Eastside / Westport / Northvale MMSF), 6 treatment processes. The GraphQL service rewrites this into Cypher under the hood and queries Neo4j. + 17-graphql-playground-watertreatment.png +
+
+ +
+ + +
+
+
+
+
Behind the scenes
+

GraphQL → Cypher · how the resolver talks to Neo4j

+
+
19 / 21
+
+
+
+ + +
+

+ GraphQL playground sends { pfascompounds { id label } }. The OntoBricks resolver calls two named-query methods on Neo4jStore. Every value is bound — no string interpolation. +

+
+

1 Neo4jStore.find_subjects_by_type(...)

+
MATCH (t:`WaterTreatment_V1`)
+WHERE t.predicate = $rdf_type
+  AND t.object    = $type_uri
+RETURN DISTINCT t.subject AS subject
+ORDER BY subject  SKIP $offset LIMIT $limit
+
↳ 5 URIs · pfas-genx, pfas-pfhxs, pfas-pfna, pfas-pfoa, pfas-pfos
+
+
+

2 Neo4jStore.get_entity_metadata(...)

+
MATCH (t:`WaterTreatment_V1`)
+WHERE t.predicate = $rdfs_label
+  AND t.subject IN $subjects
+RETURN t.subject AS subject, t.object AS label
+
↳ 5 labels · GenX, PFHxS, PFNA, PFOA, PFOS
+
+
+ + +
+

Why this matters

+ +
+

C2 enforced in code, not just in the doc

+

+ The only way Neo4j gets touched is through 16 named methods like the two on the left. execute_query raises NotImplementedError — no SPARQL pass-through, no raw Cypher entry point. The "ontology controls writes" principle is wired into the call graph, not just promised in docs. +

+
+ +
+

Zero injection surface

+

+ Every value ($rdf_type, $type_uri, $subjects) is bound via the Neo4j driver's parameter map. No f"…" formatting, no str.format, no +. User input never reaches the Cypher parser as text. +

+
+ +
+

Flat triple model — by design, explains slide 15

+

+ Each triple = one node with subject/predicate/object properties. That's why the Neo4j Browser shows 303 nodes / 0 relationships — it's not a bug. Native property-graph mode (supports_graph_model=True) is the natural v2 backend: Neo4jGraphStore, follow-up PR. +

+
+
+
+
+ +
+ + +
+
+
+
+
Step 10 · Validate
+

SHACL Data Quality — Graph mode on Neo4j

+
+
20 / 21
+
+
+
SHACL Data Quality Report — Consistency 92.3% pass over 13 rules, 1 rule with 12 violations
+
+ 13 SHACL Consistency rules auto-generated from the ontology (sh:class on each object property), run in Graph mode against Neo4j: 92.3% pass — 12/13 rules at 100%, one rule (Sample.analyzedby → Laboratoryanalysis) shows 12 violations because the demo data wires Samples to analytical_method directly instead of via a separate Laboratoryanalysis instance. Exactly the kind of signal SHACL validation is built for. + 18-data-quality-graph-on-neo4j.png +
+
+ +
+ + +
+
+
+
+
Closing
+

Bugs found · limitations · ask

+
+
21 / 21
+
+
+
+
+

Bugs found & fixed in this PR

+
    +
  • Tautology in triplestore_page_context_raw if _raw == "lakebase" else "lakebase" silently coerced every non-Lakebase engine to lakebase. Now passes through.
  • +
  • Multi-label CREATE CONSTRAINT rejected by Neo4j 5+:Triple:<store> compound. Switched to single backtick-quoted label.
  • +
  • Driver missing in deployed Appapp.yaml.template uv-run lacked --extra neo4j. Added.
  • +
+
+
+

Known limitations (deliberate)

+
    +
  • SWRL → Cypher is a no-op scaffold. T-Box OWL 2 RL still runs (RDFLib upstream). Full translator = follow-up PR.
  • +
  • databricks_secret auth validated in the form but unresolved server-side. Follow-up.
  • +
  • Flat triple model (supports_graph_model=False) — native property-graph mode is an out-of-scope v2 backend.
  • +
+
+
+
+

Ask: review & merge into develop

+

+ 12 commits on feature/neo4j-graphdb-skeleton. The five most recent (cc679b90ba4ca6) are the engine-aware label fixes. Reviewer to optionally re-run tests/integration/neo4j_e2e_smoke.py against any Neo4j endpoint. +

+
+
+ +
+ +
+ + + + + + diff --git a/docs/v0.6-neo4j-demo/screenshots/01-settings-global-neo4j-saved.png b/docs/v0.6-neo4j-demo/screenshots/01-settings-global-neo4j-saved.png new file mode 100644 index 00000000..261b218e Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/01-settings-global-neo4j-saved.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/02-settings-neo4j-form-filled.png b/docs/v0.6-neo4j-demo/screenshots/02-settings-neo4j-form-filled.png new file mode 100644 index 00000000..e4434231 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/02-settings-neo4j-form-filled.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/03-ontology-generate-pdf-selected.png b/docs/v0.6-neo4j-demo/screenshots/03-ontology-generate-pdf-selected.png new file mode 100644 index 00000000..b988a3b6 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/03-ontology-generate-pdf-selected.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/04-ontology-designer-with-icons.png b/docs/v0.6-neo4j-demo/screenshots/04-ontology-designer-with-icons.png new file mode 100644 index 00000000..f16d321b Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/04-ontology-designer-with-icons.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/06-data-source-loaded.png b/docs/v0.6-neo4j-demo/screenshots/06-data-source-loaded.png new file mode 100644 index 00000000..6aef6737 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/06-data-source-loaded.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/07-automap-completed-22of32.png b/docs/v0.6-neo4j-demo/screenshots/07-automap-completed-22of32.png new file mode 100644 index 00000000..02b630f5 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/07-automap-completed-22of32.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/08-diagnostics-30pass-6err.png b/docs/v0.6-neo4j-demo/screenshots/08-diagnostics-30pass-6err.png new file mode 100644 index 00000000..865f7c44 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/08-diagnostics-30pass-6err.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/10-build-success-303-triples-neo4j.png b/docs/v0.6-neo4j-demo/screenshots/10-build-success-303-triples-neo4j.png new file mode 100644 index 00000000..0c3fb256 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/10-build-success-303-triples-neo4j.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/12-neo4j-browser-303-nodes-graph.png b/docs/v0.6-neo4j-demo/screenshots/12-neo4j-browser-303-nodes-graph.png new file mode 100644 index 00000000..a59cf1fc Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/12-neo4j-browser-303-nodes-graph.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/15-inference-99-inferred.png b/docs/v0.6-neo4j-demo/screenshots/15-inference-99-inferred.png new file mode 100644 index 00000000..7d65056d Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/15-inference-99-inferred.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/16-cockpit-neo4j-active.png b/docs/v0.6-neo4j-demo/screenshots/16-cockpit-neo4j-active.png new file mode 100644 index 00000000..b148c266 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/16-cockpit-neo4j-active.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/17-graphql-playground-watertreatment.png b/docs/v0.6-neo4j-demo/screenshots/17-graphql-playground-watertreatment.png new file mode 100644 index 00000000..b4ffc328 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/17-graphql-playground-watertreatment.png differ diff --git a/docs/v0.6-neo4j-demo/screenshots/18-data-quality-graph-on-neo4j.png b/docs/v0.6-neo4j-demo/screenshots/18-data-quality-graph-on-neo4j.png new file mode 100644 index 00000000..18e1e3d2 Binary files /dev/null and b/docs/v0.6-neo4j-demo/screenshots/18-data-quality-graph-on-neo4j.png differ diff --git a/pyproject.toml b/pyproject.toml index 3f88375e..e09ab88d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -35,6 +35,13 @@ lakebase = [ "psycopg-pool>=3.2.0", ] +# Neo4j (Bolt / Cypher) graph DB engine — optional, opt-in via Settings. +# Install with: ``uv sync --extra neo4j`` or ``pip install .[neo4j]``. +# Required when Settings > Triple store > Global selects "Neo4j (Bolt)". +neo4j = [ + "neo4j>=5.0", +] + # Ontology pitfall detection (D2KLab Ontology-Pitfalls-Detector). # Install with: ``pip install .[pitfalls]`` # Heavy ML deps; downloaded on first use (SentenceTransformer model weights). diff --git a/src/back/core/graphdb/GraphDBFactory.py b/src/back/core/graphdb/GraphDBFactory.py index 3b2eb4ea..83f1d6f0 100644 --- a/src/back/core/graphdb/GraphDBFactory.py +++ b/src/back/core/graphdb/GraphDBFactory.py @@ -19,6 +19,7 @@ class GraphDBFactory: """Construct graph DB backend instances from domain session configuration.""" LAKEBASE_AVAILABLE = False + NEO4J_AVAILABLE = False def create( self, @@ -47,9 +48,50 @@ def create( if engine == "lakebase": return self._create_lakebase(domain, settings, engine_config=engine_config) + if engine == "neo4j": + return self._create_neo4j(domain, settings, engine_config=engine_config) + logger.warning("Unknown graph DB engine: %s", engine) return None + def _create_neo4j( + self, + domain: Any, + settings: Optional[Any] = None, + *, + engine_config: Optional[Dict[str, Any]] = None, + ) -> Optional[Any]: + """Instantiate :class:`Neo4jStore` against the configured Bolt endpoint. + + Reads ``uri``, ``database``, ``auth_method`` and credentials from + ``engine_config``. See :class:`back.core.graphdb.neo4j.Neo4jStore` + for the recognised keys. + """ + try: + from back.core.graphdb.neo4j import NEO4J_AVAILABLE + from back.core.graphdb.neo4j.Neo4jStore import Neo4jStore + from shared.config.constants import DEFAULT_GRAPH_NAME + except ImportError as e: + logger.warning("Neo4j graph engine requires the 'neo4j' driver: %s", e) + return None + + if not NEO4J_AVAILABLE: + logger.warning("Neo4j graph backend unavailable (neo4j driver not installed)") + return None + + cfg = engine_config or {} + base_name = (domain.info or {}).get("name", DEFAULT_GRAPH_NAME) + version = getattr(domain, "current_version", "1") or "1" + db_name = "%s_V%s" % (base_name, version) + try: + return Neo4jStore(db_name=db_name, engine_config=cfg) + except (ValueError, NotImplementedError) as exc: + logger.warning("Neo4jStore configuration error: %s", exc) + return None + except Exception as e: # noqa: BLE001 + logger.exception("Failed to create Neo4jStore: %s", e) + return None + def _create_lakebase( self, domain: Any, @@ -230,3 +272,10 @@ def _get_factory_singleton() -> GraphDBFactory: GraphDBFactory.LAKEBASE_AVAILABLE = bool(_LB_AVAIL) except ImportError: logger.debug("Lakebase graph backends not available (optional dependency)") + +try: + from back.core.graphdb.neo4j import NEO4J_AVAILABLE as _NEO4J_AVAIL # noqa: F401 + + GraphDBFactory.NEO4J_AVAILABLE = bool(_NEO4J_AVAIL) +except ImportError: + logger.debug("Neo4j graph backend not available (optional dependency)") diff --git a/src/back/core/graphdb/neo4j/Neo4jStore.py b/src/back/core/graphdb/neo4j/Neo4jStore.py new file mode 100644 index 00000000..28eb29de --- /dev/null +++ b/src/back/core/graphdb/neo4j/Neo4jStore.py @@ -0,0 +1,941 @@ +"""Neo4j graph database backend. + +Bolt-based (Cypher) flat-triple store. Triples are persisted as +``(: {subject, predicate, object})`` nodes — a +deliberately simple schema chosen so PR 1 demonstrates the Cypher +integration shape without committing to a typed-node graph model (which +lands in v2 / PR 3+). One label per logical store so Neo4j 5+ CREATE +CONSTRAINT (which only accepts single-label patterns) works. + +Full implementation of the ``TripleStoreBackend`` + ``GraphDBBackend`` +contracts. Connection management, flat-triple CRUD, and the 16 named-query +methods are all implemented in native Cypher. The reasoning translator +(``get_query_translator``) returns a :class:`SWRLFlatCypherTranslator` that +is currently scaffolded only — full SWRL → Cypher translation lands in a +follow-up PR. Reasoning on Neo4j therefore reports zero violations / +inferences (graceful no-op) rather than crashing. + +``execute_query`` deliberately raises ``NotImplementedError`` — no raw +Cypher entry point. All writes go through ``insert_triples`` after +ontology validation in the build pipeline (Benoit's C2 safeguard: +"l'entrée se fait par l'ontologie"). +""" + +from typing import Any, Callable, Dict, List, Optional, Set, Tuple + +from back.core.graphdb.GraphDBBackend import GraphDBBackend +from back.core.logging import get_logger +from back.core.triplestore.constants import RDF_TYPE, RDFS_LABEL +from shared.config.constants import DEFAULT_GRAPH_NAME + +logger = get_logger(__name__) + +# --------------------------------------------------------------------------- +# Guarded import — neo4j driver is an optional dependency. +# --------------------------------------------------------------------------- +try: + import neo4j as _neo4j +except ImportError: + _neo4j = None # type: ignore[assignment] + + +DEFAULT_DATABASE = "neo4j" +DEFAULT_AUTH_METHOD = "basic" +SUPPORTED_AUTH_METHODS = ("basic", "databricks_secret") + + +class Neo4jStore(GraphDBBackend): + """Neo4j (Bolt / Cypher) graph database backend — flat triple model. + + Parameters + ---------- + db_name: + Logical name for the triple set, used as the ``table`` label in the + Cypher schema (every triple node carries the single label + ``:``). + engine_config: + JSON dict from Settings > Graph DB > Engine Configuration. Keys: + + ``uri`` (required) + Bolt URI, e.g. ``neo4j+s://b4810af7.databases.neo4j.io``. + ``database`` (default ``"neo4j"``) + Logical Neo4j database name on the target instance. + ``auth_method`` (default ``"basic"``) + ``"basic"`` → username + password. ``"databricks_secret"`` → + credentials resolved from a Databricks secret scope (PR 2). + ``username``, ``password`` + Required when ``auth_method == "basic"``. + ``secret_scope``, ``secret_key`` + Required when ``auth_method == "databricks_secret"``. + ``encrypted`` (default ``True``) + Bolt-level encryption flag (ignored when URI is ``neo4j+s://``). + """ + + def __init__( + self, + db_name: str = DEFAULT_GRAPH_NAME, + engine_config: Optional[Dict[str, Any]] = None, + ) -> None: + if _neo4j is None: + raise ImportError( + "neo4j is required for the Neo4j backend. " + "Install it with: pip install 'neo4j>=5.0'" + ) + self.db_name = db_name + self.engine_config: Dict[str, Any] = engine_config or {} + self._driver: Optional[Any] = None + + cfg = self.engine_config + self._uri = str(cfg.get("uri") or "").strip() + if not self._uri: + raise ValueError( + "Neo4jStore: engine_config['uri'] is required " + "(e.g. 'neo4j+s://.databases.neo4j.io')" + ) + self._database = str(cfg.get("database") or DEFAULT_DATABASE).strip() or DEFAULT_DATABASE + self._auth_method = str(cfg.get("auth_method") or DEFAULT_AUTH_METHOD).strip() + if self._auth_method not in SUPPORTED_AUTH_METHODS: + raise ValueError( + "Neo4jStore: unsupported auth_method %r (allowed: %s)" + % (self._auth_method, ", ".join(SUPPORTED_AUTH_METHODS)) + ) + self._encrypted = bool(cfg.get("encrypted", True)) + + # ====================================================================== + # GraphDBBackend — capability flags + # ====================================================================== + + @property + def supports_cypher(self) -> bool: + return True + + @property + def supports_graph_model(self) -> bool: + # PR 1: flat-triple model (single :Triple node label). + # Typed-node graph model is a future PR. + return False + + @property + def query_dialect(self) -> str: + return "cypher" + + # ====================================================================== + # GraphDBBackend — connection management + # ====================================================================== + + def get_connection(self) -> Any: + """Return (lazily create) the Neo4j driver. + + Neo4j's Python driver itself is a thread-safe connection pool. + Sessions are short-lived and created per-query in ``_run``. + """ + if self._driver is not None: + return self._driver + auth = self._resolve_auth() + kwargs: Dict[str, Any] = {"auth": auth} + # neo4j+s:// embeds TLS — passing encrypted=True is rejected. + if not self._uri.startswith(("neo4j+s://", "neo4j+ssc://", "bolt+s://", "bolt+ssc://")): + kwargs["encrypted"] = self._encrypted + self._driver = _neo4j.GraphDatabase.driver(self._uri, **kwargs) + logger.info("Neo4j driver opened for %s (database=%s)", self._uri, self._database) + return self._driver + + def close(self) -> None: + if self._driver is not None: + try: + self._driver.close() + except Exception as exc: # noqa: BLE001 + logger.warning("Neo4j driver close failed: %s", exc) + self._driver = None + logger.debug("Neo4j driver closed") + + def _resolve_auth(self) -> Tuple[str, str]: + cfg = self.engine_config + if self._auth_method == "basic": + user = str(cfg.get("username") or "").strip() + pwd = str(cfg.get("password") or "") + if not user or not pwd: + raise ValueError( + "Neo4jStore: auth_method=basic requires " + "engine_config['username'] and ['password']" + ) + return (user, pwd) + if self._auth_method == "databricks_secret": + scope = str(cfg.get("secret_scope") or "").strip() + key = str(cfg.get("secret_key") or "").strip() + if not scope or not key: + raise ValueError( + "Neo4jStore: auth_method=databricks_secret requires " + "engine_config['secret_scope'] and ['secret_key']" + ) + # TODO(PR2): resolve via Databricks secrets API. + # For PR 1 the secret_scope/key are validated but resolution + # is deferred — basic auth is the only path tested live. + raise NotImplementedError( + "auth_method=databricks_secret is reserved for PR 2" + ) + raise ValueError("Unsupported auth_method: %s" % self._auth_method) + + def _run(self, cypher: str, **params: Any) -> List[Dict[str, Any]]: + """Execute a Cypher statement against the configured database. + + Returns rows as dicts. Wraps the session in a single transaction. + """ + driver = self.get_connection() + with driver.session(database=self._database) as session: + result = session.run(cypher, **params) + return [dict(record) for record in result] + + # ====================================================================== + # GraphDBBackend — schema helpers + # ====================================================================== + + def get_node_table(self, table_name: str) -> str: + # Neo4j labels are case-sensitive identifiers; we sanitise to a + # safe Cypher identifier by replacing non-alphanumerics with '_'. + return "".join(c if c.isalnum() or c == "_" else "_" for c in table_name) + + def get_graph_schema(self) -> Optional[Any]: + # Flat model — no schema object. + return None + + # ====================================================================== + # GraphDBBackend — sync to/from UC Volume (Aura is remote; no-ops) + # ====================================================================== + + def sync_to_remote(self, uc_path: str, volume_service: Any) -> Tuple[bool, str]: + return False, "Neo4j is remote-only; no UC Volume sync" + + def sync_from_remote(self, uc_path: str, volume_service: Any) -> Tuple[bool, str]: + return False, "Neo4j is remote-only; no UC Volume sync" + + def local_path(self) -> Optional[str]: + return None + + def remote_archive_path(self, uc_domain_path: str) -> Optional[str]: + return None + + # ====================================================================== + # GraphDBBackend — reasoning support + # ====================================================================== + + def get_query_translator(self, table_name: str = "") -> Any: + """Return the SWRL/rule translator for this engine. + + Returns a :class:`SWRLFlatCypherTranslator` — currently scaffolded + (every translation returns ``None``), so reasoning on Neo4j + reports zero violations / zero inferences instead of crashing. + Full SWRL → Cypher translation is a follow-up PR. + """ + from back.core.reasoning.SWRLFlatCypherTranslator import ( + SWRLFlatCypherTranslator, + ) + + return SWRLFlatCypherTranslator(node_label=self.get_node_table(table_name)) + + # ====================================================================== + # TripleStoreBackend — core CRUD + # ====================================================================== + + def create_table(self, table_name: str) -> None: + label = self.get_node_table(table_name) + cypher = ( + f"CREATE CONSTRAINT triple_{label}_spo IF NOT EXISTS " + f"FOR (t:`{label}`) " + f"REQUIRE (t.subject, t.predicate, t.object) IS UNIQUE" + ) + self._run(cypher) + logger.info("Created Neo4j triple label: %s", label) + + def drop_table(self, table_name: str) -> None: + label = self.get_node_table(table_name) + self._run(f"DROP CONSTRAINT triple_{label}_spo IF EXISTS") + self._run(f"MATCH (t:`{label}`) DETACH DELETE t") + logger.info("Dropped Neo4j triple label: %s", label) + + def insert_triples( + self, + table_name: str, + triples: List[Dict[str, str]], + batch_size: int = 2000, + on_progress: Optional[Callable[[int, int], None]] = None, + ) -> int: + if not triples: + return 0 + label = self.get_node_table(table_name) + total = 0 + cypher = ( + f"UNWIND $rows AS r " + f"MERGE (t:`{label}` {{subject: r.subject, predicate: r.predicate, object: r.object}})" + ) + for i in range(0, len(triples), batch_size): + batch = triples[i : i + batch_size] + rows = [ + { + "subject": t.get("subject", ""), + "predicate": t.get("predicate", ""), + "object": t.get("object", ""), + } + for t in batch + ] + self._run(cypher, rows=rows) + total += len(batch) + if on_progress: + on_progress(total, len(triples)) + logger.info("Inserted %d triples into Neo4j label %s", total, label) + return total + + def delete_triples( + self, + table_name: str, + triples: List[Dict[str, str]], + batch_size: int = 2000, + on_progress: Optional[Callable[[int, int], None]] = None, + ) -> int: + if not triples: + return 0 + label = self.get_node_table(table_name) + deleted = 0 + cypher = ( + f"UNWIND $rows AS r " + f"MATCH (t:`{label}` {{subject: r.subject, predicate: r.predicate, object: r.object}}) " + f"DETACH DELETE t" + ) + for i in range(0, len(triples), batch_size): + batch = triples[i : i + batch_size] + rows = [ + { + "subject": t.get("subject", ""), + "predicate": t.get("predicate", ""), + "object": t.get("object", ""), + } + for t in batch + ] + self._run(cypher, rows=rows) + deleted += len(batch) + if on_progress: + on_progress(deleted, len(triples)) + logger.info("Deleted %d triples from Neo4j label %s", deleted, label) + return deleted + + def query_triples(self, table_name: str) -> List[Dict[str, str]]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) " + f"RETURN t.subject AS subject, t.predicate AS predicate, t.object AS object" + ) + rows = self._run(cypher) + return [ + {"subject": r["subject"], "predicate": r["predicate"], "object": r["object"]} + for r in rows + ] + + def count_triples(self, table_name: str) -> int: + label = self.get_node_table(table_name) + rows = self._run(f"MATCH (t:`{label}`) RETURN count(t) AS cnt") + return int(rows[0]["cnt"]) if rows else 0 + + def table_exists(self, table_name: str) -> bool: + label = self.get_node_table(table_name) + rows = self._run( + "SHOW CONSTRAINTS YIELD name WHERE name = $cname RETURN name", + cname=f"triple_{label}_spo", + ) + return bool(rows) + + def get_status(self, table_name: str) -> Dict[str, Any]: + return { + "count": self.count_triples(table_name), + "last_modified": None, + "path": None, + "format": "neo4j", + } + + def execute_query(self, query: str) -> List[Dict[str, Any]]: + # Deliberate: no raw Cypher entry point. All writes go through + # the build pipeline after ontology validation (C2 safeguard). + raise NotImplementedError( + "Neo4jStore does not expose raw Cypher execution. " + "Use the named query methods on TripleStoreBackend instead." + ) + + def optimize_table(self, table_name: str) -> None: + # Neo4j has no manual VACUUM/OPTIMIZE; the indexer runs online. + return None + + # ====================================================================== + # Named query overrides — native Cypher implementations. + # + # Translated from the SQL defaults on TripleStoreBackend. Flat-triple + # model: every triple is a (: {subject, predicate, object}) + # node. Graph traversal (BFS, transitive closure, neighbours) joins + # Triple nodes by property equality — a typed-relationship graph model + # would be faster but is a future PR (sets supports_graph_model=True). + # ====================================================================== + + # -- Statistics -------------------------------------------------------- + + def get_aggregate_stats(self, table_name: str) -> Dict[str, int]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) " + f"RETURN count(t) AS total, " + f"count(DISTINCT t.subject) AS distinct_subjects, " + f"count(DISTINCT t.predicate) AS distinct_predicates, " + f"sum(CASE WHEN t.predicate = $rdf_type THEN 1 ELSE 0 END) AS type_assertion_count, " + f"sum(CASE WHEN t.predicate = $rdfs_label THEN 1 ELSE 0 END) AS label_count" + ) + rows = self._run(cypher, rdf_type=RDF_TYPE, rdfs_label=RDFS_LABEL) + row = rows[0] if rows else {} + return { + "total": int(row.get("total", 0) or 0), + "distinct_subjects": int(row.get("distinct_subjects", 0) or 0), + "distinct_predicates": int(row.get("distinct_predicates", 0) or 0), + "type_assertion_count": int(row.get("type_assertion_count", 0) or 0), + "label_count": int(row.get("label_count", 0) or 0), + } + + def get_type_distribution(self, table_name: str) -> List[Dict[str, Any]]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) WHERE t.predicate = $rdf_type " + f"RETURN t.object AS type_uri, count(*) AS cnt " + f"ORDER BY cnt DESC" + ) + return self._run(cypher, rdf_type=RDF_TYPE) or [] + + def get_predicate_distribution(self, table_name: str) -> List[Dict[str, Any]]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) " + f"RETURN t.predicate AS predicate, count(*) AS cnt " + f"ORDER BY cnt DESC" + ) + return self._run(cypher) or [] + + # -- Entity lookup ---------------------------------------------------- + + def find_subjects_by_type( + self, + table_name: str, + type_uri: str, + limit: int = 50, + offset: int = 0, + search: Optional[str] = None, + ) -> List[str]: + label = self.get_node_table(table_name) + if search: + cypher = ( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type AND t.object = $type_uri " + f"AND t.subject IN (" + f" MATCH (t2:`{label}`) " + f" WHERE t2.predicate <> $rdf_type AND toLower(t2.object) CONTAINS toLower($search) " + f" RETURN DISTINCT t2.subject" + f") " + f"RETURN DISTINCT t.subject AS subject ORDER BY subject " + f"SKIP $offset LIMIT $limit" + ) + rows = self._run( + cypher, + rdf_type=RDF_TYPE, + type_uri=type_uri, + search=search, + offset=int(offset), + limit=int(limit), + ) + else: + cypher = ( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type AND t.object = $type_uri " + f"RETURN DISTINCT t.subject AS subject ORDER BY subject " + f"SKIP $offset LIMIT $limit" + ) + rows = self._run( + cypher, + rdf_type=RDF_TYPE, + type_uri=type_uri, + offset=int(offset), + limit=int(limit), + ) + return [r["subject"] for r in (rows or [])] + + def resolve_subject_by_id( + self, table_name: str, type_uri: str, id_fragment: str + ) -> Optional[str]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type " + f" AND t.object = $type_uri " + f" AND (t.subject ENDS WITH ('/' + $idf) OR t.subject ENDS WITH ('#' + $idf)) " + f"RETURN DISTINCT t.subject AS subject LIMIT 1" + ) + rows = self._run( + cypher, rdf_type=RDF_TYPE, type_uri=type_uri, idf=id_fragment + ) + return rows[0]["subject"] if rows else None + + def get_entity_metadata( + self, table_name: str, subjects: List[str] + ) -> List[Dict[str, str]]: + if not subjects: + return [] + label = self.get_node_table(table_name) + cypher_type = ( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type AND t.subject IN $subjects " + f"RETURN t.subject AS subject, t.object AS object" + ) + cypher_label = ( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdfs_label AND t.subject IN $subjects " + f"RETURN t.subject AS subject, t.object AS object" + ) + type_rows = self._run(cypher_type, rdf_type=RDF_TYPE, subjects=subjects) or [] + label_rows = self._run(cypher_label, rdfs_label=RDFS_LABEL, subjects=subjects) or [] + + types: Dict[str, str] = {} + for r in type_rows: + types.setdefault(r["subject"], r["object"]) + labels: Dict[str, str] = {} + for r in label_rows: + labels.setdefault(r["subject"], r["object"]) + + return [ + {"uri": uri, "type": types.get(uri, ""), "label": labels.get(uri, "")} + for uri in subjects + if uri in types + ] + + def get_triples_for_subjects( + self, table_name: str, subjects: List[str] + ) -> List[Dict[str, str]]: + if not subjects: + return [] + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) WHERE t.subject IN $subjects " + f"RETURN t.subject AS subject, t.predicate AS predicate, t.object AS object" + ) + return self._run(cypher, subjects=subjects) or [] + + def get_predicates_for_type(self, table_name: str, type_uri: str) -> List[str]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (anchor:`{label}`) " + f"WHERE anchor.predicate = $rdf_type AND anchor.object = $type_uri " + f"WITH anchor.subject AS s LIMIT 1 " + f"MATCH (t:`{label}`) WHERE t.subject = s " + f"RETURN DISTINCT t.predicate AS predicate" + ) + rows = self._run(cypher, rdf_type=RDF_TYPE, type_uri=type_uri) or [] + return [r["predicate"] for r in rows] + + # -- Pagination ------------------------------------------------------- + + def paginated_triples( + self, + table_name: str, + conditions: List[str], + limit: int, + offset: int, + ) -> List[Dict[str, str]]: + # *conditions* is a list of SQL WHERE fragments produced by the + # caller. Translating arbitrary SQL to Cypher is out of scope — + # only the empty-conditions case (return all triples, paginated) is + # supported in v1. When *conditions* is non-empty we degrade to + # returning the unfiltered page; callers that need filtered + # pagination should switch to find_subjects_by_type / find_seed_subjects. + label = self.get_node_table(table_name) + if conditions: + logger.warning( + "paginated_triples received %d SQL conditions; " + "Neo4j backend ignores them and returns the unfiltered page. " + "Use find_subjects_by_type / find_seed_subjects for filtered access.", + len(conditions), + ) + cypher = ( + f"MATCH (t:`{label}`) " + f"RETURN t.subject AS subject, t.predicate AS predicate, t.object AS object " + f"SKIP $offset LIMIT $limit" + ) + return self._run(cypher, offset=int(offset), limit=int(limit)) or [] + + def paginated_count(self, table_name: str, conditions: List[str]) -> int: + # See paginated_triples — conditions are not honoured in v1. + if conditions: + logger.warning( + "paginated_count received %d SQL conditions; " + "Neo4j backend returns the unfiltered count.", + len(conditions), + ) + return self.count_triples(table_name) + + # -- Traversal -------------------------------------------------------- + + def bfs_traversal( + self, + table_name: str, + seed_where: str, + depth: int, + search: str = "", + entity_type: str = "", + ) -> List[Dict[str, Any]]: + # *seed_where* is a SQL fragment. Cypher equivalent uses the + # structured *search* / *entity_type* parameters instead. When both + # structured params are empty and only seed_where is given, we + # cannot translate — log and return empty. + if not search and not entity_type: + if seed_where: + logger.warning( + "bfs_traversal: Neo4j backend requires structured search/entity_type " + "parameters; SQL seed_where fragments are not translated. " + "Returning empty result." + ) + return [] + seeds = self.find_seed_subjects( + table_name, + entity_type=entity_type, + field="any", + match_type="contains", + value=search, + ) + if not seeds: + return [] + + label = self.get_node_table(table_name) + # Reachability via property-equality joins between Triple nodes: + # a Triple links its subject to its object; we walk over Triple + # nodes hop by hop, accumulating entities (subjects + objects). + cypher = ( + f"WITH $seeds AS seeds " + f"CALL {{ " + f" WITH seeds " + f" UNWIND seeds AS s " + f" RETURN s AS entity, 0 AS lvl " + f" UNION ALL " + f" WITH seeds " + f" MATCH (t:`{label}`) " + f" WHERE t.subject IN seeds " + f" AND t.predicate <> $rdf_type AND t.predicate <> $rdfs_label " + f" AND (t.object STARTS WITH 'http://' OR t.object STARTS WITH 'https://') " + f" RETURN DISTINCT t.object AS entity, 1 AS lvl " + f"}} " + f"WITH entity, min(lvl) AS lvl " + f"WHERE lvl <= $depth " + f"RETURN entity, lvl AS min_lvl" + ) + # The query above only does 1-hop. Full BFS to *depth* > 1 would + # require recursive traversal — Cypher's variable-length pattern + # can do this natively but with the flat-triple model needs a + # joined pattern across Triple nodes. We expand iteratively below + # for arbitrary *depth* while keeping each hop bounded. + if depth <= 1: + return self._run(cypher, seeds=list(seeds), depth=depth, rdf_type=RDF_TYPE, rdfs_label=RDFS_LABEL) or [] + + # Iterative expansion for depth > 1. + visited: Dict[str, int] = {uri: 0 for uri in seeds} + frontier: Set[str] = set(seeds) + for lvl in range(1, depth + 1): + if not frontier: + break + next_frontier = self.expand_entity_neighbors(table_name, frontier) + new_nodes = next_frontier - set(visited.keys()) + for n in new_nodes: + visited[n] = lvl + frontier = new_nodes + return [{"entity": uri, "min_lvl": lvl} for uri, lvl in visited.items()] + + def find_seed_subjects( + self, + table_name: str, + entity_type: str = "", + field: str = "any", + match_type: str = "contains", + value: str = "", + limit: int = 0, + ) -> Set[str]: + label = self.get_node_table(table_name) + search_label = field in ("label", "any") + search_id = field in ("id", "any") + + # Build a Cypher predicate fragment for the chosen match_type. + def _match_clause(column: str, param: str) -> str: + if match_type == "exact": + return f"toLower({column}) = ${param}" + if match_type == "starts": + return f"toLower({column}) STARTS WITH ${param}" + if match_type == "ends": + return f"toLower({column}) ENDS WITH ${param}" + return f"toLower({column}) CONTAINS ${param}" + + params: Dict[str, Any] = { + "rdf_type": RDF_TYPE, + "rdfs_label": RDFS_LABEL, + } + if value: + params["val"] = value.lower() + if entity_type: + params["etype"] = entity_type + + cyphers: List[str] = [] + + if entity_type and value: + if search_id: + cyphers.append( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type AND t.object = $etype " + f"AND {_match_clause('t.subject', 'val')} " + f"RETURN DISTINCT t.subject AS subject" + ) + if search_label: + cyphers.append( + f"MATCH (lab:`{label}`) " + f"WHERE lab.predicate = $rdfs_label " + f"AND {_match_clause('lab.object', 'val')} " + f"WITH DISTINCT lab.subject AS s " + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type AND t.object = $etype AND t.subject = s " + f"RETURN DISTINCT s AS subject" + ) + elif entity_type: + cyphers.append( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type AND t.object = $etype " + f"RETURN DISTINCT t.subject AS subject" + ) + elif value: + if search_label: + cyphers.append( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdfs_label " + f"AND {_match_clause('t.object', 'val')} " + f"RETURN DISTINCT t.subject AS subject" + ) + if search_id: + cyphers.append( + f"MATCH (t:`{label}`) " + f"WHERE t.predicate = $rdf_type " + f"AND {_match_clause('t.subject', 'val')} " + f"RETURN DISTINCT t.subject AS subject" + ) + else: + return set() + + if not cyphers: + return set() + + union_sql = " UNION ".join(cyphers) + if limit and limit > 0: + union_sql = f"CALL {{ {union_sql} }} RETURN subject LIMIT {int(limit)}" + rows = self._run(union_sql, **params) or [] + return {r["subject"] for r in rows} + + def find_subjects_by_patterns( + self, table_name: str, like_patterns: List[str] + ) -> Set[str]: + if not like_patterns: + return set() + label = self.get_node_table(table_name) + + # SQL LIKE → Cypher: '%' wildcards translate to STARTS WITH / ENDS WITH + # / CONTAINS depending on placement. For arbitrary patterns we fall + # back to a regex match. + clauses: List[str] = [] + params: Dict[str, Any] = {} + for i, raw in enumerate(like_patterns): + pkey = f"p{i}" + params[pkey] = raw.replace("%", ".*") + clauses.append(f"t.subject =~ ${pkey}") + cypher = ( + f"MATCH (t:`{label}`) WHERE {' OR '.join(clauses)} " + f"RETURN DISTINCT t.subject AS subject" + ) + rows = self._run(cypher, **params) or [] + return {r["subject"] for r in rows} + + def expand_entity_neighbors( + self, table_name: str, entity_uris: Set[str] + ) -> Set[str]: + if not entity_uris: + return set() + label = self.get_node_table(table_name) + # Outgoing edges: where subject IN seeds AND object looks like an entity URI. + # Incoming edges: where object IN seeds. + # Both then filtered to entities that have an rdf:type assertion + # (real entity instances, not class or property URIs). + cypher = ( + f"WITH $seeds AS seeds " + f"MATCH (t:`{label}`) " + f"WHERE (t.subject IN seeds AND t.object STARTS WITH 'http' " + f" AND t.predicate <> $rdf_type AND t.predicate <> $rdfs_label) " + f" OR (t.object IN seeds AND t.predicate <> $rdf_type AND t.predicate <> $rdfs_label) " + f"WITH DISTINCT (CASE WHEN t.subject IN seeds THEN t.object ELSE t.subject END) AS entity " + f"MATCH (ty:`{label}`) " + f"WHERE ty.subject = entity AND ty.predicate = $rdf_type " + f"RETURN DISTINCT entity" + ) + rows = self._run( + cypher, + seeds=list(entity_uris), + rdf_type=RDF_TYPE, + rdfs_label=RDFS_LABEL, + ) or [] + return {r["entity"] for r in rows} + + # -- Reasoning -------------------------------------------------------- + + def transitive_closure( + self, + table_name: str, + predicate_uri: str, + start_uri: Optional[str] = None, + max_depth: int = 20, + ) -> List[Dict[str, Any]]: + # Compute transitive closure along *predicate_uri* and return triples + # NOT already present as direct assertions. With the flat-triple model + # we self-join Triple nodes hop by hop using property equality. + label = self.get_node_table(table_name) + start_filter = "" + params: Dict[str, Any] = {"pred": predicate_uri, "max_depth": int(max_depth)} + if start_uri: + start_filter = "AND start.subject = $start_uri " + params["start_uri"] = start_uri + + # Build a chain of MATCH clauses up to max_depth. This is verbose but + # explicit; Cypher does not have recursive CTEs. + depth = min(max_depth, 20) # hard cap for safety + union_parts: List[str] = [] + # depth=2 means start -> mid -> end (2 hops) + for d in range(2, depth + 1): + chain = "MATCH (h0:`" + label + "`)" + wheres = ["h0.predicate = $pred"] + if start_uri: + wheres.append("h0.subject = $start_uri") + for i in range(1, d): + chain += f", (h{i}:`{label}`)" + wheres.append(f"h{i}.predicate = $pred") + wheres.append(f"h{i-1}.object = h{i}.subject") + union_parts.append( + chain + " WHERE " + " AND ".join(wheres) + + f" RETURN h0.subject AS subject, h{d-1}.object AS object" + ) + + if not union_parts: + return [] + + body = " UNION ".join(union_parts) + cypher = ( + f"CALL {{ {body} }} " + f"WITH DISTINCT subject, object " + f"WHERE NOT EXISTS {{ " + f" MATCH (ex:`{label}`) " + f" WHERE ex.subject = subject AND ex.predicate = $pred AND ex.object = object " + f"}} " + f"RETURN subject, $pred AS predicate, object" + ) + try: + return self._run(cypher, **params) or [] + except Exception as exc: # noqa: BLE001 + logger.warning( + "transitive_closure failed on %s (predicate=%s): %s", + table_name, + predicate_uri, + exc, + ) + return [] + + def symmetric_expand( + self, table_name: str, predicate_uri: str + ) -> List[Dict[str, Any]]: + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) WHERE t.predicate = $pred " + f"AND NOT EXISTS {{ " + f" MATCH (inv:`{label}`) " + f" WHERE inv.subject = t.object AND inv.predicate = $pred AND inv.object = t.subject " + f"}} " + f"RETURN t.object AS subject, $pred AS predicate, t.subject AS object" + ) + try: + return self._run(cypher, pred=predicate_uri) or [] + except Exception as exc: # noqa: BLE001 + logger.warning( + "symmetric_expand failed on %s (predicate=%s): %s", + table_name, + predicate_uri, + exc, + ) + return [] + + def shortest_path( + self, + table_name: str, + source_uri: str, + target_uri: str, + max_depth: int = 10, + ) -> List[Dict[str, Any]]: + # Native Cypher shortestPath would be ideal but requires a typed- + # relationship graph model. With the flat-triple model we do a + # bounded iterative BFS and return the first path found. + if source_uri == target_uri: + return [{"hop": 0, "uri": source_uri}] + + visited: Set[str] = {source_uri} + parent: Dict[str, str] = {} + frontier: Set[str] = {source_uri} + for depth in range(1, min(max_depth, 10) + 1): + next_frontier = self.expand_entity_neighbors(table_name, frontier) + for n in next_frontier: + if n in visited: + continue + for prev in frontier: + parent.setdefault(n, prev) + visited |= next_frontier + if target_uri in next_frontier: + # Reconstruct the path. + path_uris: List[str] = [target_uri] + cur = target_uri + while cur in parent and cur != source_uri: + cur = parent[cur] + path_uris.append(cur) + path_uris.reverse() + return [{"hop": i, "uri": uri} for i, uri in enumerate(path_uris)] + frontier = next_frontier - visited + if not frontier: + break + return [] + + # -- Cohorts ---------------------------------------------------------- + + def delete_cohort_triples( + self, + table_name: str, + cohort_uri_prefix: str, + in_cohort_predicate: str, + ) -> int: + if not cohort_uri_prefix: + return 0 + label = self.get_node_table(table_name) + cypher = ( + f"MATCH (t:`{label}`) " + f"WHERE t.subject STARTS WITH $prefix " + f" OR (t.predicate = $in_pred AND t.object STARTS WITH $prefix) " + f"WITH t LIMIT 100000 " + f"DETACH DELETE t " + f"RETURN count(t) AS deleted" + ) + try: + rows = self._run( + cypher, prefix=cohort_uri_prefix, in_pred=in_cohort_predicate + ) + return int(rows[0].get("deleted", 0)) if rows else 0 + except Exception as exc: # noqa: BLE001 + logger.warning( + "delete_cohort_triples failed on %s (%s): %s", + table_name, + cohort_uri_prefix, + exc, + ) + return 0 diff --git a/src/back/core/graphdb/neo4j/__init__.py b/src/back/core/graphdb/neo4j/__init__.py new file mode 100644 index 00000000..09cd3cc9 --- /dev/null +++ b/src/back/core/graphdb/neo4j/__init__.py @@ -0,0 +1,23 @@ +"""Neo4j graph database backend. + +Cypher-based, remote-only (Neo4j Aura / self-hosted Neo4j). Bolt protocol +via the official ``neo4j`` Python driver. See ``Neo4jStore`` for the +implementation contract. + +PR 1 ships the skeleton + flat-triple CRUD. Named-query Cypher overrides +(transitive closure, BFS, type distribution, …) land in PR 2 alongside +``SWRLFlatCypherTranslator`` for reasoning. +""" + +try: + import neo4j as _neo4j # noqa: F401 + NEO4J_AVAILABLE = True +except ImportError: + NEO4J_AVAILABLE = False + +if NEO4J_AVAILABLE: + from back.core.graphdb.neo4j.Neo4jStore import Neo4jStore # noqa: F401 + + __all__ = ["Neo4jStore", "NEO4J_AVAILABLE"] +else: + __all__ = ["NEO4J_AVAILABLE"] diff --git a/src/back/core/reasoning/SWRLFlatCypherTranslator.py b/src/back/core/reasoning/SWRLFlatCypherTranslator.py new file mode 100644 index 00000000..14ca5114 --- /dev/null +++ b/src/back/core/reasoning/SWRLFlatCypherTranslator.py @@ -0,0 +1,125 @@ +"""Translate SWRL rules to Cypher for the flat-triple Neo4j model. + +**STATUS: scaffolding only.** This class exists so the architecture is in +place (``GraphDBBackend.get_query_translator`` returns this for Cypher +flat-model engines) and so the reasoning UI does not crash when Neo4j is +the active engine. **Actual SWRL → Cypher translation is not implemented +yet** — every method here returns ``None`` and logs a clear warning. + +Why this is scaffolded rather than fully implemented: + +- The SQL counterpart (:class:`SWRLSQLTranslator`) is ~730 lines of + careful logic for builtins, negation, variable bindings, arity-1 vs + arity-2 atoms, IRI resolution, and antecedent-vs-consequent assembly. + A faithful Cypher port is its own piece of work, deserving a dedicated + PR with its own test suite — not bundled into the engine-skeleton PR. +- Falling back to ``None`` is what the reasoning engine treats as + "no work to do", so the UI surfaces "0 violations / 0 inferences" + cleanly rather than crashing. + +When the dedicated PR lands it should mirror the public interface +below — same method names, same return types — so callers do not change. +""" + +from typing import Any, Dict, Optional + +from back.core.logging import get_logger + +logger = get_logger(__name__) + + +class SWRLFlatCypherTranslator: + """Cypher counterpart of :class:`SWRLSQLTranslator` — scaffolded. + + Parameters + ---------- + node_label: + Neo4j label suffix used for the per-store triple nodes, e.g. + ``""``. The full triple pattern is + ``(:Triple:{node_label} {subject, predicate, object})``. + """ + + def __init__(self, node_label: str = "") -> None: + self.node_label = node_label + + # ------------------------------------------------------------------ + # Public interface — mirrors SWRLSQLTranslator. + # All methods return None for now (graceful no-op). + # ------------------------------------------------------------------ + + def build_violation_cypher( + self, table: str, params: Dict[str, Any] + ) -> Optional[str]: + """Build Cypher that finds subjects violating a SWRL rule. + + Returns ``None`` — Cypher SWRL violation queries are not + translated in this version. Reasoning on Neo4j will report + zero violations until the dedicated translator PR lands. + """ + logger.warning( + "SWRLFlatCypherTranslator.build_violation_cypher: " + "SWRL→Cypher translation is not implemented yet. " + "Returning None (rule produces no violations on Neo4j)." + ) + return None + + def build_antecedent_count_cypher( + self, table: str, params: Dict[str, Any] + ) -> Optional[str]: + """Cypher that counts how often a SWRL antecedent matches. + + Returns ``None`` — see class docstring. + """ + logger.warning( + "SWRLFlatCypherTranslator.build_antecedent_count_cypher: " + "not implemented yet. Returning None." + ) + return None + + def build_materialization_cypher( + self, table: str, params: Dict[str, Any] + ) -> Optional[str]: + """Cypher that materialises inferred triples produced by a rule. + + Returns ``None`` — see class docstring. + """ + logger.warning( + "SWRLFlatCypherTranslator.build_materialization_cypher: " + "not implemented yet. Returning None (no inferences materialised)." + ) + return None + + def build_inference_cypher( + self, table: str, params: Dict[str, Any] + ) -> Optional[str]: + """Alias / variant of :meth:`build_materialization_cypher`. + + Returns ``None`` — see class docstring. + """ + logger.warning( + "SWRLFlatCypherTranslator.build_inference_cypher: " + "not implemented yet. Returning None." + ) + return None + + # ------------------------------------------------------------------ + # Compatibility shims — the reasoning engine calls the SQL names. + # Forward them to the Cypher methods so the engine can use either + # translator without branching. + # ------------------------------------------------------------------ + + def build_violation_sql(self, table: str, params: Dict[str, Any]) -> Optional[str]: + return self.build_violation_cypher(table, params) + + def build_antecedent_count_sql( + self, table: str, params: Dict[str, Any] + ) -> Optional[str]: + return self.build_antecedent_count_cypher(table, params) + + def build_materialization_sql( + self, table: str, params: Dict[str, Any] + ) -> Optional[str]: + return self.build_materialization_cypher(table, params) + + def build_inference_sql(self, table: str, params: Dict[str, Any]) -> Optional[str]: + return self.build_inference_cypher(table, params) diff --git a/src/back/objects/session/GlobalConfigService.py b/src/back/objects/session/GlobalConfigService.py index 7e53ef15..8656afbf 100644 --- a/src/back/objects/session/GlobalConfigService.py +++ b/src/back/objects/session/GlobalConfigService.py @@ -263,7 +263,7 @@ def set_navbar_logo( """Persist the navbar logo as a ``data:`` URL (empty string clears it).""" return self._save(host, token, registry_cfg, {"navbar_logo": data_url or ""}) - ALLOWED_GRAPH_ENGINES = ("lakebase",) + ALLOWED_GRAPH_ENGINES = ("lakebase", "neo4j") def get_graph_engine( self, host: str, token: str, registry_cfg: Dict[str, str] diff --git a/src/front/config/menu_config.json b/src/front/config/menu_config.json index e8062bb2..b4ac2f43 100644 --- a/src/front/config/menu_config.json +++ b/src/front/config/menu_config.json @@ -519,6 +519,13 @@ "icon": "bi-diagram-3", "default": false, "requires": null + }, + { + "id": "neo4j", + "label": "Neo4j", + "icon": "bi-bezier2", + "default": false, + "requires": null } ] }, diff --git a/src/front/fastapi/dependencies.py b/src/front/fastapi/dependencies.py index 2bb4c50a..c17bcaa3 100644 --- a/src/front/fastapi/dependencies.py +++ b/src/front/fastapi/dependencies.py @@ -156,13 +156,15 @@ def triplestore_page_context(domain_session, settings=None) -> dict: """Build the triplestore-related template context shared by dtwin and domain pages. Returns dict with ``view_table``, ``graph_name``, ``triplestore_cache``, and - ``graph_engine`` (currently always ``lakebase``). + ``graph_engine``. The engine is resolved through + :pymeth:`TripleStoreFactory._resolve_graph_engine`, which walks the + domain-override → global-setting → ``"lakebase"`` chain. ``GlobalConfigService`` + has already validated the value against ``ALLOWED_GRAPH_ENGINES``. """ from back.core.helpers import effective_view_table, effective_graph_name from back.core.triplestore.TripleStoreFactory import TripleStoreFactory - _raw = TripleStoreFactory._resolve_graph_engine(domain_session, settings) or "lakebase" - graph_engine = _raw if _raw == "lakebase" else "lakebase" + graph_engine = TripleStoreFactory._resolve_graph_engine(domain_session, settings) or "lakebase" return { "view_table": effective_view_table(domain_session), diff --git a/src/front/static/config/js/settings.js b/src/front/static/config/js/settings.js index 48b1d43b..bc5e72db 100644 --- a/src/front/static/config/js/settings.js +++ b/src/front/static/config/js/settings.js @@ -1943,6 +1943,8 @@ document.addEventListener('DOMContentLoaded', function () { if (sel.value === 'lakebase') { mergeLakebasePanelIntoConfigTextarea(); + } else if (sel.value === 'neo4j') { + mergeNeo4jPanelIntoConfigTextarea(); } let parsed; @@ -2005,6 +2007,87 @@ document.addEventListener('DOMContentLoaded', function () { // GLOBAL SAVE BUTTON – warehouse, global prefs, CloudFetch, Graph DB // ===================================================================== + // ── Neo4j engine config — form ↔ textarea ─────────────────────────────── + // + // Mirrors the Lakebase merge/toggle pattern. When the active engine is + // "neo4j" (Settings > Triple store > Global dropdown), this function + // reads the Neo4j config form fields from #neo4j-section and serialises + // them into the shared #graphEngineConfig textarea, which the existing + // save flow then POSTs to /settings/graph-engine-config. + function mergeNeo4jPanelIntoConfigTextarea() { + const ta = document.getElementById('graphEngineConfig'); + if (!ta) return; + let o = {}; + try { o = JSON.parse(ta.value || '{}'); } catch (_) { o = {}; } + if (typeof o !== 'object' || Array.isArray(o)) o = {}; + + const uri = (document.getElementById('neo4jUri')?.value || '').trim(); + const database = (document.getElementById('neo4jDatabase')?.value || '').trim(); + const authMethod = (document.getElementById('neo4jAuthMethod')?.value || 'basic').trim(); + const encrypted = !!document.getElementById('neo4jEncrypted')?.checked; + + if (uri) o.uri = uri; else delete o.uri; + o.database = database || 'neo4j'; + o.auth_method = authMethod; + o.encrypted = encrypted; + + if (authMethod === 'basic') { + const user = (document.getElementById('neo4jUsername')?.value || '').trim(); + const pwd = (document.getElementById('neo4jPassword')?.value || ''); + if (user) o.username = user; else delete o.username; + if (pwd) o.password = pwd; else delete o.password; + delete o.secret_scope; + delete o.secret_key; + } else if (authMethod === 'databricks_secret') { + const scope = (document.getElementById('neo4jSecretScope')?.value || '').trim(); + const key = (document.getElementById('neo4jSecretKey')?.value || '').trim(); + if (scope) o.secret_scope = scope; else delete o.secret_scope; + if (key) o.secret_key = key; else delete o.secret_key; + delete o.username; + delete o.password; + } + ta.value = JSON.stringify(o, null, 2); + } + + // Auth-method visibility toggle + function applyNeo4jAuthMethodVisibility() { + const sel = document.getElementById('neo4jAuthMethod'); + if (!sel) return; + const basicFields = document.querySelectorAll('.neo4j-auth-basic'); + const secretFields = document.querySelectorAll('.neo4j-auth-databricks-secret'); + const isBasic = sel.value === 'basic'; + const isSecret = sel.value === 'databricks_secret'; + basicFields.forEach(el => el.classList.toggle('d-none', !isBasic)); + secretFields.forEach(el => el.classList.toggle('d-none', !isSecret)); + } + + // Wire up Neo4j form field listeners — keep the textarea in sync as the + // user edits the panel, so the save flow always serialises fresh values. + [ + 'neo4jUri', 'neo4jDatabase', 'neo4jAuthMethod', + 'neo4jUsername', 'neo4jPassword', + 'neo4jSecretScope', 'neo4jSecretKey', + 'neo4jEncrypted', + ].forEach(id => { + const el = document.getElementById(id); + if (!el) return; + el.addEventListener('input', mergeNeo4jPanelIntoConfigTextarea); + el.addEventListener('change', mergeNeo4jPanelIntoConfigTextarea); + }); + document.getElementById('neo4jAuthMethod')?.addEventListener('change', applyNeo4jAuthMethodVisibility); + // Initial render — apply auth-method visibility on page load. + applyNeo4jAuthMethodVisibility(); + + // Test-connection button: deferred to a follow-up commit; surface a + // friendly message for now so users don't think the button is broken. + document.getElementById('btnTestNeo4jConnection')?.addEventListener('click', () => { + const result = document.getElementById('neo4jTestResult'); + if (!result) return; + result.className = 'alert alert-info mt-3 small'; + result.classList.remove('d-none'); + result.textContent = 'Test-connection is not wired up yet. Save the config and run a build to verify the connection.'; + }); + document.querySelectorAll('.btn-save-settings').forEach(saveBtn => saveBtn.addEventListener('click', async function () { const btn = this; btn.disabled = true; diff --git a/src/front/static/domain/js/domain-validation.js b/src/front/static/domain/js/domain-validation.js index 2c3e549b..cc13249f 100644 --- a/src/front/static/domain/js/domain-validation.js +++ b/src/front/static/domain/js/domain-validation.js @@ -450,10 +450,46 @@ function updateDtwinCard(data) { } } - // Graph DB card — Lakebase details + // Graph DB card — render engine-aware title and architecture. + // + // `dt.graph_engine` is the engine recorded on the domain at build time and + // can be stale relative to the active global engine. Reconcile unconditionally + // against `/settings/graph-engine`. var eng = dt.graph_engine || 'lakebase'; - var titleGraph = document.getElementById('psDtGraphBackendTitle'); - if (titleGraph) titleGraph.textContent = 'Graph DB (Lakebase)'; + var engineLabels = { + 'lakebase': 'Graph DB (Lakebase)', + 'neo4j': 'Graph DB (Neo4j)' + }; + + function _psRenderEngineUi(activeEng) { + var container = document.getElementById('psDtLakebaseDetails'); + var titleEl = document.getElementById('psDtGraphBackendTitle'); + var lkIcon = document.querySelector('#psDtGraphCard .dt-arch-icon-lakebase-img'); + var syncRow = document.getElementById('psDtLakebaseSyncedUcRow'); + var boltRow = document.getElementById('psDtNeo4jBoltCard'); + var graphFn = document.getElementById('psDtLakebaseFullName'); + if (container) container.classList.remove('d-none'); + if (titleEl) titleEl.textContent = engineLabels[activeEng] || ('Graph DB (' + activeEng + ')'); + if (activeEng === 'neo4j') { + if (syncRow) syncRow.classList.add('d-none'); + if (boltRow) boltRow.classList.remove('d-none'); + if (lkIcon) lkIcon.classList.add('d-none'); + if (graphFn) graphFn.textContent = (dt.graph_name || 'Knowledge Graph'); + } else { + if (syncRow) syncRow.classList.remove('d-none'); + if (boltRow) boltRow.classList.add('d-none'); + if (lkIcon) lkIcon.classList.remove('d-none'); + } + } + _psRenderEngineUi(eng); + fetch('/settings/graph-engine', { credentials: 'same-origin' }) + .then(function (r) { return r.ok ? r.json() : null; }) + .then(function (data) { + var globalEng = data && data.graph_engine; + if (!globalEng || globalEng === eng) return; + _psRenderEngineUi(globalEng); + }) + .catch(function () { /* leave fallback in place */ }); var graphCard = document.getElementById('psDtGraphCard'); if (graphCard) { @@ -462,9 +498,6 @@ function updateDtwinCard(data) { else if (dt.lakebase_table_exists === false) graphCard.classList.add('border-danger'); } - var lkDetails = document.getElementById('psDtLakebaseDetails'); - if (lkDetails) lkDetails.classList.toggle('d-none', eng !== 'lakebase'); - if (eng === 'lakebase') { var psDb = document.getElementById('psDtLakebaseDatabase'); var psSch = document.getElementById('psDtLakebaseSchema'); diff --git a/src/front/static/query/js/query-sync.js b/src/front/static/query/js/query-sync.js index e68da3a6..78d6c26b 100644 --- a/src/front/static/query/js/query-sync.js +++ b/src/front/static/query/js/query-sync.js @@ -152,17 +152,59 @@ function _applyBuildGraphEngineUi(dtExist) { if (fnLk) fnLk.classList.remove('d-none'); var title = document.getElementById('dtGraphBackendTitle'); + var labels = { 'lakebase': 'Graph DB (Lakebase)', 'neo4j': 'Graph DB (Neo4j)' }; if (title) { - title.textContent = eng === 'lakebase' ? 'Graph DB (Lakebase)' : 'Graph DB Digital Twin'; + title.textContent = labels[eng] || 'Graph DB Digital Twin'; } + // Toggle the post-Triple-Store cards (Sync + Graph DB) based on engine. + // The `dtLakebaseDetails` container wraps both the Sync card and the + // Graph DB card. On Lakebase, both show. On Neo4j, the Sync card is + // hidden (no UC-synced table on the Neo4j path) but the Graph DB card + // remains visible with engine-aware label and metadata. + function _renderEngineUi(activeEng) { + var container = document.getElementById('dtLakebaseDetails'); + var titleEl = document.getElementById('dtGraphBackendTitle'); + var lkIcon = document.querySelector('#dtGraphCard .dt-arch-icon-lakebase-img'); + var syncRow = document.getElementById('dtLakebaseSyncedUcRow'); + var boltRow = document.getElementById('dtNeo4jBoltCard'); + var lkBuild = document.getElementById('dtLakebaseBuildNote'); + var graphFn = document.getElementById('dtLakebaseFullName'); + if (container) container.classList.remove('d-none'); + if (titleEl) titleEl.textContent = labels[activeEng] || ('Graph DB (' + activeEng + ')'); + if (activeEng === 'neo4j') { + // Show the Bolt writer card, hide the Lakebase Sync card + build note + icon + if (syncRow) syncRow.classList.add('d-none'); + if (boltRow) boltRow.classList.remove('d-none'); + if (lkBuild) lkBuild.classList.add('d-none'); + if (lkIcon) lkIcon.classList.add('d-none'); + if (graphFn) graphFn.textContent = (cfg.graph_name || 'Knowledge Graph'); + } else { + if (syncRow) syncRow.classList.remove('d-none'); + if (boltRow) boltRow.classList.add('d-none'); + if (lkBuild) lkBuild.classList.remove('d-none'); + if (lkIcon) lkIcon.classList.remove('d-none'); + } + } + _renderEngineUi(eng); + // `dt.graph_engine` can be stale even after a build: it reflects the + // engine recorded on the domain at build-time, not necessarily the + // active global engine. Reconcile against /settings/graph-engine. + fetch('/settings/graph-engine', { credentials: 'same-origin' }) + .then(function (r) { return r.ok ? r.json() : null; }) + .then(function (data) { + var globalEng = data && data.graph_engine; + if (!globalEng || globalEng === cfg.graph_engine) return; + cfg.graph_engine = globalEng; + window.__TRIPLESTORE_CONFIG = cfg; + _renderEngineUi(globalEng); + }) + .catch(function () { /* leave fallback in place */ }); var sub = document.getElementById('dtGraphStorageSubtitle'); var primaryRow = document.getElementById('dtGraphPrimaryRow'); if (sub) sub.classList.add('d-none'); if (primaryRow) primaryRow.classList.add('d-none'); var regRow = document.getElementById('dtRegistryArchiveRow'); if (regRow) regRow.classList.add('d-none'); - var lkDetails = document.getElementById('dtLakebaseDetails'); - if (lkDetails) lkDetails.classList.toggle('d-none', eng !== 'lakebase'); if (eng === 'lakebase') { var lkDb = document.getElementById('dtLakebaseDatabase'); diff --git a/src/front/templates/partials/domain/_domain_validation.html b/src/front/templates/partials/domain/_domain_validation.html index b22ce2de..1a4316d3 100644 --- a/src/front/templates/partials/domain/_domain_validation.html +++ b/src/front/templates/partials/domain/_domain_validation.html @@ -241,15 +241,17 @@

Cockpit

- +
- +
- +
@@ -259,12 +261,22 @@

Cockpit

- + +
+
+ + Bolt + UNWIND · MERGE +
+ Cypher write at build time +
+ +
- +
Lakebase diff --git a/src/front/templates/partials/dtwin/_query_sync.html b/src/front/templates/partials/dtwin/_query_sync.html index 1be1249e..974c0ffc 100644 --- a/src/front/templates/partials/dtwin/_query_sync.html +++ b/src/front/templates/partials/dtwin/_query_sync.html @@ -144,15 +144,17 @@
Readines N/A
- +
- +
- +
@@ -162,12 +164,22 @@
Readines
- + +
+
+ + Bolt + UNWIND · MERGE +
+ Cypher write at build time +
+ +
- +
Lakebase diff --git a/src/front/templates/settings.html b/src/front/templates/settings.html index e60dccde..7523e168 100644 --- a/src/front/templates/settings.html +++ b/src/front/templates/settings.html @@ -104,6 +104,7 @@

Triple store — Global

Selects the graph database engine used for building and querying knowledge graphs. @@ -522,6 +523,99 @@
+ + +