
forward-port graphiti_core improvements from internal fork #1422

Merged
prasmussen15 merged 2 commits into main from preston/forward-port-from-zep-proprietary on Apr 18, 2026

@prasmussen15 (Collaborator)

Summary

Forward-ports a set of graphiti_core improvements that have been living in an internal fork back to this repo. No workflow, CONTRIBUTING, or CLA changes are included; those have moved forward here and we don't want to regress them.

Multi-episode batched extraction

Graphiti._extract_and_resolve_nodes and extract_edges now accept a list of episodes instead of a single one. When multiple are passed:

  • Content is concatenated with [Episode N] headers via the new concatenate_episodes helper in graphiti_core/utils/text_utils.py.
  • Each extracted node and edge carries episode_indices: list[int] so callers can map results back to the originating episode.
  • Prompts (extract_nodes.extract_message, extract_edges.edge) and Pydantic models updated accordingly.
  • build_episodic_edges (in edge_operations.py) accepts either a single episode UUID or multiple episode UUIDs, plus a node_episode_index_map, to wire up MENTIONED_IN edges per episode.

Files: graphiti_core/graphiti.py, graphiti_core/utils/text_utils.py, graphiti_core/utils/maintenance/{edge,node}_operations.py, graphiti_core/utils/bulk_utils.py, graphiti_core/prompts/{extract_edges,extract_nodes}.py, tests/utils/test_concatenate_episodes.py, tests/utils/maintenance/{test_bulk_utils,test_entity_extraction}.py, tests/{helpers_test,test_graphiti_mock}.py.
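The batching flow described above can be sketched in isolation. This is a minimal, hypothetical illustration, not the real `concatenate_episodes` or `build_episodic_edges` signatures: `concatenate_episodes_sketch`, `ExtractedNodeStub`, and `mentioned_in_pairs` are made-up names, headers are assumed to be numbered from 1, and `episode_indices` are assumed to be 0-based positions into the batch.

```python
from dataclasses import dataclass, field


def concatenate_episodes_sketch(contents: list[str]) -> str:
    """Join episode bodies, each prefixed with an [Episode N] header (N starts at 1)."""
    return "\n\n".join(
        f"[Episode {n}]\n{content}" for n, content in enumerate(contents, start=1)
    )


@dataclass
class ExtractedNodeStub:
    # Hypothetical stand-in for an extracted node; only the fields this sketch uses.
    uuid: str
    # Assumed to hold 0-based positions into the batch of episodes.
    episode_indices: list[int] = field(default_factory=list)


def mentioned_in_pairs(
    nodes: list[ExtractedNodeStub], episode_uuids: list[str]
) -> list[tuple[str, str]]:
    """Expand each node's episode_indices into (episode_uuid, node_uuid) pairs,
    one pair per per-episode MENTIONED_IN edge."""
    pairs: list[tuple[str, str]] = []
    for node in nodes:
        for idx in node.episode_indices:
            if not 0 <= idx < len(episode_uuids):
                raise IndexError(f"episode index {idx} out of range for node {node.uuid}")
            pairs.append((episode_uuids[idx], node.uuid))
    return pairs


# Usage: two episodes extracted in one call.
prompt_body = concatenate_episodes_sketch(["Alice bought a Gamecube.", "Bob borrowed it."])
nodes = [ExtractedNodeStub("n-alice", [0]), ExtractedNodeStub("n-gamecube", [0, 1])]
print(mentioned_in_pairs(nodes, ["ep-1", "ep-2"]))
# [('ep-1', 'n-alice'), ('ep-1', 'n-gamecube'), ('ep-2', 'n-gamecube')]
```

A node mentioned in several episodes of the same batch yields one pair per episode, which is why the index list (rather than a single index) has to travel with each extracted item.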

fact_triple episode type and episode_metadata

  • New EpisodeType.fact_triple variant in graphiti_core/nodes.py.
  • New episode_metadata: dict[str, Any] | None field on EpisodicNode for customer-defined filtering keys.
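The PR only says `episode_metadata` holds customer-defined filtering keys. The sketch below assumes filtering means exact key/value matching done client-side; `EpisodeStub` and `filter_by_metadata` are hypothetical and not part of graphiti_core, and the string `"fact_triple"` is an assumed value for the new `EpisodeType` variant.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EpisodeStub:
    # Hypothetical stand-in for EpisodicNode, reduced to the fields used here.
    name: str
    source: str  # assumed string form of the EpisodeType value, e.g. "fact_triple"
    episode_metadata: Optional[dict[str, Any]] = None


def filter_by_metadata(episodes: list[EpisodeStub], **criteria: Any) -> list[EpisodeStub]:
    """Keep episodes whose episode_metadata has every requested key with an equal value.

    Episodes with episode_metadata=None only match when no criteria are given.
    """
    matches = []
    for episode in episodes:
        metadata = episode.episode_metadata or {}
        if all(key in metadata and metadata[key] == value for key, value in criteria.items()):
            matches.append(episode)
    return matches


episodes = [
    EpisodeStub("e1", "message", {"tenant": "acme", "channel": "support"}),
    EpisodeStub("e2", "fact_triple", {"tenant": "globex"}),
    EpisodeStub("e3", "text"),
]
print([e.name for e in filter_by_metadata(episodes, tenant="acme")])  # ['e1']
```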

Safer attribute merging

When persisting an edge or node, the attributes dict was previously merged into edge_data / entity_data with dict.update(...), which silently overwrote first-class fields (uuid, group_id, created_at, timestamps, etc.) whenever the attribute dict happened to contain a matching key. Attributes are now merged with setdefault semantics, so existing typed fields win. Files: graphiti_core/edges.py, graphiti_core/utils/bulk_utils.py.
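A standalone illustration of the difference between the two merge strategies. The helper names `merge_update` and `merge_setdefault` are hypothetical, and the real code in edges.py / bulk_utils.py may merge in place rather than copying.

```python
from typing import Any


def merge_update(entity_data: dict[str, Any], attributes: dict[str, Any]) -> dict[str, Any]:
    """Previous behavior: attribute keys overwrite typed fields on collision."""
    merged = dict(entity_data)
    merged.update(attributes)
    return merged


def merge_setdefault(entity_data: dict[str, Any], attributes: dict[str, Any]) -> dict[str, Any]:
    """New behavior: an attribute key is only added when the typed field is absent."""
    merged = dict(entity_data)
    for key, value in attributes.items():
        merged.setdefault(key, value)
    return merged


entity_data = {"uuid": "node-123", "group_id": "g1", "name": "Alice"}
attributes = {"uuid": "bogus", "favorite_console": "Gamecube"}

print(merge_update(entity_data, attributes)["uuid"])                  # bogus
print(merge_setdefault(entity_data, attributes)["uuid"])              # node-123
print(merge_setdefault(entity_data, attributes)["favorite_console"])  # Gamecube
```

Note that `setdefault` checks key presence, not truthiness, so a typed field that is present but set to `None` still wins over a same-named attribute.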

reference_time preservation in bulk edge save

reference_time is now passed through add_nodes_and_edges_bulk_tx alongside the other edge timestamps. File: graphiti_core/utils/bulk_utils.py.
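The exact parameters that add_nodes_and_edges_bulk_tx sends are not shown in this PR. The sketch assumes each edge is flattened into a per-row dict and that timestamp fields are copied from a fixed list, so a field missing from that list would never reach the query. `EDGE_TIMESTAMP_FIELDS`, `EdgeStub`, and `edge_to_bulk_record` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical list of timestamp fields copied into each bulk-save record;
# reference_time is the field this change adds to the pass-through.
EDGE_TIMESTAMP_FIELDS = ("created_at", "valid_at", "invalid_at", "expired_at", "reference_time")


@dataclass
class EdgeStub:
    # Hypothetical stand-in for an entity edge, reduced to the fields copied below.
    uuid: str
    fact: str
    created_at: datetime
    valid_at: Optional[datetime] = None
    invalid_at: Optional[datetime] = None
    expired_at: Optional[datetime] = None
    reference_time: Optional[datetime] = None


def edge_to_bulk_record(edge: EdgeStub) -> dict[str, Any]:
    """Flatten an edge into the per-row parameter dict a bulk save query might consume."""
    record: dict[str, Any] = {"uuid": edge.uuid, "fact": edge.fact}
    for field_name in EDGE_TIMESTAMP_FIELDS:
        record[field_name] = getattr(edge, field_name)
    return record


now = datetime(2026, 4, 18, tzinfo=timezone.utc)
record = edge_to_bulk_record(EdgeStub("edge-1", "Alice owns a Gamecube", now, reference_time=now))
print(record["reference_time"] == now)  # True
```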

Fact-extraction prompt refinements

Based on evaluation findings that concrete details (brand names, quantities, colors, qualified objects) were being lost during extraction:

  • extract_edges: relaxed the strict two-entity rule (look for a list anchor before dropping), strengthened the detail-preservation rule, and clarified the scope of deduplication within CURRENT_MESSAGE.
  • extract_nodes: softened the "Wikipedia article" specificity test so brand-named and descriptor-qualified items ("Gamecube", "wool coat", "cracked windshield") are extracted while bare head nouns ("coat", "windshield") are not.

Files: graphiti_core/prompts/{extract_edges,extract_nodes}.py.

summarize_sagas prompt refresh

Updated prompt for the saga-summary path. File: graphiti_core/prompts/summarize_sagas.py.

Minor driver tweaks

Files: graphiti_core/driver/graph_operations/graph_operations.py, graphiti_core/driver/record_parsers.py.

pytest bump

server/pyproject.toml and mcp_server/pyproject.toml now require pytest>=9.0.3, with matching uv.lock refreshes.

Test plan

  • make check (format + lint + tests) passes locally
  • Existing unit tests pass (multi-episode code paths are exercised in the new test_concatenate_episodes.py and the updated test_entity_extraction.py / test_bulk_utils.py / test_graphiti_mock.py)
  • Integration tests (pytest -k _int) — run locally with Neo4j before merge
  • Reviewer: sanity check the episode_indices / node_episode_index_map plumbing in edge_operations.py and node_operations.py
  • Reviewer: confirm the attribute-merging change doesn't break any custom edge/entity types that rely on the previous overwrite-on-collision behavior

🤖 Generated with Claude Code

This change brings a set of graphiti_core improvements that have been
living in an internal fork back to the OSS repo:

- Multi-episode batched extraction. `Graphiti._extract_and_resolve_nodes`
  and `extract_edges` now accept a list of episodes, build a combined
  prompt with per-episode `[Episode N]` headers (new
  `concatenate_episodes` helper in `text_utils.py`), and attach
  `episode_indices` to each extracted node/edge so the caller can map
  results back to the originating episode. Supporting changes in
  `edge_operations.py` / `node_operations.py` / `bulk_utils.py` and
  corresponding updates to the extract_nodes / extract_edges prompts.
- New `fact_triple` episode type and an `episode_metadata` dict field on
  `EpisodicNode` for customer-defined filtering keys (`nodes.py`).
- Safer attribute merging on edges (`edges.py`) and nodes
  (`bulk_utils.py`) — do not overwrite existing first-class fields
  (`uuid`, `group_id`, `created_at`, timestamps, etc.) when an attribute
  dict happens to contain a matching key.
- Preserve `reference_time` through bulk edge save (`bulk_utils.py`).
- Fact-extraction prompt refinements: detail preservation rule, wider
  entity extraction for qualified objects, clarified dedup scope
  (`prompts/extract_edges.py`, `prompts/extract_nodes.py`).
- `summarize_sagas.py` prompt refresh.
- Minor driver tweaks (`graph_operations.py`, `record_parsers.py`).
- Tests for the new concatenate helper, multi-episode behavior, and
  prompt changes.
- `pytest` bumped to 9.0.3 in `server/pyproject.toml` and
  `mcp_server/pyproject.toml`, with matching lockfile refreshes.

Workflows, CONTRIBUTING.md, and the CLA manifest are intentionally not
included — those have moved forward upstream and we don't want to
regress them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The entity edge save queries in `edge_db_queries.py` (both single and
bulk paths) set `e.reference_time = $reference_time` on Kuzu, but the
`RelatesToNode_` node-table DDL did not declare a `reference_time`
column, so the binder raised `Cannot find property reference_time for e`
as soon as any Kuzu edge save ran.

This failure did not surface on main because `helpers_test.py` on main
did not include Kuzu in the driver fixture list — the Kuzu
parametrization of `test_graphiti_mock.py` / `test_edge_int.py` was
effectively empty. The previous commit in this PR added Kuzu to the
fixture list, which is what exposed the latent bug.

Fix: add `reference_time TIMESTAMP` alongside the other temporal
columns (`valid_at`, `invalid_at`, `expired_at`) so the column matches
what the save queries already expect.

Note for existing Kuzu users: `CREATE NODE TABLE IF NOT EXISTS` will
not add the column to an existing database. A one-time
`ALTER TABLE RelatesToNode_ ADD reference_time TIMESTAMP` may be
required if upgrading against an existing store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
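The one-time Kuzu migration noted above can be made safe to re-run by checking which columns the table already has before issuing `ALTER TABLE`. This sketch is pure Python and only builds the statements. How `existing_columns` is obtained (for example from Kuzu's `CALL table_info('RelatesToNode_') RETURN *`) and how statements are executed (for example through `kuzu.Connection.execute`) are assumptions to check against the Kuzu version in use; `missing_column_statements` is a hypothetical helper.

```python
from collections.abc import Mapping
from typing import Optional

# Columns the Kuzu edge save queries expect on RelatesToNode_ that older
# databases may lack. Only reference_time is confirmed missing by this PR.
DEFAULT_EXPECTED_COLUMNS: Mapping[str, str] = {"reference_time": "TIMESTAMP"}


def missing_column_statements(
    table: str,
    existing_columns: set[str],
    expected: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Build one ALTER TABLE ... ADD statement per expected column not yet present.

    Once the column exists the result is empty, so re-running is a no-op.
    """
    wanted = DEFAULT_EXPECTED_COLUMNS if expected is None else expected
    return [
        f"ALTER TABLE {table} ADD {name} {col_type}"
        for name, col_type in wanted.items()
        if name not in existing_columns
    ]


old_columns = {"uuid", "valid_at", "invalid_at", "expired_at"}
print(missing_column_statements("RelatesToNode_", old_columns))
# ['ALTER TABLE RelatesToNode_ ADD reference_time TIMESTAMP']
print(missing_column_statements("RelatesToNode_", old_columns | {"reference_time"}))
# []
```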
prasmussen15 merged commit 3c9547c into main on Apr 18, 2026
10 of 12 checks passed
prasmussen15 deleted the preston/forward-port-from-zep-proprietary branch on April 18, 2026 at 19:47