forward-port graphiti_core improvements from internal fork by prasmussen15 · Pull Request #1422 · getzep/graphiti

prasmussen15 · 2026-04-18T04:06:42Z

Summary

Forward-ports a set of graphiti_core improvements that have been living in an internal fork back to this repo. No workflow / CONTRIBUTING / CLA changes are included — those have moved forward here and we don't want to regress them.

Multi-episode batched extraction

Graphiti._extract_and_resolve_nodes and extract_edges now accept a list of episodes instead of a single one. When multiple are passed:

Content is concatenated with [Episode N] headers via the new concatenate_episodes helper in graphiti_core/utils/text_utils.py.
Each extracted node and edge carries episode_indices: list[int] so callers can map results back to the originating episode.
Prompts (extract_nodes.extract_message, extract_edges.edge) and Pydantic models updated accordingly.
build_episodic_edges (in edge_operations.py) accepts either a single episode UUID or several, along with a node_episode_index_map, so that MENTIONED_IN edges are wired up per episode.
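
To make the batching idea above concrete, here is a minimal sketch of how episode contents might be joined under `[Episode N]` headers and how `episode_indices` can be used to map results back to episodes. It is not the code in `text_utils.py`: the `Episode` and `ExtractedNode` classes, the `nodes_by_episode` helper, the 0-based numbering, and the exact separator are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical stand-in for an episode; only uuid and content are used."""
    uuid: str
    content: str


@dataclass
class ExtractedNode:
    """Hypothetical extraction result carrying an episode_indices list."""
    name: str
    episode_indices: list[int] = field(default_factory=list)


def concatenate_episodes(episodes: list[Episode]) -> str:
    """Join episode contents under [Episode N] headers.

    Assumption: indices are 0-based and blocks are separated by a blank line.
    """
    return "\n\n".join(
        f"[Episode {i}]\n{ep.content}" for i, ep in enumerate(episodes)
    )


def nodes_by_episode(
    episodes: list[Episode], nodes: list[ExtractedNode]
) -> dict[str, list[str]]:
    """Map each episode UUID to the names of the nodes attributed to it."""
    result: dict[str, list[str]] = {ep.uuid: [] for ep in episodes}
    for node in nodes:
        for idx in node.episode_indices:
            # Skip indices that do not correspond to any episode in the batch.
            if 0 <= idx < len(episodes):
                result[episodes[idx].uuid].append(node.name)
    return result
```

A caller could pass the concatenated text to the extraction prompt once, then use the returned indices to build per-episode edges instead of making one LLM call per episode.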

Files: graphiti_core/graphiti.py, graphiti_core/utils/text_utils.py, graphiti_core/utils/maintenance/{edge,node}_operations.py, graphiti_core/utils/bulk_utils.py, graphiti_core/prompts/{extract_edges,extract_nodes}.py, tests/utils/test_concatenate_episodes.py, tests/utils/maintenance/{test_bulk_utils,test_entity_extraction}.py, tests/{helpers_test,test_graphiti_mock}.py.

`fact_triple` episode type and `episode_metadata`

New EpisodeType.fact_triple variant in graphiti_core/nodes.py.
New episode_metadata: dict[str, Any] | None field on EpisodicNode for customer-defined filtering keys.
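
As a rough illustration of how the new field might be used for filtering, the sketch below models an episode with an optional `episode_metadata` dict and selects episodes by a metadata key. The classes are simplified stand-ins rather than the real definitions in `graphiti_core/nodes.py`; the enum values, the `filter_by_metadata` helper, the `tenant` key, and the example triple format are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any


class EpisodeTypeSketch(Enum):
    """Stand-in enum; the string values are assumptions."""
    message = "message"
    fact_triple = "fact_triple"


@dataclass
class EpisodicNodeSketch:
    """Stand-in for EpisodicNode, reduced to the fields used in this example."""
    name: str
    source: EpisodeTypeSketch
    content: str
    episode_metadata: dict[str, Any] | None = None


def filter_by_metadata(
    episodes: list[EpisodicNodeSketch], key: str, value: Any
) -> list[EpisodicNodeSketch]:
    """Return episodes whose metadata contains key == value.

    Episodes without metadata are treated as having an empty dict.
    """
    return [
        ep for ep in episodes
        if key in (ep.episode_metadata or {}) and ep.episode_metadata[key] == value
    ]
```

Treating a missing `episode_metadata` as an empty dict keeps the filter safe for episodes created before the field existed.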

Safer attribute merging

When persisting an edge or node, the attributes dict was previously merged into edge_data / entity_data with dict.update(...), which silently overwrote first-class fields (uuid, group_id, created_at, other timestamps, etc.) whenever the attributes dict contained a matching key. Attributes are now merged with setdefault semantics, so existing typed fields win. Files: graphiti_core/edges.py, graphiti_core/utils/bulk_utils.py.
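
The difference between the two merge strategies can be shown with a small sketch. The `merge_attributes` helper below is hypothetical and is not the code in `edges.py` or `bulk_utils.py`; it only demonstrates the `setdefault` behavior described above.

```python
from typing import Any, Optional


def merge_attributes(
    typed_fields: dict[str, Any], attributes: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Merge free-form attributes into typed fields without overwriting them.

    Hypothetical helper: keys already present in typed_fields keep their value;
    only keys that are absent are copied over from attributes.
    """
    merged = dict(typed_fields)  # copy so the caller's dict is not mutated
    for key, value in (attributes or {}).items():
        merged.setdefault(key, value)
    return merged


edge_data = {"uuid": "edge-1", "group_id": "g1"}
attributes = {"uuid": "spoofed", "color": "red"}

# Old behavior: update() lets the attribute clobber the typed uuid.
overwritten = dict(edge_data)
overwritten.update(attributes)

# New behavior: the typed uuid survives and only the new key is added.
safe = merge_attributes(edge_data, attributes)
```

One consequence worth checking during review: a custom type that deliberately stored a colliding key in `attributes` will now see the typed field's value instead of its own.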

`reference_time` preservation in bulk edge save

reference_time is now passed through add_nodes_and_edges_bulk_tx alongside the other edge timestamps. File: graphiti_core/utils/bulk_utils.py.

Fact-extraction prompt refinements

Based on evaluation findings that concrete details (brand names, quantities, colors, qualified objects) were being lost during extraction:

extract_edges: relaxed strict two-entity rule (look for a list-anchor before dropping), strengthened detail-preservation rule, clarified intra-CURRENT_MESSAGE dedup scope.
extract_nodes: softened the "Wikipedia article" specificity test so brand-named and descriptor-qualified items ("Gamecube", "wool coat", "cracked windshield") are extracted while bare head nouns ("coat", "windshield") are not.

Files: graphiti_core/prompts/{extract_edges,extract_nodes}.py.

`summarize_sagas` prompt refresh

Updated prompt for the saga-summary path. File: graphiti_core/prompts/summarize_sagas.py.

Minor driver tweaks

Files: graphiti_core/driver/graph_operations/graph_operations.py, graphiti_core/driver/record_parsers.py.

`pytest` bump

server/pyproject.toml and mcp_server/pyproject.toml now require pytest>=9.0.3, with matching uv.lock refreshes.

Test plan

make check (format + lint + tests) passes locally
Existing unit tests pass (multi-episode code paths are exercised in the new test_concatenate_episodes.py and the updated test_entity_extraction.py / test_bulk_utils.py / test_graphiti_mock.py)
Integration tests (pytest -k _int) — run locally with Neo4j before merge
Reviewer: sanity check the episode_indices / node_episode_index_map plumbing in edge_operations.py and node_operations.py
Reviewer: confirm the attribute-merging change doesn't break any custom edge/entity types that rely on the previous overwrite-on-collision behavior

🤖 Generated with Claude Code

This change brings a set of graphiti_core improvements that have been living in an internal fork back to the OSS repo:

- Multi-episode batched extraction. `Graphiti._extract_and_resolve_nodes` and `extract_edges` now accept a list of episodes, build a combined prompt with per-episode `[Episode N]` headers (new `concatenate_episodes` helper in `text_utils.py`), and attach `episode_indices` to each extracted node/edge so the caller can map results back to the originating episode. Supporting changes in `edge_operations.py` / `node_operations.py` / `bulk_utils.py` and corresponding updates to the extract_nodes / extract_edges prompts.
- New `fact_triple` episode type and an `episode_metadata` dict field on `EpisodicNode` for customer-defined filtering keys (`nodes.py`).
- Safer attribute merging on edges (`edges.py`) and nodes (`bulk_utils.py`): do not overwrite existing first-class fields (`uuid`, `group_id`, `created_at`, timestamps, etc.) when an attribute dict happens to contain a matching key.
- Preserve `reference_time` through bulk edge save (`bulk_utils.py`).
- Fact-extraction prompt refinements: detail preservation rule, wider entity extraction for qualified objects, clarified dedup scope (`prompts/extract_edges.py`, `prompts/extract_nodes.py`).
- `summarize_sagas.py` prompt refresh.
- Minor driver tweaks (`graph_operations.py`, `record_parsers.py`).
- Tests for the new concatenate helper, multi-episode behavior, and prompt changes.
- `pytest` bumped to 9.0.3 in `server/pyproject.toml` and `mcp_server/pyproject.toml`, with matching lockfile refreshes.

Workflows, CONTRIBUTING.md, and the CLA manifest are intentionally not included; those have moved forward upstream and we don't want to regress them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The entity edge save queries in `edge_db_queries.py` (both single and bulk paths) set `e.reference_time = $reference_time` on Kuzu, but the `RelatesToNode_` node-table DDL did not declare a `reference_time` column, so the binder raised `Cannot find property reference_time for e` as soon as any Kuzu edge save ran.

This failure did not surface on main because `helpers_test.py` on main did not include Kuzu in the driver fixture list, so the Kuzu parametrization of `test_graphiti_mock.py` / `test_edge_int.py` was effectively empty. The previous commit in this PR added Kuzu to the fixture list, which is what exposed the latent bug.

Fix: add `reference_time TIMESTAMP` alongside the other temporal columns (`valid_at`, `invalid_at`, `expired_at`) so the column matches what the save queries already expect.

Note for existing Kuzu users: `CREATE NODE TABLE IF NOT EXISTS` will not add the column to an existing database. A one-time `ALTER TABLE RelatesToNode_ ADD reference_time TIMESTAMP` may be required if upgrading against an existing store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prasmussen15 temporarily deployed to development April 18, 2026 04:06 — with GitHub Actions Inactive

prasmussen15 temporarily deployed to development April 18, 2026 18:22 — with GitHub Actions Inactive

prasmussen15 merged commit 3c9547c into main Apr 18, 2026
10 of 12 checks passed

prasmussen15 deleted the preston/forward-port-from-zep-proprietary branch April 18, 2026 19:47

prasmussen15 mentioned this pull request Apr 22, 2026

forward-port: combined node+edge extraction + prompt refinements #1432

Merged

5 tasks

prasmussen15 merged 2 commits into main from preston/forward-port-from-zep-proprietary
