forward-port graphiti_core improvements from internal fork #1422
Merged
prasmussen15 merged 2 commits into main on Apr 18, 2026
This change brings a set of graphiti_core improvements that have been living in an internal fork back to the OSS repo:

- Multi-episode batched extraction. `Graphiti._extract_and_resolve_nodes` and `extract_edges` now accept a list of episodes, build a combined prompt with per-episode `[Episode N]` headers (new `concatenate_episodes` helper in `text_utils.py`), and attach `episode_indices` to each extracted node/edge so the caller can map results back to the originating episode. Supporting changes in `edge_operations.py` / `node_operations.py` / `bulk_utils.py` and corresponding updates to the extract_nodes / extract_edges prompts.
- New `fact_triple` episode type and an `episode_metadata` dict field on `EpisodicNode` for customer-defined filtering keys (`nodes.py`).
- Safer attribute merging on edges (`edges.py`) and nodes (`bulk_utils.py`): do not overwrite existing first-class fields (`uuid`, `group_id`, `created_at`, timestamps, etc.) when an attribute dict happens to contain a matching key.
- Preserve `reference_time` through bulk edge save (`bulk_utils.py`).
- Fact-extraction prompt refinements: detail preservation rule, wider entity extraction for qualified objects, clarified dedup scope (`prompts/extract_edges.py`, `prompts/extract_nodes.py`).
- `summarize_sagas.py` prompt refresh.
- Minor driver tweaks (`graph_operations.py`, `record_parsers.py`).
- Tests for the new concatenate helper, multi-episode behavior, and prompt changes.
- `pytest` bumped to 9.0.3 in `server/pyproject.toml` and `mcp_server/pyproject.toml`, with matching lockfile refreshes.

Workflows, CONTRIBUTING.md, and the CLA manifest are intentionally not included: those have moved forward upstream and we don't want to regress them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
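To make the multi-episode flow concrete, here is a minimal sketch of the two ideas in the first bullet: joining episode bodies under `[Episode N]` headers, and using `episode_indices` to map extracted items back to their episodes. The real `concatenate_episodes` signature and the graphiti_core models are not shown in this PR text, so `join_episodes`, `ExtractedNode`, `group_by_episode`, and the 0-based numbering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedNode:
    """Hypothetical stand-in for an extracted entity; not the graphiti_core model."""

    name: str
    episode_indices: list[int] = field(default_factory=list)


def join_episodes(contents: list[str]) -> str:
    """Join episode bodies under per-episode headers (0-based numbering is assumed)."""
    return "\n\n".join(
        f"[Episode {i}]\n{content}" for i, content in enumerate(contents)
    )


def group_by_episode(
    nodes: list[ExtractedNode], episode_count: int
) -> dict[int, list[str]]:
    """Map each episode index to the names of the nodes that reference it."""
    grouped: dict[int, list[str]] = {i: [] for i in range(episode_count)}
    for node in nodes:
        for idx in node.episode_indices:
            if idx in grouped:  # skip indices that do not match any episode
                grouped[idx].append(node.name)
    return grouped


prompt = join_episodes(["Alice met Bob.", "Bob bought a coat."])
nodes = [ExtractedNode("Alice", [0]), ExtractedNode("Bob", [0, 1])]
by_episode = group_by_episode(nodes, episode_count=2)
```

A node that appears in several episodes is listed under each of them, which is what a caller would need in order to create one `MENTIONED_IN` edge per episode.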
The entity edge save queries in `edge_db_queries.py` (both single and bulk paths) set `e.reference_time = $reference_time` on Kuzu, but the `RelatesToNode_` node-table DDL did not declare a `reference_time` column, so the binder raised `Cannot find property reference_time for e` as soon as any Kuzu edge save ran.

This failure did not surface on main because `helpers_test.py` on main did not include Kuzu in the driver fixture list, so the Kuzu parametrization of `test_graphiti_mock.py` / `test_edge_int.py` was effectively empty. The previous commit in this PR added Kuzu to the fixture list, which is what exposed the latent bug.

Fix: add `reference_time TIMESTAMP` alongside the other temporal columns (`valid_at`, `invalid_at`, `expired_at`) so the column matches what the save queries already expect.

Note for existing Kuzu users: `CREATE NODE TABLE IF NOT EXISTS` will not add the column to an existing database. A one-time `ALTER TABLE RelatesToNode_ ADD reference_time TIMESTAMP` may be required if upgrading against an existing store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
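The one-time migration mentioned in the note could be wrapped in a small guard so it is safe to call on every startup. This is only a sketch: `ensure_reference_time` and `RecordingConnection` are hypothetical names, `conn` is assumed to be any object exposing `execute(query)` (as `kuzu.Connection` does), and how the caller obtains the existing column names is deliberately left open.

```python
ADD_REFERENCE_TIME = "ALTER TABLE RelatesToNode_ ADD reference_time TIMESTAMP"


def ensure_reference_time(conn, existing_columns) -> bool:
    """Run the ALTER only when the column is missing; return True if it ran.

    `existing_columns` is supplied by the caller (assumption); this helper
    does not query the database schema itself.
    """
    if "reference_time" in existing_columns:
        return False
    conn.execute(ADD_REFERENCE_TIME)
    return True


class RecordingConnection:
    """Stand-in connection for illustration; it records statements instead of running them."""

    def __init__(self) -> None:
        self.statements: list[str] = []

    def execute(self, query: str) -> None:
        self.statements.append(query)


conn = RecordingConnection()
ran_on_old_store = ensure_reference_time(conn, ["valid_at", "invalid_at", "expired_at"])
ran_on_new_store = ensure_reference_time(conn, ["valid_at", "reference_time"])
```

Against a real store, the same function would be passed a live connection in place of `RecordingConnection`.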
Summary
Forward-ports a set of `graphiti_core` improvements that have been living in an internal fork back to this repo. No workflow / CONTRIBUTING / CLA changes are included: those have moved forward here and we don't want to regress them.

Multi-episode batched extraction

`Graphiti._extract_and_resolve_nodes` and `extract_edges` now accept a list of episodes instead of a single one. When multiple are passed:

- The combined prompt carries per-episode `[Episode N]` headers via the new `concatenate_episodes` helper in `graphiti_core/utils/text_utils.py`.
- Each extracted node/edge carries `episode_indices: list[int]` so callers can map results back to the originating episode.
- Prompts (`extract_nodes.extract_message`, `extract_edges.edge`) and Pydantic models updated accordingly.
- `build_episodic_edges` (in `edge_operations.py`) accepts a single episode UUID or several, plus a `node_episode_index_map` to wire up `MENTIONED_IN` edges per episode.

Files: `graphiti_core/graphiti.py`, `graphiti_core/utils/text_utils.py`, `graphiti_core/utils/maintenance/{edge,node}_operations.py`, `graphiti_core/utils/bulk_utils.py`, `graphiti_core/prompts/{extract_edges,extract_nodes}.py`, `tests/utils/test_concatenate_episodes.py`, `tests/utils/maintenance/{test_bulk_utils,test_entity_extraction}.py`, `tests/{helpers_test,test_graphiti_mock}.py`.

`fact_triple` episode type and `episode_metadata`

- New `EpisodeType.fact_triple` variant in `graphiti_core/nodes.py`.
- New `episode_metadata: dict[str, Any] | None` field on `EpisodicNode` for customer-defined filtering keys.

Safer attribute merging

When persisting an edge or node, the `attributes` dict was previously merged into `edge_data` / `entity_data` with `dict.update(...)`, which would silently overwrite first-class fields (`uuid`, `group_id`, `created_at`, timestamps, etc.) if an attribute dict happened to contain a matching key. It is now merged with `setdefault` semantics, so existing typed fields win. Files: `graphiti_core/edges.py`, `graphiti_core/utils/bulk_utils.py`.

`reference_time` preservation in bulk edge save

`reference_time` is now passed through `add_nodes_and_edges_bulk_tx` alongside the other edge timestamps. File: `graphiti_core/utils/bulk_utils.py`.

Fact-extraction prompt refinements

Based on evaluation findings that concrete details (brand names, quantities, colors, qualified objects) were being lost during extraction:

- `extract_edges`: relaxed strict two-entity rule (look for a list-anchor before dropping), strengthened detail-preservation rule, clarified intra-`CURRENT_MESSAGE` dedup scope.
- `extract_nodes`: softened the "Wikipedia article" specificity test so brand-named and descriptor-qualified items ("Gamecube", "wool coat", "cracked windshield") are extracted while bare head nouns ("coat", "windshield") are not.

Files: `graphiti_core/prompts/{extract_edges,extract_nodes}.py`.

`summarize_sagas` prompt refresh

Updated prompt for the saga-summary path. File: `graphiti_core/prompts/summarize_sagas.py`.

Minor driver tweaks

Files: `graphiti_core/driver/graph_operations/graph_operations.py`, `graphiti_core/driver/record_parsers.py`.

`pytest` bump

`server/pyproject.toml` and `mcp_server/pyproject.toml` bumped to `pytest>=9.0.3` with matching `uv.lock` refreshes.

Test plan

- `make check` (format + lint + tests) passes locally
- Unit tests (`test_concatenate_episodes.py` and the updated `test_entity_extraction.py` / `test_bulk_utils.py` / `test_graphiti_mock.py`)
- Integration tests (`pytest -k _int`): run locally with Neo4j before merge
- `episode_indices` / `node_episode_index_map` plumbing in `edge_operations.py` and `node_operations.py`

🤖 Generated with Claude Code
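For reviewers, the difference between the old `dict.update(...)` merge and the `setdefault` semantics described under safer attribute merging can be shown with plain dicts. This is a sketch of the idea only, not the code in `edges.py` or `bulk_utils.py`; `merge_attributes` and the sample values are hypothetical.

```python
from typing import Any, Optional


def merge_attributes(
    record: dict[str, Any], attributes: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Return a copy of `record` with `attributes` added; keys already present are kept."""
    merged = dict(record)
    for key, value in (attributes or {}).items():
        # setdefault only writes when the key is absent, so first-class fields win.
        merged.setdefault(key, value)
    return merged


edge_data = {"uuid": "edge-1", "group_id": "g1", "fact": "Alice owns a wool coat"}
attributes = {"uuid": "spoofed", "color": "grey"}

# Old behaviour: equivalent to edge_data.update(attributes); the uuid is clobbered.
overwritten = {**edge_data, **attributes}

# New behaviour: the typed uuid survives and only the new key is added.
merged = merge_attributes(edge_data, attributes)
```

One consequence of `setdefault` worth keeping in mind: a first-class field that is present but set to `None` also wins over an attribute with the same key.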