forward-port: combined node+edge extraction + prompt refinements #1432
Merged
prasmussen15 merged 1 commit into main on Apr 22, 2026
Brings across a second wave of `graphiti_core` improvements from the internal fork, on top of the original forward-port in this branch:

- Combined node + edge extraction (feature-flagged). New `prompts/extract_nodes_and_edges.py` prompt and new `utils/maintenance/combined_extraction.py` module let a single LLM call produce both nodes and edges for an episode when `use_combined_extraction=True` is passed to `extract_nodes_and_edges_bulk`. Default behavior is unchanged: `_extract_nodes_and_edges_bulk_separate` is still the old two-call path.
- New `extract_timestamps` / `extract_timestamps_batch` prompts on `extract_edges` so temporal bounds (`valid_at` / `invalid_at`) can be resolved in a dedicated step after extraction, decoupling the structural fact extraction from date parsing.
- Dedup prompt refinement: further loosened two-entity rule in the combined extraction path to reduce missed edges.
- 0-indexed episodes. `episode_indices` and the `[Episode N]` headers in `concatenate_episodes` now start at 0 (was 1-indexed in the first forward-port). Tests and prompt examples updated.
- Small cleanups in `node_operations.py`, `edge_operations.py`, `extract_nodes.py` to match the new indexing and to wire up the combined-extraction path.
- `prompts/lib.py` registers the new `extract_nodes_and_edges` prompt family.

Intentionally NOT carried over from the internal fork:

- Reverting the `reference_time TIMESTAMP` column on the Kuzu `RelatesToNode_` schema. The internal fork does not need this fix (Kuzu isn't exercised there), but upstream CI does, so the prior commit's schema change is kept intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
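
For reviewers checking the indexing change, here is a rough sketch of what 0-indexed `[Episode N]` headers imply. It is not the code in this PR: the function names, signatures, and the blank-line separator are assumptions made for illustration, and only the header format and the 0-based start come from the commit message.

```python
from typing import List


def concatenate_episodes_sketch(contents: List[str]) -> str:
    """Join episode bodies under 0-indexed `[Episode N]` headers.

    Hypothetical stand-in for `concatenate_episodes`; the real signature
    and separator may differ.
    """
    # enumerate() starts at 0 by default, matching the new convention.
    # The earlier 1-indexed behavior corresponds to enumerate(contents, start=1).
    return "\n\n".join(
        f"[Episode {i}]\n{body}" for i, body in enumerate(contents)
    )


def select_by_indices(contents: List[str], episode_indices: List[int]) -> List[str]:
    """Map 0-indexed `episode_indices` from the model back to episode bodies.

    Out-of-range indices are skipped here; that policy is an assumption.
    """
    return [contents[i] for i in episode_indices if 0 <= i < len(contents)]


episodes = ["Alice joined Acme in 2020.", "Alice left Acme in 2023."]
prompt_text = concatenate_episodes_sketch(episodes)
print(prompt_text)
```

Because the header number and the list position now agree, an index returned by the model can be used directly as a list index with no `- 1` adjustment, which removes one place where off-by-one errors could creep in.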
Summary

Second wave of `graphiti_core` forward-ports from the internal fork, on top of #1422 (already merged).

Combined node + edge extraction (feature-flagged)

New `prompts/extract_nodes_and_edges.py` prompt and new `graphiti_core/utils/maintenance/combined_extraction.py` module let a single LLM call produce both nodes and edges for an episode when `use_combined_extraction=True` is passed to `extract_nodes_and_edges_bulk`. Default behavior is unchanged: `_extract_nodes_and_edges_bulk_separate` preserves the existing two-call path. Opt in via the new kwarg.

Separated timestamp extraction

New `extract_timestamps` / `extract_timestamps_batch` prompts on `extract_edges`. Temporal bounds (`valid_at` / `invalid_at`) can now be resolved in a dedicated post-extraction pass, decoupling structural fact extraction from date parsing. Pydantic models: `EdgeTimestamps`, `BatchEdgeTimestamps`.

Prompt refinements

- `extract_edges` rule 3 further loosened in the combined-extraction path to reduce missed edges; examples and duplicate-vs-not-duplicate worked examples updated.
- `extract_nodes` / `extract_edges` episode numbering switched from 1-indexed to 0-indexed in both the Pydantic `episode_indices` defaults and the prompt text. `concatenate_episodes` and its tests follow suit.

Wiring

- `prompts/lib.py` registers the new `extract_nodes_and_edges` prompt family.
- `node_operations.py` / `edge_operations.py` updated to match the new indexing and to plug into the combined path.

Not included

- Reverting the `reference_time TIMESTAMP` column on the Kuzu `RelatesToNode_` schema added in #1422 (forward-port graphiti_core improvements from internal fork): the internal fork does not have it (Kuzu isn't exercised there), but upstream CI needs it.

Test plan

- `make check` passes locally
- `tests/utils/test_concatenate_episodes.py` (updated for 0-indexed headers) pass
- `use_combined_extraction=False` default preserves prior behavior on all existing callers
- `extract_timestamps_batch` schema (ordered `list[EdgeTimestamps]`) matches how the caller zips results back to edges
- Integration tests (`pytest -k _int`): run locally with Neo4j and Kuzu before merge

🤖 Generated with Claude Code
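
To make the "zips results back to edges" item concrete, the sketch below shows one way an ordered batch of timestamps could be paired with edges by position. It is illustrative only: the dataclasses stand in for the PR's Pydantic `EdgeTimestamps` model, and `EdgeSketch`, `parse_iso`, `apply_timestamps`, the ISO-string field types, and the length check are assumptions, not the actual caller.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class EdgeTimestampsSketch:
    # Stand-in for the Pydantic `EdgeTimestamps` model. The field names follow
    # the `valid_at` / `invalid_at` bounds in the description; the string
    # types are assumed.
    valid_at: Optional[str] = None
    invalid_at: Optional[str] = None


@dataclass
class EdgeSketch:
    # Hypothetical minimal edge carrying only what the example needs.
    fact: str
    valid_at: Optional[datetime] = None
    invalid_at: Optional[datetime] = None


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string, passing None or empty strings through as None."""
    return datetime.fromisoformat(value) if value else None


def apply_timestamps(
    edges: List[EdgeSketch], timestamps: List[EdgeTimestampsSketch]
) -> List[EdgeSketch]:
    """Pair the i-th timestamp entry with the i-th edge.

    Plain zip() silently truncates to the shorter input, so a length
    mismatch is rejected explicitly before pairing.
    """
    if len(edges) != len(timestamps):
        raise ValueError(
            f"expected {len(edges)} timestamp entries, got {len(timestamps)}"
        )
    for edge, ts in zip(edges, timestamps):
        edge.valid_at = parse_iso(ts.valid_at)
        edge.invalid_at = parse_iso(ts.invalid_at)
    return edges


edges = [EdgeSketch("Alice works at Acme"), EdgeSketch("Bob manages Alice")]
batch = [
    EdgeTimestampsSketch(valid_at="2020-01-01T00:00:00", invalid_at="2023-06-30T00:00:00"),
    EdgeTimestampsSketch(),  # no temporal bounds found for the second fact
]
resolved = apply_timestamps(edges, batch)
```

Positional pairing only works if the model returns exactly one entry per edge in input order, which is presumably why the test plan calls out the ordered-list schema.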