Skip to content

forward-port: combined node+edge extraction + prompt refinements#1432

Merged
prasmussen15 merged 1 commit intomainfrom
preston/forward-port-combined-extraction
Apr 22, 2026
Merged

forward-port: combined node+edge extraction + prompt refinements#1432
prasmussen15 merged 1 commit intomainfrom
preston/forward-port-combined-extraction

Conversation

@prasmussen15
Copy link
Copy Markdown
Collaborator

Summary

Second wave of graphiti_core forward-ports from the internal fork, on top of #1422 (already merged).

Combined node + edge extraction (feature-flagged)

New prompts/extract_nodes_and_edges.py prompt and new graphiti_core/utils/maintenance/combined_extraction.py module let a single LLM call produce both nodes and edges for an episode when use_combined_extraction=True is passed to extract_nodes_and_edges_bulk. Default behavior is unchanged — _extract_nodes_and_edges_bulk_separate preserves the existing two-call path. Opt in via the new kwarg.

Separated timestamp extraction

New extract_timestamps / extract_timestamps_batch prompts on extract_edges. Temporal bounds (valid_at / invalid_at) can now be resolved in a dedicated post-extraction pass, decoupling structural fact extraction from date parsing. Pydantic models: EdgeTimestamps, BatchEdgeTimestamps.

Prompt refinements

  • extract_edges rule 3 further loosened in the combined-extraction path to reduce missed edges; examples and duplicate-vs-not-duplicate worked examples updated.
  • extract_nodes / extract_edges episode numbering switched from 1-indexed to 0-indexed in both the Pydantic episode_indices defaults and the prompt text. concatenate_episodes and its tests follow suit.

Wiring

  • prompts/lib.py registers the new extract_nodes_and_edges prompt family.
  • Small follow-through cleanups in node_operations.py / edge_operations.py to match the new indexing and to plug into the combined path.

Not included

Test plan

  • make check passes locally
  • Existing tests in tests/utils/test_concatenate_episodes.py (updated for 0-indexed headers) pass
  • Reviewer: sanity check the use_combined_extraction=False default preserves prior behavior on all existing callers
  • Reviewer: confirm extract_timestamps_batch schema (ordered list[EdgeTimestamps]) matches how the caller zips results back to edges
  • Integration tests (pytest -k _int) — run locally with Neo4j and Kuzu before merge

🤖 Generated with Claude Code

Brings across a second wave of `graphiti_core` improvements from the
internal fork, on top of the original forward-port in this branch:

- Combined node + edge extraction (feature-flagged). New
  `prompts/extract_nodes_and_edges.py` prompt and new
  `utils/maintenance/combined_extraction.py` module let a single LLM
  call produce both nodes and edges for an episode when
  `use_combined_extraction=True` is passed to
  `extract_nodes_and_edges_bulk`. Default behavior is unchanged —
  `_extract_nodes_and_edges_bulk_separate` is still the old two-call
  path.
- New `extract_timestamps` / `extract_timestamps_batch` prompts on
  `extract_edges` so temporal bounds (`valid_at` / `invalid_at`) can
  be resolved in a dedicated step after extraction, decoupling the
  structural fact extraction from date parsing.
- Dedup prompt refinement: further loosened two-entity rule in the
  combined extraction path to reduce missed edges.
- 0-indexed episodes. `episode_indices` and the `[Episode N]` headers
  in `concatenate_episodes` now start at 0 (was 1-indexed in the
  first forward-port). Tests and prompt examples updated.
- Small cleanups in `node_operations.py`, `edge_operations.py`,
  `extract_nodes.py` to match the new indexing and to wire up the
  combined-extraction path.
- `prompts/lib.py` registers the new `extract_nodes_and_edges` prompt
  family.

Intentionally NOT carried over from the internal fork:
- Reverting the `reference_time TIMESTAMP` column on the Kuzu
  `RelatesToNode_` schema. The internal fork does not need this fix
  (Kuzu isn't exercised there), but upstream CI does — keeping the
  prior commit's schema change intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prasmussen15 prasmussen15 merged commit 673902c into main Apr 22, 2026
10 of 12 checks passed
@prasmussen15 prasmussen15 deleted the preston/forward-port-combined-extraction branch April 22, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant