
Optimize Gremlin schema sync queries for large graphs #1805

Merged: kmcginnes merged 1 commit into main from fix/edge-schema-query-performance on Jun 5, 2026

kmcginnes (Collaborator) commented Jun 4, 2026

Description

Both vertex and edge schema templates now use g.V().limit(1) as a dummy anchor instead of g.E() (edges) and g.V().union().fold() (vertices). Each .by() modulator runs an independent global sub-traversal that does not depend on the anchor element, so the results are semantically equivalent. What changes is execution: Neptune's DFE engine handles the full query natively instead of falling back to slower execution paths.
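A minimal TypeScript sketch of the dummy-anchor pattern, assuming the templates are assembled as Gremlin query strings. The function names (`gremlinString`, `anchoredSchemaQuery`, `vertexSchemaQuery`, `edgeSchemaQuery`) and the exact per-label traversals are hypothetical, not Graph Explorer's actual templates. The point it illustrates is that `g.V().limit(1)` emits at most one traverser, so `project()` runs once and each `by()` starts its own global `__.V()` traversal that ignores the anchor vertex.

```typescript
// Hypothetical sketch of the "dummy anchor" pattern; names and traversal
// shapes are illustrative assumptions, not Graph Explorer's real templates.

/** Quotes a label as a double-quoted Gremlin string literal. */
function gremlinString(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

/**
 * Shared shape: g.V().limit(1) emits at most one traverser, so project()
 * runs once, and each by() runs its own global sub-traversal that never
 * reads the anchor vertex.
 */
function anchoredSchemaQuery(
  labels: string[],
  perLabel: (quotedLabel: string) => string,
): string {
  const unique = Array.from(new Set(labels)); // project() keys must be unique
  if (unique.length === 0) {
    throw new Error("at least one label is required");
  }
  const keys = unique.map(gremlinString).join(", ");
  // fold() makes every by() produce a value (an empty list when nothing
  // matches), avoiding version differences in how project() treats a
  // by() that yields no result.
  const modulators = unique
    .map((label) => `.by(${perLabel(gremlinString(label))}.limit(1).elementMap().fold())`)
    .join("");
  return `g.V().limit(1).project(${keys})${modulators}`;
}

/** Samples one vertex per vertex label. */
function vertexSchemaQuery(labels: string[]): string {
  return anchoredSchemaQuery(labels, (l) => `__.V().hasLabel(${l})`);
}

/**
 * Samples one edge per edge label. Starts from __.V() because a
 * mid-traversal E() step only exists in newer TinkerPop releases.
 */
function edgeSchemaQuery(labels: string[]): string {
  return anchoredSchemaQuery(labels, (l) => `__.V().outE(${l})`);
}
```

One consequence of this shape worth keeping in mind: on an empty graph `g.V().limit(1)` emits nothing, so the query returns no result at all rather than a map of empty lists.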

Validation

  • All gremlin connector tests pass (105 tests)
  • pnpm checks passes (lint, format, types)
  • Profiled on three Neptune engine versions (1.2.1.0, 1.3.5.0, 1.4.7.0), confirming native DFE execution
  • Response shape unchanged; downstream parsers unaffected
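To reproduce the profiling step, Neptune exposes a Gremlin profile endpoint (`POST /gremlin/profile` with a JSON body whose `gremlin` field holds the query). The sketch below only assembles that request. The helper name `buildProfileRequest` is hypothetical, the endpoint URL is a placeholder, and IAM SigV4 signing (needed on IAM-enabled clusters) is not handled. Which engine executed each step is read from the returned profile report, not from this code.

```typescript
// Hypothetical helper: assembles a request for Neptune's Gremlin profile
// endpoint. No IAM SigV4 signing is applied; the endpoint is a placeholder.
interface ProfileRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildProfileRequest(endpoint: string, gremlin: string): ProfileRequest {
  const base = endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/gremlin/profile`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ gremlin }), // Neptune expects {"gremlin": "<query>"}
    },
  };
}
```

The returned `url` and `init` can be passed directly to `fetch(url, init)`; the profile report comes back in the response body.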

Related Issues

Check List

  • I confirm that my contribution is made under the terms of the Apache 2.0 license.
  • I have verified pnpm checks passes with no errors.
  • I have verified pnpm test passes with no failures.
  • I have covered new added functionality with unit tests if necessary.
  • I have updated documentation if necessary.

Both vertex and edge schema templates now use g.V().limit(1) as a dummy
anchor instead of g.E() / g.V().union().fold(). The .by() modulators run
independent global sub-traversals so the anchor value is irrelevant, but
this change ensures Neptune's DFE engine handles the full query natively
on all tested versions (1.2.1.0, 1.3.5.0, 1.4.7.0).

Closes #1803

kmcginnes marked this pull request as ready for review on June 4, 2026 21:03

Cole-Greer (Contributor) left a comment

The updated query templates look good to me. These look like the right patterns to follow, given Graph Explorer's need to support such a broad range of Gremlin versions.

kmcginnes merged commit ea12236 into main on Jun 5, 2026 (6 checks passed)
kmcginnes deleted the fix/edge-schema-query-performance branch on June 5, 2026 14:52
Development: successfully merging this pull request may close these issues:
  • Edge schema sync query can timeout on large graphs
  • Optimize schema sync DB queries