Add HNSWDiagnostics endpoint for graph health checks #843
Open
himanalot wants to merge 2 commits into HelixDB:main from
Adds a new builtin endpoint that checks HNSW graph connectivity and identifies unreachable vectors. Supports two modes:

- quick: samples N vectors and verifies they appear in search results
- full: BFS traversal from the entry point to find disconnected vectors

Returns health status (healthy/degraded/broken), unreachable vector IDs, and diagnostic metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Yes, I have this in my local repo but forgot to include this change here.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment by the author: @greptile review
Summary
Adds a new builtin endpoint, `HNSWDiagnostics`, that checks HNSW graph health and identifies unreachable vectors. This is useful for detecting graph fragmentation caused by deletions and re-insertions.

Input Parameters (JSON)
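| Parameter | Default | Description |
|---|---|---|
| `mode` | `"quick"` | `"quick"` (sample-based) or `"full"` (BFS traversal) |
| `sample_size` | `1000` | Number of vectors to sample in quick mode |
| `label` | `""` | Vector label passed to the search |

Example requests: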
Response Format
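```json
{
  "entry_point": {
    "id": "1f07ae4b-e354-6660-b5f0-fd3ce8bc4b49",
    "level": 5
  },
  "total_vectors": 97862,
  "total_edges": 1500000,
  "checked_vectors": 1000,
  "unreachable_vectors": ["uuid-1", "uuid-2"],
  "unreachable_count": 2,
  "health_status": "degraded",
  "mode": "quick",
  "diagnostics": {
    "sample_size": 1000,
    "duration_ms": 500
  }
}
```

| Field | Description |
|---|---|
| `entry_point` | Entry point ID and level; `null` if missing |
| `total_vectors` | Number of vectors in the index |
| `total_edges` | Number of edges in the graph |
| `checked_vectors` | Number of vectors checked |
| `unreachable_vectors` | IDs of vectors found to be unreachable |
| `unreachable_count` | Number of unreachable vectors |
| `health_status` | `"healthy"` (0% unreachable), `"degraded"` (<5%), or `"broken"` (>5% or no entry point) |
| `mode` | Mode that was run |
| `diagnostics.sample_size` | Requested sample size |
| `diagnostics.duration_ms` | Time taken by the check, in milliseconds |

Algorithm Details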
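The `health_status` rule from the response table can be expressed as a small function. This is a minimal sketch, not the PR's code: `HealthStatus` and `classify` are hypothetical names, the percentage is assumed to be `unreachable_count / checked_vectors`, and because the table does not say which side exactly 5% falls on, exactly 5% is assumed to count as `"broken"`.

```rust
/// Health classification mirroring the `health_status` strings in the response.
/// Hypothetical type; the PR's actual representation is not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthStatus {
    Healthy,
    Degraded,
    Broken,
}

impl HealthStatus {
    /// The string form used in the JSON response.
    pub fn as_str(self) -> &'static str {
        match self {
            HealthStatus::Healthy => "healthy",
            HealthStatus::Degraded => "degraded",
            HealthStatus::Broken => "broken",
        }
    }
}

/// Classifies graph health from the check results.
/// Assumptions (not confirmed by the PR): the ratio is unreachable / checked,
/// and a ratio of exactly 5% is treated as broken.
pub fn classify(has_entry_point: bool, unreachable: usize, checked: usize) -> HealthStatus {
    if !has_entry_point {
        // No entry point means no search can start, so the graph is broken.
        return HealthStatus::Broken;
    }
    if unreachable == 0 {
        return HealthStatus::Healthy;
    }
    // unreachable / checked < 5%  <=>  unreachable * 20 < checked,
    // which avoids floating-point rounding right at the boundary.
    if unreachable.saturating_mul(20) < checked {
        HealthStatus::Degraded
    } else {
        HealthStatus::Broken
    }
}
```

With the example response (2 unreachable out of 1000 checked), `classify(true, 2, 1000)` returns `HealthStatus::Degraded`, matching the sample's `"degraded"`.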
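Quick Mode: shuffle the vector IDs and take `sample_size` of them. For each sampled vector, load its embedding with `get_full_vector`, run `search(embedding, k=10, label)`, and mark the vector as unreachable if its own ID is not among the results.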
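A minimal sketch of the quick-mode check, assuming the search backend can be abstracted as a closure that returns the top-`k` IDs for a query embedding. `quick_check`, `sample_ids`, and `XorShift64` are hypothetical: an in-memory embeddings map stands in for `get_full_vector`, the closure stands in for the vector search, IDs are `u64` rather than UUIDs, and the seeded PRNG exists only so the sketch needs no external crates.

```rust
use std::collections::HashMap;

/// Small seeded xorshift64 PRNG so the sketch needs no external crates.
/// (Stand-in only; the PR's actual shuffling method is not shown.)
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks up to `n` distinct IDs with a partial Fisher-Yates shuffle.
fn sample_ids(ids: &[u64], n: usize, seed: u64) -> Vec<u64> {
    let mut pool = ids.to_vec();
    let n = n.min(pool.len());
    // xorshift must never be seeded with 0, or it only ever produces 0.
    let mut rng = XorShift64(seed.max(1));
    for i in 0..n {
        // Swap a randomly chosen element from pool[i..] into position i.
        let j = i + (rng.next_u64() % (pool.len() - i) as u64) as usize;
        pool.swap(i, j);
    }
    pool.truncate(n);
    pool
}

/// Quick-mode check: a sampled vector is reported as unreachable when a search
/// with its own embedding does not return its ID among the top `k` results.
/// `embeddings` stands in for get_full_vector, `search` for the vector search call.
fn quick_check<F>(
    embeddings: &HashMap<u64, Vec<f32>>,
    sample_size: usize,
    k: usize,
    seed: u64,
    search: F,
) -> Vec<u64>
where
    F: Fn(&[f32], usize) -> Vec<u64>,
{
    // HashMap iteration order is unspecified; sort so sampling is reproducible.
    let mut ids: Vec<u64> = embeddings.keys().copied().collect();
    ids.sort_unstable();
    sample_ids(&ids, sample_size, seed)
        .into_iter()
        .filter(|id| !search(embeddings[id].as_slice(), k).contains(id))
        .collect()
}
```

Because HNSW search is approximate, a vector missing from its own top-`k` results may sit in a poorly connected region rather than a strictly disconnected one, so quick-mode results are best read as an estimate.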
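Full Mode: run a BFS from the entry point over level-0 neighbors read from `edges_db`; every vector that is never visited is reported as unreachable.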
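A minimal sketch of the full-mode traversal, assuming level-0 adjacency has already been loaded into a map from vector ID to neighbor IDs (the PR reads neighbors from `edges_db`; that storage access is not modeled here). `full_check` is a hypothetical name, and IDs are `u64` rather than UUIDs for brevity.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Full-mode check: breadth-first search from the entry point over level-0
/// adjacency; every known vector that is never visited is reported unreachable.
/// `level0` maps a vector ID to its outgoing level-0 neighbor IDs.
fn full_check(
    all_ids: &[u64],
    level0: &HashMap<u64, Vec<u64>>,
    entry_point: Option<u64>,
) -> Vec<u64> {
    // Without an entry point nothing can be reached.
    let Some(entry) = entry_point else {
        return all_ids.to_vec();
    };

    let mut visited: HashSet<u64> = HashSet::from([entry]);
    let mut queue: VecDeque<u64> = VecDeque::from([entry]);

    while let Some(node) = queue.pop_front() {
        // Vectors with no stored edges simply have no neighbors to expand.
        let neighbors = level0.get(&node).map(Vec::as_slice).unwrap_or(&[]);
        for &next in neighbors {
            // `insert` returns true only the first time an ID is seen.
            if visited.insert(next) {
                queue.push_back(next);
            }
        }
    }

    // Preserve the caller's ID order in the output.
    all_ids.iter().copied().filter(|id| !visited.contains(id)).collect()
}
```

The BFS visits each vector and each level-0 edge at most once, so it runs in O(V + E) time. It follows outgoing edges only, so a vector that has outgoing edges but no incoming ones is still reported as unreachable.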
Test Plan
- `cargo build --release`

🤖 Generated with Claude Code
Greptile Overview
Greptile Summary
Adds the `HNSWDiagnostics` endpoint to detect unreachable vectors in the HNSW graph, addressing graph fragmentation from deletions/re-insertions.

Key changes:
- Iterates `vector_properties_db` to collect vector IDs (addressing a concern from a previous review thread)

Minor improvements suggested:
- Remove the `HNSW` import
- Remove the `arena` parameter from the full mode function

Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant HNSWDiagnostics
    participant VectorCore
    participant Database

    Client->>HNSWDiagnostics: POST /HNSWDiagnostics {mode, sample_size, label}
    HNSWDiagnostics->>Database: Read transaction
    HNSWDiagnostics->>Database: Iterate vector_properties_db
    Database-->>HNSWDiagnostics: vector_ids[]
    HNSWDiagnostics->>Database: Get entry point from vectors_db
    Database-->>HNSWDiagnostics: entry_point_id & level

    alt mode == "quick"
        HNSWDiagnostics->>HNSWDiagnostics: Shuffle & sample N random vectors
        loop For each sampled vector
            HNSWDiagnostics->>VectorCore: get_full_vector(vector_id)
            VectorCore-->>HNSWDiagnostics: vector with embedding
            HNSWDiagnostics->>VectorCore: search(embedding, k=10, label)
            VectorCore-->>HNSWDiagnostics: search results
            HNSWDiagnostics->>HNSWDiagnostics: Check if vector_id in results
            alt vector_id not found
                HNSWDiagnostics->>HNSWDiagnostics: Mark as unreachable
            end
        end
    else mode == "full"
        HNSWDiagnostics->>HNSWDiagnostics: BFS from entry_point
        loop BFS traversal
            HNSWDiagnostics->>Database: Get level-0 neighbors from edges_db
            Database-->>HNSWDiagnostics: neighbor_ids[]
            HNSWDiagnostics->>HNSWDiagnostics: Add unvisited to queue
        end
        HNSWDiagnostics->>HNSWDiagnostics: Find unvisited vectors (unreachable)
    end

    HNSWDiagnostics->>HNSWDiagnostics: Calculate health_status
    HNSWDiagnostics-->>Client: JSON response with diagnostics
```