Add HNSWDiagnostics endpoint for graph health checks#843

Open
himanalot wants to merge 2 commits intoHelixDB:mainfrom
himanalot:hnsw-diagnostics

Conversation

@himanalot himanalot commented Jan 30, 2026

Summary

Adds a new builtin endpoint HNSWDiagnostics that checks HNSW graph health and identifies unreachable vectors. This is useful for detecting graph fragmentation caused by deletions/re-insertions.

Input Parameters (JSON)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "quick" | "quick" (sample-based) or "full" (BFS traversal) |
| sample_size | number | 1000 | Number of vectors to check in quick mode |
| label | string | "" | Vector label to use for searches |
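
As a rough illustration of the defaults in the table above, here is a minimal Rust sketch of a request type. The names `DiagnosticsRequest`, `Mode`, and `Mode::parse` are hypothetical and are not taken from the PR's code; rejecting an unknown mode with an error is also an assumption, since the description does not say how unrecognized values are handled.

```rust
/// Hypothetical mirror of the documented request parameters.
/// None of these names come from the PR itself.
#[derive(Debug, Clone, PartialEq)]
pub enum Mode {
    Quick,
    Full,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DiagnosticsRequest {
    pub mode: Mode,
    pub sample_size: usize,
    pub label: String,
}

impl Default for DiagnosticsRequest {
    /// Defaults as listed in the parameter table:
    /// mode "quick", sample_size 1000, empty label.
    fn default() -> Self {
        Self {
            mode: Mode::Quick,
            sample_size: 1000,
            label: String::new(),
        }
    }
}

impl Mode {
    /// Maps the documented strings onto the enum.
    /// Assumption: anything other than "quick" or "full" is an error.
    pub fn parse(s: &str) -> Result<Mode, String> {
        match s {
            "quick" => Ok(Mode::Quick),
            "full" => Ok(Mode::Full),
            other => Err(format!("unknown mode: {}", other)),
        }
    }
}
```

An empty JSON body would then correspond to `DiagnosticsRequest::default()`, which matches the first curl example apart from its explicit `sample_size`.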

Example requests:

# Quick mode (default) - samples random vectors
curl -X POST http://localhost:6970/HNSWDiagnostics \
  -H "Content-Type: application/json" \
  -d '{"mode": "quick", "sample_size": 100}'

# Full mode - complete BFS traversal
curl -X POST http://localhost:6970/HNSWDiagnostics \
  -H "Content-Type: application/json" \
  -d '{"mode": "full"}'

Response Format

{
  "entry_point": {
    "id": "1f07ae4b-e354-6660-b5f0-fd3ce8bc4b49",
    "level": 5
  },
  "total_vectors": 97862,
  "total_edges": 1500000,
  "checked_vectors": 1000,
  "unreachable_vectors": ["uuid-1", "uuid-2"],
  "unreachable_count": 2,
  "health_status": "degraded",
  "mode": "quick",
  "diagnostics": {
    "sample_size": 1000,
    "duration_ms": 500
  }
}

| Field | Description |
| --- | --- |
| entry_point | HNSW entry point info (id and level), or null if missing |
| total_vectors | Total number of vectors in the database |
| total_edges | Total number of HNSW graph edges |
| checked_vectors | Number of vectors checked in this run |
| unreachable_vectors | Array of UUIDs for vectors that couldn't be reached |
| unreachable_count | Count of unreachable vectors |
| health_status | "healthy" (0% unreachable), "degraded" (<5%), or "broken" (>5% or no entry point) |
| mode | The mode that was used |
| diagnostics.sample_size | Sample size used (or total vectors in full mode) |
| diagnostics.duration_ms | How long the check took in milliseconds |
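
The `health_status` thresholds can be sketched as a small classification function. This is an illustration under stated assumptions, not the PR's implementation: the function name and signature are hypothetical, and because the description leaves the exact-5% case unspecified, this sketch treats exactly 5% as "broken".

```rust
/// Hypothetical classifier for the documented thresholds.
/// Assumption: a ratio of exactly 5% counts as "broken";
/// the PR description does not specify this boundary.
pub fn health_status(has_entry_point: bool, unreachable: usize, checked: usize) -> &'static str {
    if !has_entry_point {
        // A missing entry point is always "broken".
        return "broken";
    }
    if unreachable == 0 {
        return "healthy";
    }
    // unreachable / checked < 5%  <=>  unreachable * 20 < checked.
    // Integer arithmetic avoids floating-point edge cases at the boundary.
    if (unreachable as u128) * 20 < (checked as u128) {
        "degraded"
    } else {
        "broken"
    }
}
```

With the example response above (2 unreachable out of 1000 checked), this returns "degraded", consistent with the sample JSON.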

Algorithm Details

Quick Mode:

  1. Randomly samples N vectors from the database
  2. For each sampled vector, searches using its own embedding
  3. If the vector doesn't appear in top-10 results → marked unreachable
  4. Fast but may miss some disconnected vectors
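
The quick-mode steps above can be sketched as follows. This is a simplified stand-in, not the PR's code: `quick_check`, `sample_ids`, and the `search_top_k` closure are hypothetical names, the closure stands for "load this vector's embedding and run an HNSW search with it", and a tiny xorshift generator replaces whatever random source the real endpoint uses so the sketch needs no external crates.

```rust
/// Minimal xorshift PRNG, used only to keep the sketch dependency-free.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks up to `n` distinct ids with a partial Fisher-Yates shuffle.
pub fn sample_ids(ids: &[u128], n: usize, seed: u64) -> Vec<u128> {
    let mut pool = ids.to_vec();
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let take = n.min(pool.len());
    for i in 0..take {
        let j = i + (rng.next() as usize) % (pool.len() - i);
        pool.swap(i, j);
    }
    pool.truncate(take);
    pool
}

/// Returns the sampled ids that do not appear in their own top-10 results.
/// `search_top_k(id, k)` is a hypothetical hook: it should search with the
/// embedding of `id` and return the ids of the `k` nearest results.
pub fn quick_check<F>(ids: &[u128], sample_size: usize, seed: u64, mut search_top_k: F) -> Vec<u128>
where
    F: FnMut(u128, usize) -> Vec<u128>,
{
    const K: usize = 10; // top-10, as described in step 3
    sample_ids(ids, sample_size, seed)
        .into_iter()
        .filter(|&id| !search_top_k(id, K).contains(&id))
        .collect()
}
```

Because only sampled vectors are probed, a disconnected vector outside the sample goes unnoticed, which is the trade-off noted in step 4.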

Full Mode:

  1. Starts BFS traversal from the entry point
  2. Traverses all level-0 edges to find connected vectors
  3. Any vector not visited is unreachable
  4. Comprehensive but slower for large graphs
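
The full-mode traversal can be sketched as a plain breadth-first search. The sketch assumes the level-0 edges have already been loaded into an in-memory adjacency map; the function name `find_unreachable` and the `HashMap` representation are illustrative and do not reflect how the PR reads neighbors from storage.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// BFS over level-0 edges starting at the entry point.
/// Returns every id in `all_ids` that the traversal never visits.
/// Assumption: with no entry point, nothing is reachable.
pub fn find_unreachable(
    entry_point: Option<u128>,
    all_ids: &[u128],
    level0_edges: &HashMap<u128, Vec<u128>>,
) -> Vec<u128> {
    let mut visited: HashSet<u128> = HashSet::new();
    let mut queue: VecDeque<u128> = VecDeque::new();

    if let Some(ep) = entry_point {
        visited.insert(ep);
        queue.push_back(ep);
    }

    while let Some(id) = queue.pop_front() {
        if let Some(neighbors) = level0_edges.get(&id) {
            for &n in neighbors {
                // `insert` returns true only the first time an id is seen.
                if visited.insert(n) {
                    queue.push_back(n);
                }
            }
        }
    }

    all_ids
        .iter()
        .copied()
        .filter(|id| !visited.contains(id))
        .collect()
}
```

Each vector and edge is visited at most once, so the cost grows with the total number of vectors plus level-0 edges, which is why this mode is slower on large graphs.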

Test Plan

  • Build with cargo build --release
  • Test quick mode with various sample sizes
  • Test full mode on a graph with known disconnected vectors
  • Verify health status thresholds work correctly

🤖 Generated with Claude Code

Greptile Overview

Greptile Summary

Adds an HNSWDiagnostics endpoint that detects unreachable vectors in the HNSW graph, addressing graph fragmentation caused by deletions and re-insertions.

Key changes:

  • Implements quick mode (sampling-based) and full mode (BFS traversal) health checks
  • Properly iterates vector_properties_db to collect vector IDs (addressing previous thread concern)
  • Returns comprehensive diagnostics including entry point info, unreachable vectors, and health status thresholds

Minor improvements suggested:

  • Remove unused HNSW import
  • Remove unnecessary arena allocation for label string in quick mode
  • Remove unused arena parameter from full mode function

Important Files Changed

| Filename | Overview |
| --- | --- |
| helix-db/src/helix_gateway/builtin/hnsw_diagnostics.rs | Added new diagnostic endpoint with quick/full BFS modes to check HNSW graph health |
| helix-db/src/helix_gateway/builtin/mod.rs | Added hnsw_diagnostics module export |

Sequence Diagram

sequenceDiagram
    participant Client
    participant HNSWDiagnostics
    participant VectorCore
    participant Database
    
    Client->>HNSWDiagnostics: POST /HNSWDiagnostics {mode, sample_size, label}
    HNSWDiagnostics->>Database: Read transaction
    HNSWDiagnostics->>Database: Iterate vector_properties_db
    Database-->>HNSWDiagnostics: vector_ids[]
    
    HNSWDiagnostics->>Database: Get entry point from vectors_db
    Database-->>HNSWDiagnostics: entry_point_id & level
    
    alt mode == "quick"
        HNSWDiagnostics->>HNSWDiagnostics: Shuffle & sample N random vectors
        loop For each sampled vector
            HNSWDiagnostics->>VectorCore: get_full_vector(vector_id)
            VectorCore-->>HNSWDiagnostics: vector with embedding
            HNSWDiagnostics->>VectorCore: search(embedding, k=10, label)
            VectorCore-->>HNSWDiagnostics: search results
            HNSWDiagnostics->>HNSWDiagnostics: Check if vector_id in results
            alt vector_id not found
                HNSWDiagnostics->>HNSWDiagnostics: Mark as unreachable
            end
        end
    else mode == "full"
        HNSWDiagnostics->>HNSWDiagnostics: BFS from entry_point
        loop BFS traversal
            HNSWDiagnostics->>Database: Get level-0 neighbors from edges_db
            Database-->>HNSWDiagnostics: neighbor_ids[]
            HNSWDiagnostics->>HNSWDiagnostics: Add unvisited to queue
        end
        HNSWDiagnostics->>HNSWDiagnostics: Find unvisited vectors (unreachable)
    end
    
    HNSWDiagnostics->>HNSWDiagnostics: Calculate health_status
    HNSWDiagnostics-->>Client: JSON response with diagnostics

Adds a new builtin endpoint that checks HNSW graph connectivity and
identifies unreachable vectors. Supports two modes:
- quick: samples N vectors and verifies they appear in search results
- full: BFS traversal from entry point to find disconnected vectors

Returns health status (healthy/degraded/broken), unreachable vector IDs,
and diagnostic metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor

@greptile-apps greptile-apps Bot left a comment

2 files reviewed, 1 comment

Comment thread helix-db/src/helix_gateway/builtin/hnsw_diagnostics.rs Outdated
Yes, I have this in my local repo but forgot to include this change here

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@himanalot
Author

@greptile review
