feat: Hybrid Search with RRF (SearchBM25 + SearchV)#837
Open
AmaanBilwar wants to merge 17 commits into HelixDB:main from
step + impl display for source step
…sary code. Added Value import and updated string formatting for literals.
Author
@xav-db whenever u have the time 🫡
Author
not sure other ways to test it out icl
Description
Implements a new `SearchHybrid` operator that runs both vector (HNSW) and BM25 keyword searches, returning combined results that can be fused using the existing `RerankRRF` or `RerankMMR` steps.
This feature lets users combine vector search and keyword-based search in a single query.
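For readers unfamiliar with Reciprocal Rank Fusion, the following is a minimal sketch of the general technique, not the `RerankRRF` implementation in this repository. It assumes each search produces a ranked list of document ids; the function name `rrf_fuse`, the `u64` id type, and the smoothing constant `k` are all illustrative assumptions.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of document ids.
/// Each document receives 1 / (k + rank) from every list it appears in,
/// where rank is 1-based. `k` is a smoothing constant (60 is a common choice).
/// Hypothetical helper for illustration only.
fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            // `enumerate` is 0-based, so add 1 to get the 1-based rank.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

A document that appears near the top of both lists outranks one that appears in only one of them, which is the property a hybrid query relies on.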
Related Issues
N/A
Closes #828
Checklist when merging to main
rustfmt
helix-cli/Cargo.toml and helixdb/Cargo.toml
Additional Notes
Greptile Overview
Greptile Summary
This PR adds a new `SearchHybrid` operator that combines vector similarity search (HNSW) with BM25 keyword search, allowing users to leverage both semantic and keyword-based retrieval in a single query. Results are returned as a combined iterator that can be fused using the existing `RerankRRF` or `RerankMMR` operators.
Key Changes:
- `SearchHybridAdapter` trait in runtime that executes both vector and BM25 searches, returning combined results
- `SearchHybrid<Label>(vector_data, query_text, k)` syntax
Architecture:
The implementation follows the existing pattern of `SearchVector` and `SearchBM25`, integrating cleanly into the compiler pipeline from parsing through code generation. Vector results are returned first, followed by BM25 results, allowing downstream rerankers to properly fuse them based on position.
Testing:
Tests cover basic usage, `Embed()` for vector generation, variable parameters, and chaining with the `RerankRRF` and `RerankMMR` operators.
Checklist Status:
Two checklist items remain unchecked: doc comments and version number updates. Consider adding doc comments to key public functions and updating version numbers before merging.
Important Files Changed
`search_hybrid` rule with proper syntax for vector_data, query text, and k parameter. Follows existing patterns for SearchV and SearchBM25.
Sequence Diagram
sequenceDiagram
    participant User
    participant Parser
    participant Analyzer
    participant Generator
    participant Runtime
    participant VectorSearch
    participant BM25Search
    User->>Parser: SearchHybrid<Document>(vec, "query", 10)
    Parser->>Parser: parse_search_hybrid()
    Parser->>Parser: Extract label, vector_data, query, k
    Parser-->>Analyzer: SearchHybrid AST
    Analyzer->>Analyzer: infer_expr_type()
    Analyzer->>Analyzer: Validate label exists in vector_set
    Analyzer->>Analyzer: Process vector_data (literal/identifier/Embed)
    Analyzer->>Analyzer: Process query (string/identifier)
    Analyzer->>Analyzer: Process k parameter
    Analyzer-->>Generator: GeneratedSearchHybrid
    Generator->>Generator: Generate Rust code
    Generator->>Generator: Format: search_hybrid(label, vec, query, k)?
    Generator-->>Runtime: Compiled trait method call
    Runtime->>Runtime: search_hybrid() execution
    Runtime->>VectorSearch: vectors.search(query_vec, k, label)
    VectorSearch-->>Runtime: Vec<HVector> with scores
    Runtime->>BM25Search: bm25.search(query_text, k)
    BM25Search-->>Runtime: Vec<(doc_id, score)>
    Runtime->>Runtime: Convert vectors to TraversalValue::Vector
    Runtime->>Runtime: Lookup BM25 doc_ids in nodes_db
    Runtime->>Runtime: Filter by label
    Runtime->>Runtime: Convert to TraversalValue::NodeWithScore
    Runtime->>Runtime: Chain vector_results + bm25_results
    Runtime-->>User: RoTraversalIterator with combined results
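The runtime half of the diagram (convert vector hits, look up BM25 doc ids, filter by label, then chain the two result sets) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `Node`, `Hit`, and `combine_hits` are hypothetical stand-ins for the real node storage, `TraversalValue`, and `search_hybrid`, and `nodes_db` is modeled as a plain `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored node; only the label matters here.
struct Node {
    label: String,
}

/// Hypothetical stand-in for the two traversal value variants in the diagram.
#[derive(Debug, Clone, PartialEq)]
enum Hit {
    Vector { id: u64, distance: f64 },
    NodeWithScore { id: u64, score: f32 },
}

/// Combine already-computed vector and BM25 hits into one ordered list:
/// all vector hits first, then the BM25 hits whose node carries `label`.
fn combine_hits(
    vector_hits: Vec<(u64, f64)>,
    bm25_hits: Vec<(u64, f32)>,
    nodes_db: &HashMap<u64, Node>,
    label: &str,
) -> Vec<Hit> {
    let vectors = vector_hits
        .into_iter()
        .map(|(id, distance)| Hit::Vector { id, distance });
    let nodes = bm25_hits
        .into_iter()
        // Drop doc ids that are missing from storage or have a different label.
        .filter(|(id, _)| nodes_db.get(id).map_or(false, |n| n.label == label))
        .map(|(id, score)| Hit::NodeWithScore { id, score });
    // Order matters: a downstream reranker sees vector hits before BM25 hits.
    vectors.chain(nodes).collect()
}
```

Keeping the two sources as distinct variants lets a later step tell which search produced each hit, while the chained order preserves each source's internal ranking.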