Vendor-neutral semantic search pipeline toolkit using open-source components.
Companion open-source implementation for the paper:
Designing Vendor-Neutral Semantic Search Pipelines Using Open-Source Embedding Models and FAISS.
World Journal of Advanced Engineering Technology and Sciences (WJAETS), 2026, 18(3). DOI: 10.30574/wjaets.2026.18.3.0038
Most semantic search implementations today are locked into proprietary ecosystems: OpenAI embeddings with Pinecone, Cohere with Weaviate, or cloud-specific vector databases. Switching providers means re-embedding entire corpora and rewriting integration code.
This library provides a swap-and-compare framework for embedding models, FAISS index selection guidance, a complete on-premise pipeline from document ingestion through retrieval and evaluation, and a migration planner for teams moving from proprietary stacks to open-source alternatives.
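The swap-and-compare idea can be illustrated with a minimal sketch. Everything below is hypothetical and is not the API of `opensemanticsearch.embed`: the `Embedder` protocol, the `HashingEmbedder` toy model (a stand-in for a real embedding model), and the `top1` and `compare` helpers are names invented for this illustration. The point is only that, once every model sits behind the same `embed` interface, the same queries and documents can be run through each candidate and scored side by side.

```python
from __future__ import annotations

import hashlib
import math
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Hypothetical interface: any model that maps texts to fixed-size vectors."""

    name: str

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy stand-in for a real model: hashes tokens into `dim` buckets."""

    def __init__(self, name: str, dim: int) -> None:
        self.name = name
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.md5(token.encode("utf-8")).digest()
                vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
            # L2-normalize so that a dot product equals cosine similarity.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


def top1(embedder: Embedder, query: str, docs: Sequence[str]) -> int:
    """Index of the document with the highest cosine similarity to the query."""
    doc_vecs = embedder.embed(docs)
    (query_vec,) = embedder.embed([query])
    scores = [sum(q * d for q, d in zip(query_vec, dv)) for dv in doc_vecs]
    return max(range(len(docs)), key=scores.__getitem__)


def compare(embedders, queries, docs, expected) -> dict[str, float]:
    """Top-1 accuracy of each embedder on the same queries and documents."""
    report = {}
    for emb in embedders:
        hits = sum(top1(emb, q, docs) == want for q, want in zip(queries, expected))
        report[emb.name] = hits / len(queries)
    return report


docs = ["bank reconciliation in quickbooks", "faiss index selection guide"]
queries = ["quickbooks bank reconciliation", "faiss index guide"]
candidates = [HashingEmbedder("hash-64", 64), HashingEmbedder("hash-256", 256)]
report = compare(candidates, queries, docs, expected=[0, 1])
print(report)
```

Swapping a model then means adding another object with a `name` and an `embed` method to `candidates`; the comparison code itself does not change.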
| Module | Purpose |
|---|---|
| `opensemanticsearch.embed` | Embedding model abstraction with multi-model benchmarking |
| `opensemanticsearch.index` | In-memory vector index (cosine/dot) and FAISS index type advisor |
| `opensemanticsearch.retrieve` | Semantic search, BM25 keyword scoring, and hybrid search |
| `opensemanticsearch.evaluate` | NDCG, Recall, and MRR evaluation over query-relevance pairs |
| `opensemanticsearch.migrate` | Migration planner from proprietary to open-source stacks |
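For readers unfamiliar with the metrics listed for `opensemanticsearch.evaluate`, here is a minimal sketch of their standard binary-relevance definitions. It is not the library's implementation, which may differ (for example in how it handles graded relevance or empty relevance sets). The function names are chosen for this illustration only, and retrieved IDs are assumed to be unique.

```python
import math
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k with a log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking: all relevant documents placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # first hit at rank 1
    (["d5", "d4"], {"d4"}),              # first hit at rank 2
]
retrieved, relevant = runs[0]
print(recall_at_k(retrieved, relevant, 1))     # 0.5
print(recall_at_k(retrieved, relevant, 5))     # 1.0
print(round(ndcg_at_k(retrieved, relevant, 3), 2))  # 0.92
print(mean_reciprocal_rank(runs))              # 0.75
```

In the first run the relevant documents sit at ranks 1 and 3, so DCG is 1 + 1/2 = 1.5 while the ideal ordering gives 1 + 1/log2(3) ≈ 1.63, which is where the NDCG of about 0.92 comes from.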
pip install opensemanticsearch

Or with UV:

uv add opensemanticsearch

import numpy as np
from opensemanticsearch.index.manager import InMemoryIndexManager
from opensemanticsearch.retrieve.engine import SearchEngine
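# Placeholder inputs for illustration (assumed shapes): three document IDs,
# a (3, 384) float32 embedding matrix, and one 384-dimensional query vector
doc_ids = ["d1", "d2", "d3"]
doc_embeddings = np.random.rand(3, 384).astype(np.float32)
query_vector = np.random.rand(384).astype(np.float32)

# Build an index
index = InMemoryIndexManager(metric="cosine")
index.add(doc_ids, doc_embeddings)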
# Search
engine = SearchEngine(index=index)
results = engine.search(query_vector, top_k=10)
for r in results:
    print(f"rank={r.rank} score={r.score:.3f} id={r.doc_id}")

from opensemanticsearch.retrieve.engine import BM25Scorer, HybridSearchEngine
bm25 = BM25Scorer()
bm25.add_documents(doc_ids, documents)
hybrid = HybridSearchEngine(
    semantic_engine=engine,
    bm25_scorer=bm25,
    semantic_weight=0.7,
)
results = hybrid.search(query_vector, query_text="bank reconciliation quickbooks", top_k=10)

from opensemanticsearch.index.advisor import IndexAdvisor, IndexAdvisorConfig
advisor = IndexAdvisor(IndexAdvisorConfig(
    corpus_size=5_000_000,
    embedding_dim=384,
    memory_budget_gb=8,
    latency_target_ms=10,
))
rec = advisor.recommend()
print(rec.summary())
# Recommended index: IVF4096,PQ48
# Memory: 1.86 GB | Latency: 15.0ms | Recall: 92%

from opensemanticsearch.evaluate.evaluator import RetrievalEvaluator, QueryResult
evaluator = RetrievalEvaluator(k_values=[1, 5, 10])
results = evaluator.evaluate([
    QueryResult("q1", retrieved_ids=["d1", "d2", "d3"], relevant_ids={"d1", "d3"}),
])
print(results.summary())

from opensemanticsearch.migrate.planner import MigrationPlanner, MigrationPlannerConfig
planner = MigrationPlanner(MigrationPlannerConfig(
    current_provider="openai",
    target_provider="open_source",
    corpus_size=2_000_000,
))
plan = planner.plan()
print(plan.summary())
# Monthly savings: $90 | Quality impact: -4.0%

uv sync --all-extras
uv run pytest tests/ -v --cov=src
uv run isort src/ tests/ && uv run black src/ tests/

If you use this library in your research, please cite the paper:
Designing Vendor-Neutral Semantic Search Pipelines Using Open-Source
Embedding Models and FAISS.
World Journal of Advanced Engineering Technology and Sciences (WJAETS),
2026, 18(3). DOI: 10.30574/wjaets.2026.18.3.0038
Apache 2.0