v2.5.0

@shubhadeepd shubhadeepd released this 17 Mar 17:26
6d8e0ae

Release 2.5.0 (2026-03-17)

This release introduces support for the Nemotron-super-3 model, updates NIMs to the latest versions, upgrades NV-Ingest, and adds continuous ingestion along with RTX 6000 MIG support.

Highlights

This release includes the following key updates:

  • Nemotron-super-3 model support. You can now integrate the Nemotron-super-3 model by following the steps outlined in this document.
  • NIMs updated to latest versions.
    The following model updates are included:
    • nvidia/llama-3.2-nv-embedqa-1b-v2 → nvidia/llama-nemotron-embed-1b-v2
    • nvidia/llama-3.2-nv-rerankqa-1b-v2 → nvidia/llama-nemotron-rerank-1b-v2
    • nemoretriever-page-elements-v3 → nemotron-page-elements-v3
    • nemoretriever-graphic-elements-v1 → nemotron-graphic-elements-v1
    • nemoretriever-table-structure-v1 → nemotron-table-structure-v1
    • nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 → nvidia/llama-nemotron-embed-vl-1b-v2
  • Updated NV-Ingest to version 26.1.2.
  • Added an example demonstrating the continuous ingestion pipeline. For more information, see rag_event_ingest.ipynb.
  • Added MIG support for RTX 6000. For details, refer to MIG Deployment and use values-mig-rtx6000.yaml and mig-config-rtx6000.yaml.
  • Added documentation for the experimental Nemotron-parse-only ingestion pipeline. This configuration allows you to perform extraction using only Nemotron Parse through NV-Ingest, without relying on OCR, page-elements, graphic-elements, or table-structure NIMs. For more information, refer to nemotron-parse-extraction.md.
  • Several bug fixes, including frontend CVE resolutions, improved multimodal content concatenation for VLM embeddings, enhanced VDB serialization for high-concurrency parallel ingestion, and updates to observability and NeMo Guardrails configurations.
  • Added agentic skills support: the rag-blueprint skill enables AI coding assistants (Claude Code, Cursor, Codex, etc.) to deploy, configure, troubleshoot, and manage the RAG Blueprint autonomously. For details, refer to RAG Blueprint Agent Skill.
  • Added accuracy benchmark results across seven public datasets (RagBattlepacket, KG-RAG, Financebench, DC767, HotPotQA, Google Frames, and Vidore), comparing LLM and VLM configurations with reasoning on/off. Benchmarks use the NVIDIA Answer Accuracy metric from RAGAS.
  • Added a notebook showcasing the LangChain connector for the NVIDIA RAG Blueprint.
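
If your own configuration files or scripts still reference the previous NIM model names, the renames listed under "NIMs updated to latest versions" can be applied mechanically. The following is a minimal sketch, not part of the release: the mapping is copied from the list above, while `MODEL_RENAMES`, `migrate_model_names`, and the `EMBEDDING_MODEL` key in the usage line are hypothetical names chosen for illustration.

```python
# Old-to-new NIM model names, copied from the release notes list.
# MODEL_RENAMES and migrate_model_names are hypothetical helpers, not blueprint APIs.
MODEL_RENAMES = {
    "nvidia/llama-3.2-nv-embedqa-1b-v2": "nvidia/llama-nemotron-embed-1b-v2",
    "nvidia/llama-3.2-nv-rerankqa-1b-v2": "nvidia/llama-nemotron-rerank-1b-v2",
    "nemoretriever-page-elements-v3": "nemotron-page-elements-v3",
    "nemoretriever-graphic-elements-v1": "nemotron-graphic-elements-v1",
    "nemoretriever-table-structure-v1": "nemotron-table-structure-v1",
    "nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1": "nvidia/llama-nemotron-embed-vl-1b-v2",
}


def migrate_model_names(text: str) -> str:
    """Replace every old model name found in `text` with its 2.5.0 name."""
    # Process longer names first so a short name can never clobber part of a longer one.
    for old in sorted(MODEL_RENAMES, key=len, reverse=True):
        text = text.replace(old, MODEL_RENAMES[old])
    return text


# Usage: EMBEDDING_MODEL is a made-up config key used only for this example.
migrated = migrate_model_names("EMBEDDING_MODEL=nvidia/llama-3.2-nv-embedqa-1b-v2")
```

Running the function twice leaves already migrated text unchanged, because none of the new names contains an old name.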

Fixed Known Issues

The following known issues have been resolved in this release:

  • Addressed frontend CVEs.

  • Resolved VDB indexing issues during high-concurrency batch parallel ingestion by implementing VDB serialization.
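
The release notes do not describe how the VDB serialization is implemented. As a generic illustration of the idea only, one common way to serialize writes from parallel ingestion workers is to guard the write path with a shared lock while the rest of the work stays parallel. `InMemoryIndex` and `ingest_batch` below are hypothetical stand-ins and are not taken from the blueprint's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryIndex:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self._rows: list[str] = []
        self._write_lock = threading.Lock()

    def insert(self, rows: list[str]) -> None:
        # Only one worker may write at a time; other work can still run in parallel.
        with self._write_lock:
            self._rows.extend(rows)

    def count(self) -> int:
        with self._write_lock:
            return len(self._rows)


def ingest_batch(index: InMemoryIndex, batch_id: int, size: int = 100) -> None:
    # Stand-in for the parallel part (parsing, embedding), followed by a serialized write.
    rows = [f"batch{batch_id}-chunk{i}" for i in range(size)]
    index.insert(rows)


index = InMemoryIndex()
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(ingest_batch, index, b) for b in range(16)]
    for future in futures:
        future.result()  # re-raise any exception from a worker
```

The trade-off in this kind of design is that write throughput is bounded by the single writer, in exchange for a consistent index under concurrent ingestion.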