Release 2.5.0 (2026-03-17)
This release introduces support for the Nemotron-super-3 model, updates NIMs to the latest versions, upgrades NV-Ingest, and adds continuous ingestion along with RTX 6000 MIG support.
Highlights
This release includes the following key updates:
- Nemotron-super-3 model support. You can now integrate the Nemotron-super-3 model by following the steps outlined in this document.
- NIMs updated to the latest versions. The following model updates are included:
  - nvidia/llama-3.2-nv-embedqa-1b-v2 → nvidia/llama-nemotron-embed-1b-v2
  - nvidia/llama-3.2-nv-rerankqa-1b-v2 → nvidia/llama-nemotron-rerank-1b-v2
  - nemoretriever-page-elements-v3 → nemotron-page-elements-v3
  - nemoretriever-graphic-elements-v1 → nemotron-graphic-elements-v1
  - nemoretriever-table-structure-v1 → nemotron-table-structure-v1
  - nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 → nvidia/llama-nemotron-embed-vl-1b-v2
- Updated NV-Ingest to version 26.1.2.
- Added an example demonstrating the continuous ingestion pipeline. For more information, see rag_event_ingest.ipynb.
- Added MIG support for RTX 6000. For details, refer to MIG Deployment and use values-mig-rtx6000.yaml and mig-config-rtx6000.yaml.
- Added documentation for the experimental Nemotron-parse-only ingestion pipeline. This configuration allows you to perform extraction using only Nemotron Parse through NV-Ingest, without relying on the OCR, page-elements, graphic-elements, or table-structure NIMs. For more information, refer to nemotron-parse-extraction.md.
- Several bug fixes, including frontend CVE resolutions, improved multimodal content concatenation for VLM embeddings, enhanced VDB serialization for high-concurrency parallel ingestion, and updates to observability and NeMo Guardrails configurations.
- Added agentic skills support: the rag-blueprint skill enables AI coding assistants (Claude Code, Cursor, Codex, and others) to deploy, configure, troubleshoot, and manage the RAG Blueprint autonomously. For details, refer to RAG Blueprint Agent Skill.
- Added accuracy benchmark results across seven public datasets (RagBattlepacket, KG-RAG, Financebench, DC767, HotPotQA, Google Frames, and Vidore), comparing LLM and VLM configurations with reasoning on and off. Benchmarks use the NVIDIA Answer Accuracy metric from RAGAS.
- Added a notebook showcasing the LangChain connector for the NVIDIA RAG Blueprint.
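The continuous ingestion example in rag_event_ingest.ipynb reacts to newly arriving documents rather than requiring a one-shot batch upload. The sketch below illustrates the general idea with a simple watch-and-ingest loop; it is not the blueprint's actual API — the `ingest` callback, the polling approach, and all parameter names here are assumptions for demonstration.

```python
import time
from pathlib import Path

def watch_and_ingest(watch_dir, ingest, poll_seconds=5.0, max_polls=None):
    """Poll watch_dir and hand each newly seen file to the ingest callback.

    Simplified stand-in for an event-driven ingestion pipeline: a real
    deployment would subscribe to storage events instead of polling, and
    `ingest` would submit the document to the ingestion service.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(Path(watch_dir).glob("*")):
            if path.is_file() and path not in seen:
                ingest(path)      # hypothetical callback, e.g. an HTTP upload
                seen.add(path)
        polls += 1
        if max_polls is not None and polls >= max_polls:
            break
        time.sleep(poll_seconds)
    return seen
```

Each file is ingested exactly once per run: the `seen` set skips paths already handed to the callback on later polls, which is the essential property of a continuous pipeline layered over a batch ingestion API.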
Fixed Known Issues
The following known issues have been resolved in this release:
- Addressed frontend CVEs.
- Resolved VDB indexing issues during high-concurrency batch parallel ingestion by implementing VDB serialization.
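The serialization fix above follows a common pattern: guard writes to a shared index with a lock so concurrent ingestion workers commit one batch at a time instead of interleaving. The toy below is a schematic illustration of that pattern only — the `SerializedIndex` class is hypothetical and uses a plain Python list in place of the blueprint's actual vector database client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SerializedIndex:
    """Toy index whose writes are serialized with a lock.

    In a real VDB client, unguarded concurrent read-modify-write
    sequences can interleave and drop or corrupt entries; the lock
    forces each ingestion batch to commit atomically.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def upsert(self, batch):
        with self._lock:          # one writer commits at a time
            self._rows.extend(batch)

    def count(self):
        with self._lock:
            return len(self._rows)

# Many parallel ingestion workers, one serialized index.
index = SerializedIndex()
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(100):
        pool.submit(index.upsert, [(f"doc-{i}", [0.0, 1.0])])
```

After the executor drains, all 100 batches are present: serialization trades some write parallelism for correctness under high-concurrency ingestion.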