Release 2.5.0 (2026-03-17)
This release introduces support for the Nemotron-super-3 model, updates NIMs to the latest versions, upgrades NV-Ingest, and adds continuous ingestion along with RTX 6000 MIG support.
Highlights
This release includes the following key updates:
- Nemotron-super-3 model support. You can now integrate the Nemotron-super-3 model by following the steps outlined in this document.
- NIMs updated to the latest versions. The following model updates are included:
  - nvidia/llama-3.2-nv-embedqa-1b-v2 → nvidia/llama-nemotron-embed-1b-v2
  - nvidia/llama-3.2-nv-rerankqa-1b-v2 → nvidia/llama-nemotron-rerank-1b-v2
  - nemoretriever-page-elements-v3 → nemotron-page-elements-v3
  - nemoretriever-graphic-elements-v1 → nemotron-graphic-elements-v1
  - nemoretriever-table-structure-v1 → nemotron-table-structure-v1
  - nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 → nvidia/llama-nemotron-embed-vl-1b-v2
- Updated NV-Ingest to version 26.1.2.
- Added an example demonstrating the continuous ingestion pipeline. For more information, see rag_event_ingest.ipynb.
- Added MIG support for RTX 6000. For details, refer to MIG Deployment and use values-mig-rtx6000.yaml and mig-config-rtx6000.yaml.
- Added documentation for the experimental Nemotron-parse-only ingestion pipeline. This configuration allows you to perform extraction using only Nemotron Parse through NV-Ingest, without relying on the OCR, page-elements, graphic-elements, or table-structure NIMs. For more information, refer to nemotron-parse-extraction.md.
- Several bug fixes, including frontend CVE resolutions, improved multimodal content concatenation for VLM embeddings, enhanced VDB serialization for high-concurrency parallel ingestion, and updates to observability and NeMo Guardrails configurations.
- Added agentic skills support: the rag-blueprint skill enables AI coding assistants (Claude Code, Cursor, Codex, and others) to deploy, configure, troubleshoot, and manage the RAG Blueprint autonomously. For details, refer to RAG Blueprint Agent Skill.
- Added accuracy benchmark results across seven public datasets (RagBattlepacket, KG-RAG, Financebench, DC767, HotPotQA, Google Frames, and Vidore), comparing LLM and VLM configurations with reasoning on and off. Benchmarks use the NVIDIA Answer Accuracy metric from RAGAS.
- Added a notebook showcasing the LangChain connector for the NVIDIA RAG Blueprint.
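The continuous ingestion example in rag_event_ingest.ipynb reacts to newly arriving documents rather than requiring a one-shot batch upload. The sketch below illustrates the general idea with a simple watch-and-ingest loop; it is not the blueprint's actual API — the `ingest` callback, the polling approach, and all parameter names here are assumptions for demonstration.

```python
import time
from pathlib import Path

def watch_and_ingest(watch_dir, ingest, poll_seconds=5.0, max_polls=None):
    """Poll watch_dir and hand each newly seen file to the ingest callback.

    Simplified stand-in for an event-driven ingestion pipeline: a real
    deployment would subscribe to storage events instead of polling, and
    `ingest` would submit the document to the ingestion service.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(Path(watch_dir).glob("*")):
            if path.is_file() and path not in seen:
                ingest(path)      # hypothetical callback, e.g. an HTTP upload
                seen.add(path)
        polls += 1
        if max_polls is not None and polls >= max_polls:
            break
        time.sleep(poll_seconds)
    return seen
```

Each file is ingested exactly once per run: the `seen` set skips paths already handed to the callback on later polls, which is the essential property of a continuous pipeline layered over a batch ingestion API.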
Fixed Known Issues
The following known issues have been resolved in this release:
- Addressed frontend CVEs.
- Resolved VDB indexing issues during high-concurrency batch parallel ingestion by implementing VDB serialization.
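The serialization fix above follows a common pattern: guard writes to a shared index with a lock so concurrent ingestion workers commit one batch at a time instead of interleaving. The toy below is a schematic illustration of that pattern only — the `SerializedIndex` class is hypothetical and uses a plain Python list in place of the blueprint's actual vector database client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SerializedIndex:
    """Toy index whose writes are serialized with a lock.

    In a real VDB client, unguarded concurrent read-modify-write
    sequences can interleave and drop or corrupt entries; the lock
    forces each ingestion batch to commit atomically.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def upsert(self, batch):
        with self._lock:          # one writer commits at a time
            self._rows.extend(batch)

    def count(self):
        with self._lock:
            return len(self._rows)

# Many parallel ingestion workers, one serialized index.
index = SerializedIndex()
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(100):
        pool.submit(index.upsert, [(f"doc-{i}", [0.0, 1.0])])
```

After the executor drains, all 100 batches are present: serialization trades some write parallelism for correctness under high-concurrency ingestion.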