A principal-level AI architecture profile for production RAG, agentic workflows, evaluation, observability, and implementation-ready issue design.
rag-architect is an open-source Hermes Agent profile and skill pack for turning ambitious AI/RAG goals into production-grade architecture, ADRs, evaluation plans, observability specs, and implementation-ready GitHub issues.
It is designed for teams building real AI systems where quality, reliability, latency, cost, and business impact all matter.
Created by Grey Newell · GitHub: @greynewell
If this helps you ship better RAG or agentic systems, please ⭐ star the repo and share it with builders working on production AI.
rag-architect is a reusable operating system for a principal-level AI architect agent.
It helps an AI agent act less like a code generator and more like a senior technical leader who can:
- design production Retrieval-Augmented Generation (RAG) systems
- own Pinecone namespace strategy, chunking, embeddings, hybrid retrieval, and reranking decisions
- define golden datasets, retrieval metrics, generation scoring, latency benchmarks, and regression gates
- instrument AI systems for cost, token usage, quality signals, and anomaly detection
- turn architecture into high-quality GitHub issues that simpler coding agents can execute
- keep model routing and unit economics visible without sacrificing output quality
- document clear reasoning, tradeoffs, acceptance criteria, rollout plans, and rollback paths
The core idea: production AI architecture should be executable by implementation agents.
Use this if you are:
- building production RAG systems
- designing LLM agents connected to internal tools, APIs, or data sources
- managing multiple Pinecone namespaces or retrieval strategies
- trying to make AI features ship in weeks, not months
- creating eval frameworks for retrieval, generation, and agent workflows
- writing ADRs, architecture docs, or GitHub issues for AI engineering teams
- using Hermes Agent profiles and skills to specialize agent behavior
You can use the materials directly with Hermes Agent, or adapt the templates and checklists for Claude Code, Codex, OpenCode, Cursor, or your own agent workflows.
rag-architect/
├── SOUL.md
├── skills/
│   └── ai-architecture/
│       ├── principal-ai-architect/
│       │   ├── SKILL.md
│       │   ├── templates/adr.md
│       │   └── references/architecture-output-checklist.md
│       ├── production-rag-architecture/
│       │   ├── SKILL.md
│       │   ├── templates/rag-design-review.md
│       │   └── references/namespace-decision-matrix.md
│       ├── rag-evaluation-observability/
│       │   ├── SKILL.md
│       │   ├── templates/eval-plan.md
│       │   └── references/llm-judge-rubric.md
│       └── agentic-workflow-issue-factory/
│           ├── SKILL.md
│           ├── templates/implementation-issue.md
│           └── templates/agent-tool-design.md
└── scripts/validate.py
SOUL.md defines the persistent role identity and operating standard for the rag-architect profile. It tells the agent to optimize for:
- measurable business impact
- production reliability
- evaluation-driven iteration
- clean abstractions
- cost-aware model routing
- observability and anomaly detection
- documentation that simpler coding agents can execute
| Skill | Use it for |
|---|---|
| `principal-ai-architect` | principal-level architecture, ADRs, roadmap slicing, tradeoff analysis, business-aligned planning |
| `production-rag-architecture` | ingestion, chunking, embeddings, Pinecone namespaces, hybrid retrieval, reranking, context assembly |
| `rag-evaluation-observability` | golden datasets, retrieval/generation metrics, regression gates, LLM-as-judge rubrics, traces, latency/cost benchmarks |
| `agentic-workflow-issue-factory` | GitHub issues, implementation specs, acceptance criteria, rollout notes, agent tool design |
Hermes Agent supports profiles and profile-local skills. This repo is structured so you can copy it into a `rag-architect` profile.
git clone https://github.com/greynewell/rag-architect.git
cd rag-architect

mkdir -p ~/.hermes/profiles/rag-architect
cp SOUL.md ~/.hermes/profiles/rag-architect/SOUL.md
mkdir -p ~/.hermes/profiles/rag-architect/skills
cp -R skills/* ~/.hermes/profiles/rag-architect/skills/

hermes --profile rag-architect

Then ask for work like:
Design a production RAG architecture for our internal knowledge base.
Turn this RAG roadmap into GitHub issues that coding agents can implement.
Review our Pinecone namespace strategy and propose eval gates.
Create an ADR for model routing and cost-tiering.
If you do not use Hermes Agent, you can still use this repo as a standalone architecture toolkit:
- read `SOUL.md` as the operating charter
- use `templates/adr.md` for architecture decisions
- use `templates/rag-design-review.md` for RAG design reviews
- use `templates/eval-plan.md` for evaluation plans
- use `templates/implementation-issue.md` for GitHub issues
- paste relevant `SKILL.md` files into your preferred coding agent as task context
Example architecture decisions to capture as ADRs:
- Pinecone multi-namespace strategy
- embedding model selection
- hybrid retrieval and reranking policy
- model routing and cost-tiering
- agent tool side-effect and permission policy
Example implementation issues:
- Add Recall@k breakdown by namespace
- Add no-answer eval cases for hallucination resistance
- Add cost-per-request trace fields
- Add hybrid retrieval eval slice for acronym-heavy queries
- Add router logging for selected namespace and model tier
Example evaluation plan contents:
- golden dataset schema
- retrieval metrics: Recall@k, MRR, NDCG@k, context precision
- generation metrics: faithfulness, relevance, completeness, citation quality
- agent metrics: task completion, tool correctness, escalation behavior
- operational metrics: p50/p95/p99 latency, cost per successful answer
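
The retrieval metrics listed above can be sketched in a few lines. This is a minimal illustration, not code from this repo: it assumes binary relevance labels and unique document IDs per ranking, and the function names and sample IDs are hypothetical. MRR is the mean of `reciprocal_rank` over all queries in the golden dataset.

```python
import math
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical golden-dataset case: two relevant chunks, one retrieved at rank 2.
retrieved = ["doc-7", "doc-2", "doc-9"]
relevant = {"doc-2", "doc-4"}
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
print(reciprocal_rank(retrieved, relevant))   # 0.5
print(ndcg_at_k(retrieved, relevant, k=3))    # roughly 0.387
```

Computing these per namespace or per query slice, rather than only in aggregate, makes it easier to see which part of the corpus a retrieval change helped or hurt.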
Production AI systems fail in ways demos do not reveal.
A useful AI architecture agent must therefore ask:
- What business workflow improves?
- What evidence proves quality improved?
- What happens when retrieval confidence is low?
- What is the cost per successful answer?
- What trace explains this exact model response?
- What regression gate prevents a silent quality drop?
- What issue can a coding agent implement in one to three sessions?
rag-architect encodes those questions into reusable skills, templates, and checklists.
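
Two of the questions above, cost per successful answer and regression gating, lend themselves to a short sketch. The example below is illustrative only: `RequestTrace`, its field names, the metric names, and the `max_drop` threshold are all assumptions, not structures defined by this repo.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestTrace:
    """Hypothetical per-request trace record; field names are illustrative."""
    request_id: str
    cost_usd: float
    succeeded: bool  # e.g. the answer passed the eval rubric


def cost_per_successful_answer(traces: List[RequestTrace]) -> float:
    """Total spend (including failed requests) divided by successful answers."""
    successes = sum(1 for t in traces if t.succeeded)
    if successes == 0:
        return float("inf")
    return sum(t.cost_usd for t in traces) / successes


def regression_gate(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_drop: float = 0.02,
) -> List[str]:
    """Return higher-is-better metrics that are missing or dropped by more than max_drop."""
    failures = []
    for name, base_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or base_value - new_value > max_drop:
            failures.append(name)
    return failures


traces = [
    RequestTrace("r1", 0.01, True),
    RequestTrace("r2", 0.03, False),
    RequestTrace("r3", 0.02, True),
]
print(cost_per_successful_answer(traces))  # approximately 0.03

baseline = {"recall_at_5": 0.82, "faithfulness": 0.90}
candidate = {"recall_at_5": 0.83, "faithfulness": 0.85}
print(regression_gate(baseline, candidate))  # ['faithfulness']
```

Charging failed requests to the successful ones is a deliberate choice here: it keeps retries and low-quality answers visible in the unit economics instead of hiding them in an average cost per request.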
Run the local validator before publishing changes:
python3 scripts/validate.py

It checks:
- skill frontmatter
- required fields
- linked template/reference files
- markdown presence
- common secret-like strings
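
To give a feel for what checks of this kind involve, here is a minimal sketch. It is not the contents of `scripts/validate.py`: the required field names, the secret patterns, and the function names are assumptions chosen for illustration, and the frontmatter parser only handles flat `key: value` pairs.

```python
import re
from typing import Dict, List

# Assumed required fields; the real validator may require a different set.
REQUIRED_FIELDS = ("name", "description")
# Illustrative patterns only; real secret scanning needs a broader rule set.
SECRET_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})")


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


def validate_skill(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    fields = parse_frontmatter(text)
    if not fields:
        problems.append("missing, empty, or unterminated frontmatter")
    else:
        for field in REQUIRED_FIELDS:
            if not fields.get(field):
                problems.append(f"missing required field: {field}")
    if SECRET_PATTERN.search(text):
        problems.append("possible secret-like string")
    return problems


skill_text = "---\nname: demo-skill\n---\n# Demo\n"
print(validate_skill(skill_text))  # ['missing required field: description']
```

Returning a list of problems rather than raising on the first one lets a single run report everything that needs fixing before a publish.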
Recommended GitHub topics for discoverability:
rag
retrieval-augmented-generation
llm
ai-agents
agentic-workflows
ai-architecture
llmops
mlops
evaluation
observability
pinecone
hermes-agent
prompt-engineering
golden-datasets
model-routing
Contributions are welcome.
Good contributions include:
- sharper RAG design checklists
- better eval rubrics
- additional ADR templates
- more production observability fields
- examples from real agentic workflows
- improvements to Hermes Agent installation docs
- issue templates for AI engineering teams
Please keep contributions practical, implementation-ready, and grounded in production AI systems.
If rag-architect helps you design better production AI systems:
- ⭐ star the repo
- share it with RAG/LLM engineering teams
- use the templates in your next architecture review
- open an issue with your own production RAG lessons
MIT License. See LICENSE.