A production-grade prototype of a multi-agent AI system for handling end-to-end customer support for CloudDash — a cloud infrastructure monitoring SaaS. Built with Python, FastAPI, and a hybrid RAG pipeline.
- Overview
- Features
- Quick Start
- Project Structure
- Architecture Overview
- Knowledge Base
- API Reference
- Configuration
- Running Tests
- Environment Variables
- Design Decisions
- Known Limitations
CloudDash Support is a multi-agent customer support system that:
- Accepts customer queries via a REST API (or an optional CLI)
- Classifies intent and routes to the correct specialist agent
- Grounds responses in a real knowledge base using a hybrid RAG pipeline
- Preserves full context during cross-agent handovers
- Escalates to human operators (simulated) when AI cannot resolve
- Enforces input/output guardrails for safety and accuracy
- Emits structured JSON logs with per-conversation trace IDs
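As an illustration of the last point, per-conversation trace IDs can be propagated into JSON logs using only the standard library. The sketch below assumes a contextvars-based design; the names trace_id_var, JsonFormatter, new_trace_id, and configure_logging are hypothetical and may not match utils/logger.py.

```python
import contextvars
import json
import logging
import uuid

# Hypothetical context variable; each API request/turn binds its own trace_id here.
trace_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object carrying the active trace_id."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
        })


def new_trace_id() -> str:
    """Create a trace_id (format mirrors the tr-xyz789 examples) and bind it to this context."""
    tid = f"tr-{uuid.uuid4().hex[:6]}"
    trace_id_var.set(tid)
    return tid


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach the JSON formatter to a stream handler on a named logger."""
    logger = logging.getLogger("clouddash")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

Because each asyncio task runs in its own copy of the context, concurrent requests that call new_trace_id() keep separate IDs, so every log line emitted while handling a request carries that request's trace_id.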
| Category | Capability |
|---|---|
| Agents | Triage · Technical Support · Billing · Escalation |
| RAG | Dense vector search + BM25 keyword + RRF fusion + cross-encoder re-ranking |
| Knowledge Base | 20 articles across FAQ, Troubleshooting, Billing, API Docs, Account |
| Handovers | Full context preservation, entity transfer, fallback routing, audit logging, aggregated responses |
| Guardrails | Prompt injection detection · PII redaction · Hallucination check |
| Observability | Structured JSON logs · trace_id propagation · optional Langfuse tracing |
| Config | YAML-driven agent definitions — add new agents without touching core code |
| Interface | REST API with auto-generated OpenAPI docs at /docs |
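To make the Guardrails row concrete, here is a deliberately simple, regex-based sketch of injection detection and PII redaction. The phrase list, the patterns, and the function names is_injection and redact_pii are assumptions for illustration; they are not the rules in guardrails/input_guard.py or guardrails/output_guard.py.

```python
import re

# Illustrative injection phrases only; a real input guard needs a broader, tested rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]

# Email addresses, and card-like runs of 13-16 digits with optional spaces or dashes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def is_injection(text: str) -> bool:
    """Input guard: True if the message matches any known injection phrase."""
    return any(pattern.search(text) for pattern in INJECTION_PATTERNS)


def redact_pii(text: str) -> str:
    """Output guard: mask emails and card-like numbers before a reply is returned."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return CARD_RE.sub("[REDACTED_CARD]", text)
```

Pattern lists like these are cheap to run on every message but easy to evade, so they work best as a first filter in front of model-based checks.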
- Python 3.11+
- An OpenRouter API key (OPEN_ROUTER_KEY); free models are used by default, so there is no cost
git clone https://github.com/your-org/cloud-support.git
cd cloud-support
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set OPEN_ROUTER_KEY (get a free key at openrouter.ai)
python -m retrieval.ingest
# This chunks, embeds, and indexes all KB articles into ChromaDB
# Persisted to ./chroma_db/; only needs to run once (or after KB updates)
uvicorn main:app --reload --port 8000
The API is now live at http://localhost:8000.
Interactive docs: http://localhost:8000/docs
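If you prefer Python to curl, the same two calls shown below can be made with only the standard library. This is a sketch that assumes the request and response shapes documented in the API Reference (conversation_id, agent, content, citations); the helper names build_request, post_json, and ask are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # local server from the Quick Start


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request for an API path such as /conversations."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, body: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (the server must be running)."""
    with urllib.request.urlopen(build_request(path, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ask(customer_id: str, message: str) -> dict:
    """Start a conversation, send one message, and return the agent's reply."""
    conversation = post_json("/conversations", {"customer_id": customer_id})
    conversation_id = conversation["conversation_id"]
    return post_json(f"/conversations/{conversation_id}/messages", {"content": message})
```

With the server running, ask("cust-001", "My CloudDash alerts stopped firing after I updated my AWS credentials yesterday.") returns the parsed reply; its agent and citations fields show which specialist answered and which KB articles it cited.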
# Start a new conversation
curl -X POST http://localhost:8000/api/v1/conversations \
-H "Content-Type: application/json" \
-d '{"customer_id": "cust-001"}'
# Send a message (replace {id} with the conversation_id from above)
curl -X POST http://localhost:8000/api/v1/conversations/{id}/messages \
-H "Content-Type: application/json" \
-d '{"content": "My CloudDash alerts stopped firing after I updated my AWS credentials yesterday."}'
python cli.py
# Interactive terminal chat session
cloud-support/
│
├── main.py # FastAPI application entrypoint
├── cli.py # Interactive terminal chat session
├── requirements.txt
├── .env.example # Environment variable template
├── README.md
├── ARCHITECTURE.md # Detailed architecture document
│
├── agents/ # Agent implementations
│ ├── base_agent.py # Abstract BaseAgent class
│ ├── orchestrator.py # Central coordinator & state manager
│ ├── triage_agent.py # Intent classification & routing
│ ├── technical_agent.py # Technical support resolution
│ ├── billing_agent.py # Billing inquiries & plan management
│ └── escalation_agent.py # Human handover packaging
│
├── api/ # HTTP interface layer
│ ├── routes.py # FastAPI routers & endpoint definitions
│ └── dependencies.py # Dependency injection wiring
│
├── retrieval/ # RAG pipeline
│ ├── ingest.py # KB ingestion: parse → chunk → embed → index
│ ├── embeddings.py # Embedding model wrapper (OpenAI / local)
│ ├── vector_store.py # ChromaDB / FAISS abstraction
│ └── retriever.py # Query rewrite → dense+BM25 → RRF → rerank
│
├── handover/ # Agent handover protocol
│ ├── handover_manager.py # Payload building, dispatch, fallback
│ ├── summarizer.py # Conversation compression (<500 tokens)
│ └── audit_logger.py # JSONL audit log for every handover event
│
├── guardrails/ # Safety layer
│ ├── input_guard.py # Injection detection · off-topic filter
│ └── output_guard.py # PII redaction · hallucination check
│
├── models/ # Pydantic data models
│ ├── conversation.py # ConversationState
│ ├── message.py # Message, Citation
│ ├── agent_response.py # AgentResponse
│ └── handover.py # HandoverPayload, HandoverLog
│
├── config/ # Configuration files
│ ├── agents.yaml # Per-agent: class, prompt, LLM params
│ ├── routing.yaml # Intent → agent mapping & thresholds
│ └── settings.py # Pydantic Settings (reads .env)
│
├── knowledge_base/ # Knowledge base articles
│ ├── faq/ # General FAQ articles (JSON)
│ ├── troubleshooting/ # Step-by-step guides (JSON)
│ ├── billing/ # Pricing & policy docs (JSON)
│ ├── api_docs/ # API reference articles (JSON)
│ └── account/ # SSO, RBAC, team management (JSON)
│
├── utils/ # Shared utilities
│ ├── logger.py # Structured JSON logger + trace_id propagation
│ └── llm_client.py # OpenAI wrapper with retry & timeout
│
└── tests/ # Test suite
├── unit/
│ ├── test_triage.py
│ ├── test_retriever.py
│ ├── test_guardrails.py
│ └── test_handover.py
└── integration/
├── test_single_agent_flow.py
├── test_cross_agent_handover.py
└── test_escalation_flow.py
Client
│
▼
FastAPI (api/)
│ Input Guardrail → reject injection / off-topic
▼
Orchestrator (agents/orchestrator.py)
│ Loads ConversationState · routes · dispatches
├──► Triage Agent → classify intent, extract entities
├──► Technical Agent → RAG + LLM troubleshooting
├──► Billing Agent → RAG + mock account lookup
└──► Escalation Agent → summarize + package for human
│
▼ (when cross-domain)
HandoverManager (handover/)
│ build payload · validate · audit log
▼
Target Agent (resumes with full context)
│
▼
Output Guardrail → PII redact · hallucination check
│
▼
Structured JSON Response (with citations)
See ARCHITECTURE.md for the full component breakdown, data models, RAG pipeline diagram, and handover state machine.
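The HandoverManager step in the diagram (build payload, validate, audit log) can be sketched with dataclasses. The payload fields, the policy of diverting to the escalation agent once max_handovers is exceeded, and the function names are assumptions rather than the definitions in models/handover.py or handover/handover_manager.py.

```python
import json
import time
from dataclasses import asdict, dataclass, field

MAX_HANDOVERS = 3  # mirrors max_handovers in the routing.yaml example


@dataclass
class HandoverPayload:
    """Assumed payload fields; models/handover.py may define more or different ones."""
    conversation_id: str
    trace_id: str
    source_agent: str
    target_agent: str
    reason: str
    summary: str  # compressed history (summarizer.py targets <500 tokens)
    entities: dict = field(default_factory=dict)
    handover_count: int = 0


def build_payload(state: dict, target: str, reason: str, summary: str) -> HandoverPayload:
    """Validate the hop against the handover budget and package context for the target."""
    count = state.get("handover_count", 0) + 1
    if count > MAX_HANDOVERS:
        # Assumed policy: stop bouncing between agents and escalate to a human.
        target, reason = "escalation", f"max_handovers exceeded ({reason})"
    return HandoverPayload(
        conversation_id=state["conversation_id"],
        trace_id=state["trace_id"],
        source_agent=state["current_agent"],
        target_agent=target,
        reason=reason,
        summary=summary,
        entities=dict(state.get("entities", {})),  # copy so later edits don't leak back
        handover_count=count,
    )


def audit_line(payload: HandoverPayload) -> str:
    """Serialize one handover event as a single JSONL record."""
    return json.dumps({"ts": time.time(), "event": "handover", **asdict(payload)})


def append_audit(path: str, payload: HandoverPayload) -> None:
    """Append the record to a JSONL audit log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(audit_line(payload) + "\n")
```

Copying extracted entities into the payload, rather than re-extracting them, is what lets the target agent resume with full context; one JSONL line per event keeps the audit log easy to grep or stream.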
The KB contains 20 articles across five categories:
| Category | Count | Examples |
|---|---|---|
| FAQ | 5 | API key reset, supported cloud providers, team invitations |
| Troubleshooting | 6 | Alerts not firing, AWS integration failure, dashboard latency |
| Billing & Pricing | 4 | Plan comparison, refund policy, invoice explanation, payment methods |
| API Docs | 3 | Authentication, rate limits, webhook setup |
| Account & Access | 2 | SSO / SAML setup, RBAC configuration |
Each article follows this schema:
{
"id": "KB-001",
"title": "How to Configure Alert Thresholds",
"category": "troubleshooting",
"tags": ["alerts", "configuration", "thresholds"],
"content": "...",
"last_updated": "2026-04-15",
"applies_to": ["Pro", "Enterprise"]
}
To add new KB articles, place them as JSON files in the appropriate knowledge_base/<category>/ directory and re-run python -m retrieval.ingest.
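Before indexing, each article can be checked against the schema above and split into chunks that keep the article ID for citations. A minimal sketch follows; the word-window chunking, the size=80/overlap=20 defaults, and the chunk_id format are illustrative assumptions, not necessarily what retrieval/ingest.py does.

```python
import json

# Field names taken from the KB article schema above.
REQUIRED_FIELDS = {"id", "title", "category", "tags", "content", "last_updated", "applies_to"}


def validate_article(article: dict) -> None:
    """Raise ValueError if an article is missing any schema field."""
    missing = REQUIRED_FIELDS - set(article)
    if missing:
        raise ValueError(f"{article.get('id', '?')}: missing fields {sorted(missing)}")


def load_article(path: str) -> dict:
    """Read one JSON article from knowledge_base/<category>/ and validate it."""
    with open(path, encoding="utf-8") as f:
        article = json.load(f)
    validate_article(article)
    return article


def chunk_article(article: dict, size: int = 80, overlap: int = 20) -> list[dict]:
    """Split content into overlapping word windows; each chunk keeps kb_id/title for citations."""
    validate_article(article)
    words = article["content"].split()
    step = size - overlap
    chunks = []
    # Stop early enough that the last window is not just a copy of the previous overlap.
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        chunks.append({
            "chunk_id": f"{article['id']}#{i}",
            "kb_id": article["id"],
            "title": article["title"],
            "category": article["category"],
            "text": " ".join(words[start:start + size]),
        })
    return chunks
```

Carrying kb_id and title on every chunk is what allows a retrieved chunk to be turned directly into a citation such as [KB-007].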
Full interactive docs available at http://localhost:8000/docs (Swagger UI).
POST /api/v1/conversations
Content-Type: application/json
{ "customer_id": "cust-001" }
Response:
{
"conversation_id": "conv-abc123",
"trace_id": "tr-xyz789",
"current_agent": "triage",
"created_at": "2026-05-14T10:30:00Z"
}
POST /api/v1/conversations/{conversation_id}/messages
Content-Type: application/json
{ "content": "My alerts stopped firing after updating AWS credentials." }
Response:
{
"agent": "technical",
"content": "This is typically caused by IAM permission changes... [KB-007]",
"citations": [
{
"kb_id": "KB-007",
"title": "AWS Integration Troubleshooting",
"snippet": "When credentials are rotated, the IAM role must include...",
"score": 0.91
}
],
"handover_occurred": false,
"trace_id": "tr-xyz789"
}
GET /api/v1/conversations/{conversation_id}/history
GET /api/v1/health
Define agents declaratively in config/agents.yaml; adding a new agent requires no changes to core code:
agents:
  triage:
    class: agents.triage_agent.TriageAgent
    system_prompt_file: prompts/triage.txt
    max_tokens: 512
    temperature: 0.2
    retriever_enabled: false
  technical:
    class: agents.technical_agent.TechnicalAgent
    system_prompt_file: prompts/technical.txt
    max_tokens: 1024
    temperature: 0.3
    retriever_enabled: true
    retrieval_top_k: 5
Intent-to-agent mapping and routing thresholds live in config/routing.yaml:
routing:
  intents:
    technical_issue:
      target: technical
      confidence_threshold: 0.70
    billing_inquiry:
      target: billing
      confidence_threshold: 0.75
  fallback:
    low_confidence: triage
    agent_error: escalation
  max_handovers: 3
# All tests
pytest tests/ -v
# Unit tests only
pytest tests/unit/ -v
# Integration tests (require OPEN_ROUTER_KEY, or OPENAI_API_KEY as a fallback)
pytest tests/integration/ -v
# With coverage report
pytest tests/ --cov=. --cov-report=html
| Scenario | Test File |
|---|---|
| Triage intent classification (all 4 scenarios) | unit/test_triage.py |
| KB retrieval returns cited chunks | unit/test_retriever.py |
| Input guardrail: injection detection | unit/test_guardrails.py |
| Output guardrail: PII redaction | unit/test_guardrails.py |
| HandoverManager payload + audit log | unit/test_handover.py |
| Scenario 1: single-agent technical flow | integration/test_single_agent_flow.py |
| Scenario 2: cross-agent handover Tech→Billing | integration/test_cross_agent_handover.py |
| Scenario 3: billing dispute → escalation | integration/test_escalation_flow.py |
Copy .env.example to .env and fill in the values:
# ── API Keys ──────────────────────────────────────────────────
OPEN_ROUTER_KEY=your_openrouter_key # Required — get free key at openrouter.ai
OPENAI_API_KEY= # Optional — fallback if OPEN_ROUTER_KEY not set
# ── Model Configuration ───────────────────────────────────────
OPENAI_MODEL=nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
EMBEDDING_MODEL=nvidia/llama-nemotron-embed-vl-1b-v2:free
# ── Vector Store ──────────────────────────────────────────────
CHROMA_PERSIST_DIRECTORY=./chroma_db
# ── Server ────────────────────────────────────────────────────
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO
LangGraph and CrewAI are powerful, but they add significant framework complexity and lock-in. For this prototype, a custom async Orchestrator gives us:
- Full control over routing logic and state management
- No hidden graph execution semantics to debug
- A cleaner separation of concerns across modules
- Logic that is easier to explain in a live discussion
The trade-off: we write more boilerplate. For a production system with 10+ agents, adopting LangGraph's stateful graph model would be the right call.
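The core routing decision of such an orchestrator is small enough to show in full. The sketch below mirrors the routing.yaml example from the Configuration section as a Python dict; sending the conversation to the agent_error fallback (escalation) once the max_handovers budget is spent is an assumption, as is the route() signature.

```python
# Python mirror of the routing.yaml example (values copied from the Configuration section).
ROUTING = {
    "intents": {
        "technical_issue": {"target": "technical", "confidence_threshold": 0.70},
        "billing_inquiry": {"target": "billing", "confidence_threshold": 0.75},
    },
    "fallback": {"low_confidence": "triage", "agent_error": "escalation"},
    "max_handovers": 3,
}


def route(intent: str, confidence: float, handovers_so_far: int = 0, routing: dict = ROUTING) -> str:
    """Pick the next agent for a classified message."""
    if handovers_so_far >= routing["max_handovers"]:
        # Assumption: an exhausted handover budget is treated like an agent error.
        return routing["fallback"]["agent_error"]
    rule = routing["intents"].get(intent)
    if rule is None or confidence < rule["confidence_threshold"]:
        # Unknown intent or weak classification: send back to triage.
        return routing["fallback"]["low_confidence"]
    return rule["target"]
```

Keeping this logic in a plain function rather than in a framework's graph definition means the routing rules can be unit-tested without any LLM call.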
ChromaDB runs locally with zero infrastructure — no account needed, no cost, persists to disk. The VectorStore abstraction in retrieval/vector_store.py makes swapping to Pinecone a single config change for production.
Pure vector search often misses exact-match queries for technical terms such as "SSO", "SAML", "BM25", and "webhook", which is where BM25 keyword scoring excels. RRF fusion combines both ranked lists, and cross-encoder re-ranking (using cross-encoder/ms-marco-MiniLM-L-6-v2) then improves precision on the final top-k by scoring each (query, chunk) pair jointly.
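The hybrid step can be illustrated on toy data: a compact Okapi BM25 scorer (using the common log(1 + ...) IDF variant and typical k1=1.5, b=0.75) produces a keyword ranking, and reciprocal rank fusion merges it with a dense ranking using score(d) = Σ 1/(k + rank), with k=60 as the customary constant. Whitespace tokenization, these parameter values, and the function names are assumptions, and the cross-encoder stage is omitted.

```python
import math
from collections import Counter, defaultdict


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Rank doc IDs by Okapi BM25; docs with no query term are left out of the ranking."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores: dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ranked if scores[doc_id] > 0]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_k: int = 5) -> list[tuple[str, float]]:
    """RRF: score(d) = sum over rankings of 1 / (k + rank_d), with 1-based ranks."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

Because RRF uses only ranks, it never has to reconcile cosine similarities with BM25 scores, which live on incompatible scales; a document ranked highly by either retriever rises in the fused list.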
Instead of a simple "push" handover where the user is just transferred, the Orchestrator aggregates responses from multiple agents in a single turn. For example, if a user asks to fix a technical issue and then upgrade their plan, the Orchestrator:
- Invokes the Technical Agent to provide the fix.
- Detects the need for a Billing handover.
- Invokes the Billing Agent to handle the upgrade.
- Aggregates both responses into a single, cohesive message for the user.
This ensures a higher quality of service and reduces the "ping-pong" effect for the customer.
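A minimal sketch of the aggregation step, assuming each agent returns content plus citations in the shape shown in the API Reference: sections are joined in invocation order, citations shared by several agents are de-duplicated, and handover_occurred is set when more than one agent contributed. The AgentReply and Citation dataclasses, reporting the last agent in the agent field, and the blank-line separator are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    kb_id: str
    title: str


@dataclass
class AgentReply:
    """One agent's contribution to the current turn (hypothetical shape)."""
    agent: str
    content: str
    citations: list[Citation] = field(default_factory=list)


def aggregate(replies: list[AgentReply]) -> dict:
    """Merge per-agent replies into one response shaped like the messages endpoint output."""
    sections = [r.content.strip() for r in replies if r.content.strip()]
    seen: set[str] = set()
    citations: list[dict] = []
    for reply in replies:
        for c in reply.citations:
            if c.kb_id not in seen:  # the same KB article may be cited by two agents
                seen.add(c.kb_id)
                citations.append({"kb_id": c.kb_id, "title": c.title})
    return {
        "agent": replies[-1].agent if replies else None,  # assumption: report the last agent
        "content": "\n\n".join(sections),
        "citations": citations,
        "handover_occurred": len({r.agent for r in replies}) > 1,
    }
```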
- In-memory state is lost on restart unless REDIS_URL is set.
- No streaming: responses are complete JSON payloads; SSE would reduce perceived latency.
- Billing is simulated: the mock AccountLookup uses fixture data, with no real CRM integration.
- Single-instance only: horizontal scaling requires Redis-backed session state.
- The hallucination check is based on an overlap heuristic; a dedicated NLI model (e.g., roberta-large-mnli) would be more reliable.
- No authentication on the API: suitable for a prototype demo, but production would need JWT or API-key auth.
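For reference, an overlap heuristic of the kind mentioned above might look like the sketch below: each answer sentence is flagged when fewer than half of its content words appear in the retrieved chunks. The stopword list, the 0.5 threshold, and the sentence splitter are assumptions. The weakness that motivates an NLI model is visible in the design: a faithful paraphrase using different words gets flagged, while a contradiction that reuses source vocabulary passes.

```python
import re

_WORD_RE = re.compile(r"[a-z0-9]+")
_SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")
# Illustrative stopword list: very common words carry no grounding signal.
_STOPWORDS = {
    "a", "an", "and", "are", "be", "can", "for", "in", "is", "it", "of",
    "on", "or", "that", "the", "this", "to", "with", "you", "your",
}


def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens minus stopwords."""
    return {w for w in _WORD_RE.findall(text.lower()) if w not in _STOPWORDS}


def unsupported_sentences(answer: str, sources: list[str], min_overlap: float = 0.5) -> list[str]:
    """Return answer sentences whose content words mostly do not occur in any source chunk."""
    source_vocab: set[str] = set().union(*(content_tokens(s) for s in sources))
    flagged = []
    for sentence in _SENTENCE_SPLIT_RE.split(answer.strip()):
        tokens = content_tokens(sentence)
        if not tokens:
            continue
        if len(tokens & source_vocab) / len(tokens) < min_overlap:
            flagged.append(sentence)
    return flagged
```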
🌐 Deployed at: https://cloud-support.your-domain.com
📖 Swagger UI: https://cloud-support.your-domain.com/docs
Deployed on Render free tier. Cold starts may take ~30s.
Assessment submission for CloudDash Multi-Agent Support — AI Engineering Intern Role.