Recommended baseline for production:
- API service: 2-3 stateless instances
- MySQL: managed HA or primary-replica
- Redis: sentinel/cluster mode
- RabbitMQ: mirrored queues or managed MQ
- Vector storage: PostgreSQL + pgvector (dedicated)
- Observability: Prometheus + Loki + Tempo + Alertmanager
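As one possible shape for this topology, a minimal Compose-style sketch (image tags, service names, and the single-host layout are illustrative assumptions; production should use the managed/HA equivalents listed above):

```yaml
# Single-file sketch for a dev/staging-like layout. Image tags, ports, and
# service names are assumptions; production should use managed/HA services.
services:
  api:
    image: knowledgeops-agent:latest
    deploy:
      replicas: 2              # 2-3 stateless API instances behind a load balancer
    depends_on: [mysql, redis, rabbitmq, pgvector]
  mysql:
    image: mysql:8.0           # managed HA or primary-replica in production
  redis:
    image: redis:7             # Sentinel/Cluster mode in production
  rabbitmq:
    image: rabbitmq:3-management
  pgvector:
    image: pgvector/pgvector:pg16   # dedicated PostgreSQL for embeddings
  prometheus:
    image: prom/prometheus:latest
```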
Mandatory environment variables:
- OPENAI_API_KEY
- APP_JWT_SECRET (32+ bytes)
- DB_URL
- DB_USERNAME
- DB_PASSWORD
Strongly recommended settings:
- APP_SECURITY_ENABLED=true
- APP_RATE_LIMIT_ENABLED=true
- APP_MODEL_ROUTER_ENABLED=true
- APP_MODEL_ROUTER_DEFAULT_PROFILE=balanced
- APP_VECTOR_STORE_BACKEND=pgvector
- APP_REQUIRE_PGVECTOR=true
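One way to wire these together is an env file sourced by the deployment; the sketch below uses placeholder values only (the hosts, database name, and the openssl-based secret generation are assumptions -- real secrets come from Vault/KMS):

```shell
# Placeholder values only -- load real secrets from Vault/KMS, never commit them.
export OPENAI_API_KEY="changeme"                      # placeholder, not a real key
export APP_JWT_SECRET="$(openssl rand -base64 48)"    # 48 random bytes, above the 32-byte minimum
export DB_URL="jdbc:mysql://mysql:3306/knowledgeops"  # hypothetical host and schema name
export DB_USERNAME="knowledgeops"
export DB_PASSWORD="changeme"

export APP_SECURITY_ENABLED=true
export APP_RATE_LIMIT_ENABLED=true
export APP_MODEL_ROUTER_ENABLED=true
export APP_MODEL_ROUTER_DEFAULT_PROFILE=balanced
export APP_VECTOR_STORE_BACKEND=pgvector
export APP_REQUIRE_PGVECTOR=true
```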
Deployment steps:
- Build the image: docker build -t knowledgeops-agent:<tag> .
- Apply DB migrations (Flyway at startup or as a pipeline stage).
- Deploy a canary instance.
- Verify:
  - /actuator/health
  - /actuator/prometheus
  - key APIs: /ai/chat, /ai/pdf/chat, /auth/token
- Shift traffic gradually.
- Run post-deploy smoke and regression tests.
- Keep previous image tag warm.
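The verify step could be scripted roughly as below. BASE_URL, the 200-only success criterion, and the RUN_SMOKE guard are assumptions; the chat APIs and /auth/token normally need credentials and request bodies, so extend check() for your auth setup.

```shell
#!/usr/bin/env sh
# Hypothetical canary smoke check: fails on the first endpoint not returning 200.
BASE_URL="${BASE_URL:-http://localhost:8080}"   # assumed canary address

# Wrapper around curl so the HTTP call can be stubbed in a dry run.
fetch_code() { curl -s -o /dev/null -w '%{http_code}' "$1"; }

check() {
  path="$1"
  code="$(fetch_code "$BASE_URL$path")"
  if [ "$code" = "200" ]; then
    echo "OK   $path"
  else
    echo "FAIL $path (HTTP $code)" >&2
    return 1
  fi
}

# Run only when explicitly requested, e.g. RUN_SMOKE=1 ./smoke.sh.
if [ "${RUN_SMOKE:-0}" = "1" ]; then
  check /actuator/health
  check /actuator/prometheus
  check /auth/token
  check /ai/chat
  check /ai/pdf/chat
fi
```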
Rollback:
- Roll back the service image first.
- For schema changes, ensure backward-compatible migration before release.
- If queue backlog spikes, pause ingestion consumers and drain gradually.
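The image-first rollback might look like the sketch below, assuming Kubernetes and a Deployment named knowledgeops-agent (both assumptions; adapt to your orchestrator):

```shell
# Revert the service to the previous ReplicaSet's image and wait for it to settle.
kubectl rollout undo deployment/knowledgeops-agent
kubectl rollout status deployment/knowledgeops-agent --timeout=120s

# Do not revert Flyway migrations automatically: because releases ship only
# backward-compatible schema changes, the old image runs safely on the new schema.
```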
SLO targets:
- Chat API availability: >= 99.9%
- /ai/chat p95 latency: <= 1500 ms
- Ingestion failure ratio (5m): <= 5%
- MTTR for critical alerts: <= 30 min
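The latency and failure-ratio targets could translate into Prometheus alert rules like the sketch below. The metric names are assumptions: http_server_requests_seconds_bucket presumes Spring Boot/Micrometer histograms are enabled for the chat URI, and ingestion_jobs_total is a hypothetical counter that must match what the app actually exports.

```yaml
groups:
  - name: knowledgeops-slo
    rules:
      - alert: ChatP95LatencyHigh
        # p95 over 5m windows above the 1500 ms target, sustained for 10m.
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_server_requests_seconds_bucket{uri="/ai/chat"}[5m])))
          > 1.5
        for: 10m
        labels:
          severity: critical
      - alert: IngestionFailureRatioHigh
        # Failed-to-total ratio over 5m above the 5% target.
        expr: |
          sum(rate(ingestion_jobs_total{status="failed"}[5m]))
            / sum(rate(ingestion_jobs_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
```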
Pre-production checklist:
- Secrets loaded from Vault/KMS/Secret Manager
- API Key issue/revoke flow verified
- JWT refresh flow verified
- Ingestion retry + DLQ verified
- Dashboard and alert routes verified
- Load test baseline recorded
- Backup and restore tested (MySQL + vector storage)
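A backup drill for the last item might be sketched as below; the hosts, users, and database names are placeholders, and the restore target is a hypothetical scratch environment.

```shell
# Hypothetical backup drill; hosts, users, and database names are placeholders.
STAMP="$(date +%Y%m%d)"

# MySQL: consistent logical backup without locking InnoDB tables.
mysqldump --single-transaction -h mysql -u backup -p knowledgeops > "mysql-$STAMP.sql"

# Vector storage: custom-format dump of the dedicated pgvector database.
pg_dump -h pgvector -U backup -d knowledgeops -F c -f "vectors-$STAMP.dump"

# Restore into a scratch environment and verify with real queries
# (row counts, a sample vector similarity search), not just exit codes.
```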