Deploy an Azure SRE Agent with Terraform. The agent automates site reliability engineering tasks — incident triage, diagnostics, remediation, and scheduled operational workflows — powered by AI.
| Tool | Version | Install |
|---|---|---|
| Terraform | >= 1.5.0 | Install |
| Azure CLI | >= 2.60 | Install |
| yq | any | brew install yq |

You also need:
- An active Azure subscription
- Owner or User Access Administrator role on the subscription
- The `Microsoft.App` resource provider registered (the script handles this automatically)
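
A quick pre-flight check can catch a missing tool before the management script runs. The sketch below is illustrative and not part of the repository: `missing_tools` is a hypothetical helper, and the commented `az provider show` call is one way to inspect the provider registration state by hand.

```shell
# Hypothetical helper (not part of the repository): print every tool
# from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report missing prerequisites without aborting.
missing=$(missing_tools terraform az yq)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing >&2
fi

# Optional manual check of the resource provider (requires `az login`):
#   az provider show --namespace Microsoft.App --query registrationState -o tsv
```

The check only reports what is missing; it does not verify the minimum versions listed in the table.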
# Sign in to Azure
az login
# Review the plan
./scripts/manage-environment.sh plan
# Deploy all resources + agent configuration
./scripts/manage-environment.sh up
# Deploy agent configuration only (connectors, skills, subagents)
./scripts/manage-environment.sh configure
# Tear down everything
./scripts/manage-environment.sh down

After deployment, the script prints the agent portal URL. Open it to start using your SRE Agent at sre.azure.com.
Created in a single resource group:
| Resource | Purpose |
|---|---|
| Azure SRE Agent | The AI agent (Microsoft.App/agents) |
| Log Analytics Workspace | Backing store for Application Insights and KQL queries |
| Application Insights | Agent telemetry, action logging, performance monitoring |
| Smart Detection Alert Rule | Automatic failure anomaly detection |
| User-Assigned Managed Identity | Agent authentication to Azure (no secrets to manage) |
| RBAC Role Assignments | Reader, Log Analytics Reader, Monitoring Reader/Contributor on the resource group; Monitoring Contributor on the subscription; SRE Agent Administrator for the deployer |
Deployed via configure-agent.sh after infrastructure is up:
| Component | Source | API | Description |
|---|---|---|---|
| Connectors | `configuration/connectors/*.yaml` | ARM (az rest) | Datadog MCP (19 tools), Coralogix MCP (17 tools) |
| Skills | `configuration/skills/*.md` | Data plane | Reusable skill definitions with tool lists and instructions |
| Subagents | `configuration/agents/*.yaml` | Data plane | Custom AI agents routed to skills via allowed_skills |
Connector YAML files use ${VAR} placeholders substituted from .env at deploy time:
cp .env.example .env # copy the template
vim .env # fill in your API keys

Required variables:
| Variable | Description |
|---|---|
| `DD_MCP_URL` | Datadog MCP server endpoint |
| `DD_API_KEY` | Datadog API key |
| `DD_APPLICATION_KEY` | Datadog application key |
| `CX_MCP_URL` | Coralogix MCP server endpoint |
| `CX_API_KEY` | Coralogix API key |

.env is git-ignored. Never commit secrets.
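
To make the `${VAR}` substitution concrete, here is a minimal sketch of how a connector template could be rendered from a `.env` file. It is an assumption about the mechanism, not the code in configure-agent.sh: `render_template` is a hypothetical function, and it handles only plain `KEY=value` lines (quoted values are copied literally, quotes included).

```shell
# Hypothetical sketch of ${VAR} substitution; the real script may differ.
# Usage: render_template <template-file> <env-file>
render_template() {
  template=$1
  env_file=$2
  content=$(cat "$template")
  while IFS='=' read -r key value || [ -n "$key" ]; do
    # Skip blank lines, comments, and anything that is not a plain identifier.
    case $key in
      ''|\#*|*[!A-Za-z0-9_]*) continue ;;
    esac
    # Escape characters that are special in a sed replacement (\ / &).
    escaped=$(printf '%s' "$value" | sed 's/[\/&]/\\&/g')
    # Replace every literal ${KEY} with the escaped value.
    content=$(printf '%s\n' "$content" | sed "s/[\$]{$key}/$escaped/g")
  done < "$env_file"
  printf '%s\n' "$content"
}
```

For example, `render_template configuration/connectors/example.yaml .env` (the file name is a placeholder) prints the rendered YAML to standard output without touching the template. Placeholders with no matching key in `.env` are left unchanged.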
Copy the example tfvars file and customise:
cp infrastructure/terraform.tfvars.example infrastructure/terraform.tfvars

agent_name = "sre-agent-dev" # Name of the SRE Agent
resource_group_name = "rg-sre-agent-dev" # Resource group name
location = "swedencentral" # Azure region (eastus2 | swedencentral | australiaeast)
access_level = "High" # High = Reader + Contributor, Low = Reader only
agent_mode = "Review" # Review | Autonomous | ReadOnly
log_retention_days = 30 # Log Analytics retention (30-730)

terraform.tfvars is git-ignored. Commit changes via terraform.tfvars.example instead.
Azure SRE Agent is available in three regions only:
| Region | Code |
|---|---|
| East US 2 | eastus2 |
| Sweden Central | swedencentral |
| Australia East | australiaeast |
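
Because only these three regions work, it can help to check the `location` value before running a plan. The following is a sketch under assumptions: `tfvars_location` and `is_supported_region` are hypothetical helpers, not functions from the repository, and the parser expects a simple `location = "value"` line.

```shell
# Hypothetical helpers (not part of the repository).

# Print the value of `location` from a .tfvars file.
tfvars_location() {
  sed -n 's/^[[:space:]]*location[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed only for the three regions listed in the table above.
is_supported_region() {
  case $1 in
    eastus2|swedencentral|australiaeast) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use:
#   is_supported_region "$(tfvars_location infrastructure/terraform.tfvars)" \
#     || echo "Unsupported region" >&2
```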
sre-agent/
├── README.md                        # This file
├── AGENT.md                         # Internal project tracker, architecture decisions, changelog
├── .gitignore
├── infrastructure/                  # Stage 1: Terraform
│   ├── providers.tf                 # azurerm ~>4.0, azapi ~>2.0
│   ├── variables.tf                 # Input variables with validation
│   ├── main.tf                      # Root module calling child modules
│   ├── outputs.tf                   # Outputs: agent ID, portal URL, endpoint, UAMI, LAW
│   ├── terraform.tfvars.example     # Template for terraform.tfvars (git-ignored)
│   └── modules/
│       ├── resource-group/          # azurerm_resource_group
│       ├── monitoring/              # LAW + App Insights + Smart Detection alert
│       ├── identity/                # User-Assigned Managed Identity
│       ├── sre-agent/               # azapi_resource (Microsoft.App/agents)
│       └── role-assignments/        # RBAC roles for the UAMI and deployer
├── configuration/                   # Stage 2: YAML/MD → REST API
│   ├── agents/                      # Subagent .yaml (api_version, kind, spec)
│   ├── connectors/                  # Connector .yaml (ARM sub-resources)
│   └── skills/                      # Skill .md (frontmatter + skill content)
└── scripts/
    ├── manage-environment.sh        # up / down / plan / configure
    └── configure-agent.sh           # Deploys config via SRE Agent REST API
Primary script for managing the environment locally.
Usage: manage-environment.sh <command> [options]
Commands:
up Initialize, plan, apply Terraform, then deploy agent configuration
down Destroy all resources
plan Initialize and show Terraform plan
configure Deploy agent configuration only (connectors, skills, subagents)
Options:
-v, --var-file FILE Use a custom .tfvars file (default: terraform.tfvars)
-h, --help Show help
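
The command descriptions above map naturally onto a small dispatcher. The sketch below only prints the steps each command would run, so it is safe to execute anywhere. It is a guess at the script's structure, not its actual source: `describe_command` is a hypothetical name, and the exact Terraform invocations are assumptions based on the usage text.

```shell
# Hypothetical dry-run dispatcher; prints the steps instead of running them.
# Usage: describe_command <up|down|plan|configure> [var-file]
describe_command() {
  cmd=$1
  var_file=${2:-terraform.tfvars}
  case $cmd in
    plan)
      echo "terraform init"
      echo "terraform plan -var-file=$var_file"
      ;;
    up)
      echo "terraform init"
      echo "terraform plan -var-file=$var_file"
      echo "terraform apply -var-file=$var_file"
      echo "./scripts/configure-agent.sh"
      ;;
    down)
      echo "terraform destroy -var-file=$var_file"
      ;;
    configure)
      echo "./scripts/configure-agent.sh"
      ;;
    *)
      echo "unknown command: $cmd" >&2
      return 1
      ;;
  esac
}
```

Calling `describe_command up staging.tfvars` (the file name is a placeholder) lists the four steps with the custom variable file substituted.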
Deploys agent configuration (connectors via ARM, skills and subagents via data plane REST API). Called automatically by manage-environment.sh up, or run standalone:
# Deploy all configuration
./scripts/configure-agent.sh
# With explicit endpoint
./scripts/configure-agent.sh --endpoint https://your-agent.azuresre.dev
# With custom config directory
./scripts/configure-agent.sh --config /path/to/config

All agent configuration lives in configuration/ as code. The script deploys them in dependency order: connectors → skills → subagents.
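
The dependency order can be illustrated with a short listing function. This is a sketch under assumptions, not the logic in configure-agent.sh: `config_files_in_order` is a hypothetical helper that walks the three directories in the stated order and prints the files it finds.

```shell
# Hypothetical sketch: list configuration files in deployment order.
# Usage: config_files_in_order [config-dir]
config_files_in_order() {
  config_dir=${1:-configuration}
  # Connectors first, then skills, then subagents.
  for pattern in 'connectors/*.yaml' 'skills/*.md' 'agents/*.yaml'; do
    for file in "$config_dir"/$pattern; do
      # An unmatched glob stays literal, so keep only real files.
      [ -f "$file" ] && printf '%s\n' "$file"
    done
  done
  return 0
}
```

Subagents come last because they reference skills through allowed_skills, and skills reference tools provided by the connectors.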
YAML files following the AgentConfiguration schema. The system_prompt field contains the agent's instructions.
api_version: azuresre.ai/v1
kind: AgentConfiguration
spec:
  name: datadog-coralogix-agent
  system_prompt: >-
    You are an Observability Investigation Agent...
  tools: []
  handoffs: []
  handoff_description: ""
  agent_type: Review
  enable_skills: true
  allowed_skills:
    - sre-observability-investigation-skill

Markdown files with YAML frontmatter. The tools array lists MCP tools the skill can use. The body is the skill content.
---
name: sre-observability-investigation-skill
description: Investigates production issues using Coralogix and Datadog.
tools:
- datadog-mcp_analyze_datadog_logs
- coralogix-mcp_get_logs
---
# Skill Behavior
...

- No native `azurerm` resource exists for Azure SRE Agent. The agent is created with `azapi_resource` (from the azapi provider) using the ARM type `Microsoft.App/agents@2025-05-01-preview`. The schema is sourced from official Microsoft Bicep samples.
- Schema validation is disabled on the `azapi_resource` because the provider doesn't include this preview API schema.
- Agent configuration uses two APIs. Connectors are ARM sub-resources deployed via `az rest`. Subagents and skills use the SRE Agent data plane REST API. This follows Microsoft's own pattern (infra via IaC, config via post-provision script).
- Both connectors are MCP type. Datadog uses `partnerType: DatadogMcp` (appears under Telemetry in the portal). Coralogix uses a generic MCP connector with bearer token auth.
- Terraform state is local by default.
- Model provider defaults to Azure OpenAI. Anthropic (Claude) is an alternative but is excluded from EU Data Boundary commitments.

| Stage | Description | Status |
|---|---|---|
| 1. Infrastructure | Terraform: SRE Agent + RG, LAW, App Insights, UAMI, RBAC | Done |
| 2. Agent Configuration | Connectors, skills, subagents | Done |
