AI for Automated Runbook Execution in DevOps
On-call engineers paged at 3 a.m. run the same 10 runbooks most weeks: restart a pod, rotate a secret, drain a node, clear a queue. Agentic LLMs with tool use can execute these runbooks under human supervision. They dry-run destructive commands first, diff system state before and after each step, and produce a human-readable summary of what changed. The pattern that holds up best is declarative: runbooks live as versioned YAML with typed parameters, the agent acts only as their executor, and high-blast-radius actions sit behind hard human-approval gates.
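A minimal sketch of the declarative pattern described above: a runbook carries typed parameters and per-step blast-radius labels, and the executor refuses to start if any gated step lacks a named human approver. The `Runbook` and `Step` classes, the `BlastRadius` levels, and the approval policy are hypothetical. A real system would load the runbook from its YAML or CUE definition rather than declaring it inline, and would dispatch to real tools where this sketch only records the step name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class BlastRadius(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Step:
    action: str                 # hypothetical action name, e.g. "drain_node"
    blast_radius: BlastRadius
    destructive: bool = False


@dataclass
class Runbook:
    name: str
    params: dict                # typed parameters: name -> required Python type
    steps: list = field(default_factory=list)


def needs_approval(step: Step) -> bool:
    # Hypothetical policy: destructive or HIGH blast-radius steps need a human.
    return step.destructive or step.blast_radius is BlastRadius.HIGH


def execute(runbook: Runbook, args: dict, approver: Optional[str] = None) -> list:
    # 1. Reject missing, unexpected, or wrongly typed parameters up front.
    if set(args) != set(runbook.params):
        raise ValueError(f"expected {sorted(runbook.params)}, got {sorted(args)}")
    for name, expected in runbook.params.items():
        if not isinstance(args[name], expected):
            raise TypeError(f"parameter {name!r} must be {expected.__name__}")
    # 2. Check every approval gate before running anything, so a run never
    #    stops halfway through because of a missing approval.
    gated = [s.action for s in runbook.steps if needs_approval(s)]
    if gated and approver is None:
        raise PermissionError(f"human approval required for: {gated}")
    # 3. Execute. A real executor would call a tool server here.
    return [f"ran {s.action}" for s in runbook.steps]


# Hypothetical runbook, standing in for one loaded from the YAML library.
drain = Runbook(
    name="drain-node",
    params={"node": str, "grace_seconds": int},
    steps=[
        Step("cordon_node", BlastRadius.MEDIUM),
        Step("drain_node", BlastRadius.HIGH, destructive=True),
    ],
)
```

Calling `execute(drain, {"node": "node-1", "grace_seconds": 30})` raises `PermissionError` because `drain_node` is gated; passing `approver="alice"` lets both steps run. Checking every gate before the first step keeps a half-applied runbook from becoming its own incident.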
Application facts
- Domain
- Developer & DevOps
- Subdomain
- Incident response
- Example stack
- Claude Sonnet 4.7 as the runbook-executor agent · Runbooks in YAML or CUE with typed parameters · kubectl + Terraform + Ansible via MCP tool servers · Temporal.io for durable workflow orchestration · Sigstore for signed action provenance
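kubectl, Terraform, and Ansible in the example stack each have a native preview mode the agent can use before a destructive change: `kubectl ... --dry-run=server`, `terraform plan` followed by applying the saved plan, and `ansible-playbook --check --diff`. The sketch below only builds the argument vector. The MCP tool-server wrapper that would actually run it is assumed and not shown, and the tool allowlist and the `runbook.tfplan` file name are hypothetical.

```python
import shlex

# Hypothetical file name for the saved Terraform plan reviewed before apply.
PLAN_FILE = "runbook.tfplan"


def build_argv(tool: str, args: list, dry_run: bool) -> list:
    """Return the argv for an allowlisted tool, in preview form when dry_run is True."""
    if tool == "kubectl":
        # --dry-run=server sends the request through API-server validation and
        # admission without persisting the change.
        return ["kubectl", *args, "--dry-run=server"] if dry_run else ["kubectl", *args]
    if tool == "terraform":
        # Preview: write a plan file. Execute: apply exactly that reviewed plan
        # (applying a saved plan does not re-plan or prompt for confirmation).
        if dry_run:
            return ["terraform", "plan", f"-out={PLAN_FILE}", *args]
        return ["terraform", "apply", PLAN_FILE]
    if tool == "ansible-playbook":
        # --check makes no changes; --diff reports what would change.
        return ["ansible-playbook", *args, *(["--check", "--diff"] if dry_run else [])]
    raise ValueError(f"tool {tool!r} is not on the allowlist")


print(shlex.join(build_argv("kubectl", ["drain", "node-1", "--ignore-daemonsets"], dry_run=True)))
# kubectl drain node-1 --ignore-daemonsets --dry-run=server
```

Not every kubectl subcommand accepts `--dry-run` (read-only ones such as `kubectl logs` do not), so a production allowlist would likely be keyed per subcommand rather than per tool.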
Data & infrastructure needs
- Declarative runbook library (versioned, reviewed)
- Service topology and dependency graph
- On-call rotation and escalation policies
- Incident history for learning loops
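The service topology and dependency graph listed above is what lets the executor estimate blast radius before acting. A minimal sketch, assuming a hypothetical adjacency map from each service to the services it depends on: invert the edges, walk outward from the target with breadth-first search, and escalate to a human when too many services would be affected. The threshold `max_auto` is an illustrative assumption, not a recommended value.

```python
from collections import deque

# Hypothetical topology: service -> services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["postgres-primary"],
    "cart": ["redis"],
    "search": ["elasticsearch"],
    "postgres-primary": [],
    "redis": [],
    "elasticsearch": [],
}


def blast_radius(target: str, depends_on: dict) -> set:
    """Return every service that directly or transitively depends on `target`."""
    # Invert edges so we can walk from a dependency to its dependents.
    dependents = {svc: [] for svc in depends_on}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(svc)
    # Breadth-first search over dependents; `seen` guards against cycles.
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def needs_human(target: str, depends_on: dict, max_auto: int = 1) -> bool:
    # Hypothetical policy: auto-execute only if at most `max_auto` services are affected.
    return len(blast_radius(target, depends_on)) > max_auto
```

Restarting `postgres-primary` here touches both `payments` and `checkout`, so it would be routed to a human, while restarting `elasticsearch` affects only `search` and could proceed automatically. A real graph is only as good as its freshness, which is why topology drift matters as much as runbook drift.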
Risks & considerations
- Prompt injection: untrusted text in log fields steering the agent's tool calls
- Blast-radius escalation: a single action causing unintended large-scale changes
- Drift between a documented runbook and the actual system state
- SOC 2 / ISO 27001 audit failures if action provenance is weak
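One mitigation for the prompt-injection risk above is to never let free text from logs become a command: the model may only name an approved runbook and fill its typed parameter slots, and each value must fully match a strict pattern. A Python sketch, assuming hypothetical runbook names and patterns. The namespace pattern follows the Kubernetes DNS-1123 label rule; the pod pattern is a simplification.

```python
import re
from dataclasses import dataclass

# Hypothetical per-parameter patterns; in practice these would be derived from
# the typed parameter declarations in the runbook library.
PARAM_PATTERNS = {
    "restart-pod": {
        "namespace": re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?"),
        "pod": re.compile(r"[a-z0-9]([-a-z0-9.]{0,251}[a-z0-9])?"),  # simplified
    },
}


@dataclass(frozen=True)
class ProposedCall:
    runbook: str   # runbook the model chose
    args: dict     # parameter values the model filled in


def validate(call: ProposedCall) -> None:
    """Reject any model-proposed call that is not an exact fit for an approved runbook."""
    patterns = PARAM_PATTERNS.get(call.runbook)
    if patterns is None:
        raise PermissionError(f"unknown runbook {call.runbook!r}")
    if set(call.args) != set(patterns):
        raise ValueError(f"unexpected or missing parameters: {sorted(call.args)}")
    for name, pattern in patterns.items():
        # fullmatch, not search: the whole value must match, so shell
        # metacharacters or extra words smuggled in from a log line fail.
        if not pattern.fullmatch(call.args[name]):
            raise ValueError(f"parameter {name!r} failed validation")
```

With this shape, an injected instruction can at worst make the model pick a wrong but allowlisted value, which the blast-radius and approval gates then get a chance to catch.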
Frequently asked questions
Is it safe to let an LLM execute runbooks?
Only with strict guardrails: typed parameters, dry-run for destructive actions, human approval for irreversible operations, signed action logs, and a kill-switch. The LLM should choose among approved runbooks, not invent commands.
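Two of the guardrails named above, signed action logs and a kill-switch, can be sketched with the Python standard library alone. This is a deliberately simplified stand-in: it uses a hash chain plus HMAC with a shared secret key, whereas Sigstore (listed in the example stack) provides keyless signing backed by a public transparency log. The `ActionLog` class, `KILL_SWITCH`, and `run_action` are hypothetical names. The chain makes editing, deleting, or reordering earlier entries detectable, though on its own it does not detect truncation of the newest entries.

```python
import hashlib
import hmac
import json
import threading

FIELDS = ("action", "params", "actor", "prev")


class ActionLog:
    """Append-only, hash-chained, HMAC-signed action log (simplified stand-in)."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.entries: list = []

    def _sign(self, body: dict) -> str:
        # Canonical JSON so the same body always produces the same signature.
        payload = json.dumps(body, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, action: str, params: dict, actor: str) -> dict:
        # Each entry commits to the previous entry's signature, forming a chain.
        prev = self.entries[-1]["sig"] if self.entries else ""
        body = {"action": action, "params": params, "actor": actor, "prev": prev}
        entry = {**body, "sig": self._sign(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = ""
        for entry in self.entries:
            body = {k: entry[k] for k in FIELDS}
            if body["prev"] != prev:
                return False  # entry removed, inserted, or reordered
            if not hmac.compare_digest(self._sign(body), entry["sig"]):
                return False  # entry contents altered
            prev = entry["sig"]
        return True


# Operators set this event to halt all automated actions immediately.
KILL_SWITCH = threading.Event()


def run_action(log: ActionLog, action: str, params: dict, actor: str) -> dict:
    if KILL_SWITCH.is_set():
        raise RuntimeError("kill-switch engaged; refusing to act")
    # A real executor would perform the validated, approved action here.
    return log.append(action, params, actor)
```

Checking the kill-switch immediately before each action, rather than once per runbook, means an operator can stop a run between steps.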
What model is best for agentic runbook execution?
Claude Sonnet 4.7 leads on reliable tool use and instruction following for operational workflows as of April 2026. GPT-5 is competitive. Avoid smaller models for production systems; reasoning quality is decisive when a wrong action can cause customer impact.
Regulatory considerations for automated runbooks?
SOC 2, ISO 27001, and ISAE 3402 set expectations for audit trails. The RBI IT Framework and SEBI CSCRF apply in Indian BFSI, and HIPAA applies to systems that touch PHI. The EU AI Act may classify high-blast-radius automation as high-risk when it manages critical infrastructure.
Sources
- NIST SP 800-61r3 incident handling — accessed 2026-04-20
- Sigstore project — accessed 2026-04-20