AI for Automated Runbook Execution in DevOps

On-call engineers at 3 a.m. run the same 10 runbooks most weeks: restart a pod, rotate a secret, drain a node, clear a queue. Agentic LLMs with tool use can execute these runbooks under supervision by dry-running destructive commands, diffing state before and after, and surfacing a human-readable summary. The winning pattern is declarative: runbooks are written as YAML, and the agent acts as the executor, with hard approval gates on high-blast-radius actions.
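The executor pattern described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Step` fields, the `"low"`/`"high"` blast-radius scale, and the `run` and `approve` callables are all hypothetical names, and the steps are assumed to have already been parsed from a YAML runbook. State diffing is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One runbook step (hypothetical shape, e.g. parsed from YAML)."""
    name: str
    command: str                 # illustrative command line
    destructive: bool = False    # destructive steps get a dry-run first
    blast_radius: str = "low"    # assumed scale: "low" or "high"


def execute_runbook(steps, run, approve):
    """Run steps in order and return a human-readable summary.

    run(command, dry_run) -> str   executes or simulates a command
    approve(step) -> bool          asks a human to approve a step
    """
    summary = []
    for step in steps:
        if step.destructive:
            # Preview the effect before anything is changed.
            preview = run(step.command, True)
            summary.append(f"{step.name}: dry-run -> {preview}")
        if step.blast_radius == "high" and not approve(step):
            # Hard gate: stop the whole runbook when approval is denied.
            summary.append(f"{step.name}: blocked, approval denied")
            break
        result = run(step.command, False)
        summary.append(f"{step.name}: done -> {result}")
    return summary
```

In this sketch the agent never calls a shell directly. It only hands steps to `execute_runbook`, so the dry-run and the approval gate cannot be skipped by a badly reasoned tool call.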

Application facts

Domain: Developer & DevOps
Subdomain: Incident response
Example stack:

  • Claude Sonnet 4.7 as the runbook-executor agent
  • Runbooks in YAML or CUE with typed parameters
  • kubectl + Terraform + Ansible via MCP tool servers
  • Temporal.io for durable workflow orchestration
  • Sigstore for signed action provenance

Data & infrastructure needs

  • Declarative runbook library (versioned, reviewed)
  • Service topology and dependency graph
  • On-call rotation and escalation policies
  • Incident history for learning loops
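One use of the service topology listed above is estimating blast radius before a step runs. The sketch below assumes the dependency graph is available as a plain mapping from each service to the services it depends on; the function name and the example service names are hypothetical.

```python
from collections import deque


def affected_services(depends_on, target):
    """Return every service that directly or transitively depends on target.

    depends_on maps a service name to the list of services it calls.
    """
    # Invert the edges: for each service, record who depends on it.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk over the inverted graph; `seen` guards against cycles.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for svc in dependents.get(current, ()):
            if svc not in seen and svc != target:
                seen.add(svc)
                queue.append(svc)
    return seen


# Hypothetical topology for illustration.
topology = {
    "web": ["api"],
    "api": ["db", "cache"],
    "worker": ["db"],
    "db": [],
    "cache": [],
}
print(sorted(affected_services(topology, "db")))
```

An executor could compare the size of this set against a threshold to decide whether an action needs human approval.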

Risks & considerations

  • Prompt injection from log fields influencing tool calls
  • Blast-radius escalation — unintended mass-scale changes
  • Drift between documented runbook and actual system state
  • SOC 2 / ISO 27001 audit failures if action provenance is weak
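To illustrate what tamper-evident action provenance can look like, here is a small hash-chained log signed with HMAC from the Python standard library. It is a stand-in for the idea only and does not use Sigstore or its API; the shared secret key, the record shape, and the function names are assumptions made for this sketch.

```python
import hashlib
import hmac
import json


def _sign(key, action, prev):
    # Canonical JSON so the same record always produces the same signature.
    payload = json.dumps({"action": action, "prev": prev}, sort_keys=True)
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def append_entry(log, key, action):
    """Append an action record chained to the previous entry's signature."""
    prev = log[-1]["sig"] if log else ""
    log.append({"action": action, "prev": prev, "sig": _sign(key, action, prev)})


def verify_log(log, key):
    """Return True only if no entry was altered, removed, or reordered."""
    prev = ""
    for entry in log:
        expected = _sign(key, entry["action"], prev)
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev = entry["sig"]
    return True
```

Because each signature covers the previous one, editing or deleting an earlier record invalidates every later entry, which is the property an auditor would look for.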

Frequently asked questions

Is it safe to let an LLM execute runbooks?

Only with strict guardrails: typed parameters, dry-run for destructive actions, human approval for irreversible operations, signed action logs, and a kill-switch. The LLM should choose among approved runbooks, not invent commands.
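A minimal sketch of the "choose, don't invent" guardrail might look like the following. The registry contents, the validator functions, and the proposal shape are hypothetical, and the name pattern is a deliberately conservative assumption (lowercase DNS-label style), not the full Kubernetes naming rules.

```python
import re

# Conservative name pattern (assumption): lowercase alphanumerics and hyphens.
NAME = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def k8s_name(value):
    return isinstance(value, str) and NAME.fullmatch(value) is not None


def small_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 10


# Hypothetical approved library: runbook name -> {parameter: validator}.
APPROVED = {
    "restart_pod": {"namespace": k8s_name, "pod": k8s_name},
    "scale_deployment": {
        "namespace": k8s_name,
        "deployment": k8s_name,
        "replicas": small_int,
    },
}


def validate_proposal(proposal, kill_switch=False):
    """Return a list of rejection reasons; an empty list means accept."""
    if kill_switch:
        return ["kill-switch engaged"]
    name = proposal.get("runbook")
    if not isinstance(name, str) or name not in APPROVED:
        return ["runbook not in approved library"]
    params = proposal.get("params")
    if not isinstance(params, dict):
        return ["params must be a mapping"]
    schema = APPROVED[name]
    errors = [f"unexpected parameter: {k}" for k in params if k not in schema]
    errors += [
        f"invalid or missing parameter: {k}"
        for k, is_valid in schema.items()
        if k not in params or not is_valid(params[k])
    ]
    return errors
```

A proposal whose `pod` value carries injected text such as `api; kubectl delete ns prod` fails the name check and is rejected before any tool is called.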

What model is best for agentic runbook execution?

As of April 2026, Claude Sonnet 4.7 leads on reliable tool use and instruction following for operational workflows, and GPT-5 is competitive. Avoid smaller models in production systems: reasoning quality is decisive when a wrong action causes customer impact.

What regulatory considerations apply to automated runbooks?

Audit trails fall under SOC 2, ISO 27001, and ISAE 3402. For BFSI (banking, financial services, and insurance), India's RBI IT Framework and SEBI CSCRF apply. HIPAA applies to systems that touch PHI. The EU AI Act may classify high-blast-radius automation as high-risk when it operates on critical infrastructure.

Sources

  1. NIST SP 800-61r3 incident handling — accessed 2026-04-20
  2. Sigstore project — accessed 2026-04-20