Agent-to-Agent Protocols

How agents find each other, negotiate tasks, and share context — the emerging stack for multi-agent systems.

102 entries · Sorted A→Z

A2A Authentication — OAuth & Beyond

A2A leans on standard web auth — primarily OAuth 2.0 bearer tokens — so agents authenticate to one another the same way services do, with API keys and mTLS as alternatives.

A2A Task Handoff — Semantics & Lifecycle

Task handoff in A2A describes the full lifecycle of delegating work to another agent: create, assign, run, stream updates, return result — with support for long-running and multi-turn tasks.

Adept ACT-1 — Action Transformer

Adept's ACT-1 was a 2022 action-transformer model that pioneered browser-controlling foundation models — a major intellectual precursor to modern computer-use agents.

AG-UI — Agent-User Interaction Protocol

AG-UI is an open event-based protocol for how a running agent streams its thoughts, tool calls, and partial outputs to a user-facing UI — the agent-to-UI counterpart of A2A.

Agent Cache-and-Memoize Pattern

Caching tool-call results and memoizing identical LLM prompts is how production agents cut cost and latency by 50–90% — turning repeated external calls into instant local lookups.
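A minimal sketch of the memoization half, assuming an in-memory dict keyed by a hash of the prompt plus sampling parameters (`call_llm`, the model name, and the cache backend are all illustrative — production systems typically use Redis or similar):

```python
import hashlib
import json

# Illustrative in-memory memo cache; key covers everything that affects output.
_cache: dict[str, str] = {}

def _key(prompt: str, model: str, temperature: float) -> str:
    payload = json.dumps({"p": prompt, "m": model, "t": temperature}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(prompt: str, call_llm, model="example-model", temperature=0.0):
    key = _key(prompt, model, temperature)
    if key in _cache:                  # hit: instant local lookup, zero spend
        return _cache[key]
    result = call_llm(prompt)          # miss: pay for the real call once
    _cache[key] = result
    return result
```

Note that memoizing only makes sense at temperature 0 (or when approximate reuse is acceptable), since sampled outputs are not deterministic.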

Agent Cost and Token Budget Patterns

Agents can burn thousands of dollars in a single run if left unchecked — explicit token and cost budgets, per-step guards, context pruning, and cheaper-model routing are the patterns production teams use to keep spend sane.
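A per-step budget guard might look like this sketch, where the class and exception names are illustrative:

```python
class BudgetExceeded(Exception):
    """Raised when a step would push the run past its hard token cap."""

class TokenBudget:
    """Illustrative per-run guard: every step charges its estimated tokens
    against a hard cap before the call is allowed to proceed."""
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.spent = 0

    def charge(self, tokens: int) -> None:
        if self.spent + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.spent + tokens} > cap {self.max_tokens}")
        self.spent += tokens
```

The agent loop calls `charge()` before each LLM or tool invocation and treats `BudgetExceeded` as a signal to summarize and stop rather than keep spending.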

Agent Credential Vault Pattern

The credential-vault pattern stores secrets — API keys, OAuth tokens, passwords — outside the agent's memory and injects them only into specific tool calls, limiting blast radius if the agent is compromised.

Agent Episodic Memory Pattern

Episodic memory stores specific past events — 'on March 3, user asked X, agent did Y' — letting an agent recall concrete past interactions rather than only general facts.

Agent Human-in-the-Loop (HITL) Pattern

Human-in-the-loop is the design pattern where agents pause for human approval, correction, or input at specific checkpoints — trading some autonomy for safety, accuracy, and regulatory fit in high-stakes workflows.

Agent Identity — OIDC and OAuth 2.1

Agent identity uses OIDC and OAuth 2.1 to give AI agents their own cryptographically verifiable identities — separate from user identities — with scoped permissions and full audit trails.

Agent Map-Reduce Pattern

Map-reduce for agents: split a large input into chunks, process each in parallel with a 'map' agent, then combine results with a 'reduce' agent — the classic recipe for long-document work.
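The recipe can be sketched in a few lines, with plain callables standing in for the map and reduce agents:

```python
from concurrent.futures import ThreadPoolExecutor

def map_reduce(document: str, chunk_size: int, map_agent, reduce_agent):
    """Split a long input into chunks, run the (stand-in) map_agent on each
    chunk in parallel, then combine the partial results with reduce_agent."""
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_agent, chunks))
    return reduce_agent(partials)
```

Real implementations chunk on semantic boundaries (sections, paragraphs) rather than fixed character offsets, and the reduce step is often itself an LLM call that synthesizes the partials.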

Agent Mesh Networking Pattern

Agent mesh networking is an architecture where specialized agents discover each other via a registry, call each other directly over a standard protocol (A2A, MCP), and compose dynamically without central orchestration.

Agent Network Protocol (ANP)

ANP is an open agent-to-agent protocol that treats agents as first-class peers on a decentralized network — using DIDs for identity and JSON-LD for capability discovery.

Agent PII Redaction Layer

A PII redaction layer sits between an agent and its inputs/outputs, scrubbing personally identifiable information — names, SSNs, card numbers — before it reaches the LLM or leaves the system.

Agent Pipeline Pattern

The pipeline pattern chains agents in a fixed sequence, each transforming the previous agent's output — a Unix-pipe style composition that favors determinism over autonomy.
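A sketch of the composition, with ordinary callables standing in for agents:

```python
def run_pipeline(stages, initial_input):
    """Unix-pipe composition: each stage is a callable agent whose output
    becomes the next stage's input, in a fixed order."""
    value = initial_input
    for stage in stages:
        value = stage(value)
    return value
```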

Agent Procedural Memory Pattern

Procedural memory stores learned how-to knowledge — reusable skill snippets, successful tool-call sequences, corrected mistakes — that the agent can retrieve and apply to future similar tasks.

Agent Prompt-Injection Defense

Prompt-injection defense is a layered set of techniques — input sanitization, instruction hierarchies, capability scoping, output firewalls — used to prevent attackers from hijacking an agent via untrusted text.

Agent Rate Limiting and Quotas

Rate limiting and quotas bound an agent's cost, blast radius, and abuse potential by capping tool calls, token spend, and external API use per user, session, or time window.

Agent Retry-with-Backoff Pattern

Retry-with-backoff is the core resilience pattern for agent tool calls: on transient failure, wait an exponentially growing interval before retrying, with jitter to avoid thundering-herd retries.
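The pattern in sketch form, with illustrative defaults for attempts and delays:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Exponential backoff with full jitter. `call` is any tool invocation
    that may raise a transient error; defaults here are illustrative."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                   # exhausted: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))        # full jitter spreads retries
```

A production version would catch only transient error types (timeouts, 429s, 5xx) and let permanent failures — bad auth, validation errors — fail fast.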

Agent Router / Classifier Pattern

The router pattern puts a lightweight classifier at the front door of an agent system, dispatching each request to the cheapest model or most specialized sub-agent that can handle it.
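A keyword-based sketch of the dispatch logic; a production router would use a small, cheap classifier model rather than substring matching, but the shape is the same:

```python
def route(request: str, handlers: dict, fallback):
    """Dispatch to the first handler whose trigger appears in the request;
    triggers and handler names here are illustrative."""
    text = request.lower()
    for trigger, handler in handlers.items():
        if trigger in text:
            return handler(request)
    return fallback(request)
```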

Agent Sandboxing and Safety Patterns

Sandboxing is the foundational safety pattern for agents that run code or browse the web — isolating the agent's execution environment so compromised or hallucinating runs cannot damage host systems or exfiltrate data.

Agent Self-Critique Pattern

Self-critique is an agent design pattern where the agent reviews and scores its own draft output against a rubric or checklist before returning it, catching errors that slipped past the initial generation pass.
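The loop can be sketched as follows, assuming (illustratively) that `critique` returns a pass/fail flag plus feedback that is fed into the next draft:

```python
def generate_with_critique(generate, critique, max_rounds=3):
    """Draft-then-review loop: keep regenerating with the critic's feedback
    until the draft passes the rubric or the round budget runs out."""
    draft = generate(None)                 # initial draft, no feedback yet
    for _ in range(max_rounds):
        passed, feedback = critique(draft)
        if passed:
            break
        draft = generate(feedback)         # revise using the critique
    return draft
```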

Agent Semantic Memory Pattern

Semantic memory stores generalized facts — 'the user prefers Python', 'our prod DB is Postgres' — as structured knowledge the agent can retrieve and use in future interactions.

Agent State and Checkpointing

Production agents need durable state and checkpoints — snapshots of memory, tool outputs, and plan steps — so long-running tasks survive crashes, timeouts, and human interruptions without starting over.
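A minimal sketch of durable checkpoints, using an atomic rename so a crash mid-write cannot leave a corrupt file behind (the path and state shape are illustrative):

```python
import json
import os

def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then rename into place; os.replace is
    atomic on POSIX, so readers never see a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: str):
    """Return the last saved state, or None if the run is starting fresh."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

On restart, the agent loads the checkpoint and resumes from the recorded plan step instead of replaying (and re-paying for) completed work.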

Agent Streaming / Partial Results Pattern

The streaming pattern surfaces partial agent output — token-by-token text, interim tool results, status events — to the user as it happens, making multi-second agent tasks feel responsive.

Agent Tool Permissioning Patterns

Tool permissioning is the discipline of granting agents the narrowest possible capability set — per-tool allow-lists, confirmation prompts for destructive operations, scoped OAuth, and user-in-the-loop approvals.

Agent Voting / Ensemble Pattern

The voting-ensemble pattern runs N agents in parallel on the same task and aggregates their answers by majority vote or a judge model, trading cost for robustness on high-stakes decisions.
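A sketch of the majority-vote variant, with callables standing in for the N agents:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def majority_vote(agents, task):
    """Run every (stand-in) agent on the same task in parallel and return
    the most common answer; ties resolve to the first answer seen."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda agent: agent(task), agents))
    return Counter(answers).most_common(1)[0][0]
```

The judge-model variant replaces `Counter` with an LLM call that picks the best answer, which handles free-form outputs that never match exactly.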

AgentBench: Multi-Environment LLM Agent Benchmark

AgentBench from Tsinghua evaluates LLMs as agents across eight distinct environments — OS, database, web shopping, games, and more — producing a single comparable score for agentic capability.

AgentOps: Observability Platform for AI Agents

AgentOps is an open-source observability platform for LLM agents that captures every tool call, token, cost, and latency span — giving production teams tracing, session replay, and evals.

Agents ↔ MCP Interoperability

MCP (Model Context Protocol) has become the de-facto standard for exposing tools and data to agents — this entry covers how agent frameworks interoperate with MCP servers in practice.

AI Engineer Foundation Agent Protocol (aka Arcadia)

The AI Engineer Foundation Agent Protocol is an open, vendor-neutral REST specification for running and controlling an agent — start a task, stream steps, list artifacts — backed by an open-source reference server.

Anthropic Computer Use Agent

Computer Use is Anthropic's API capability that lets Claude see the screen, move the mouse, and type — enabling the model to operate general-purpose software GUIs like a human user.

Arize Phoenix for Agent Tracing and Evals

Arize Phoenix is an open-source LLM observability tool that traces agent runs via OpenTelemetry, clusters failures by embedding, and runs LLM-as-judge evals — all locally or self-hosted.

AutoGen GroupChat

AutoGen's GroupChat puts several specialist agents around a virtual table with a manager that picks the next speaker — a flexible many-agent conversation primitive from Microsoft Research.

AutoGPT (Original 2023)

AutoGPT, released in March 2023, was the first viral autonomous agent framework — a Python script that chained GPT-4 calls with tools to pursue goals without step-by-step human input, sparking the agent-framework era.

BabyAGI (Original 2023)

BabyAGI, released April 2023 by Yohei Nakajima, was a ~140-line Python script demonstrating task decomposition + prioritization + execution with GPT-4 — one of the first autonomous agent patterns shared widely.

Blackboard Pattern for Multi-Agent Systems

The blackboard pattern uses a shared workspace where agents read and write partial results — a classical AI architecture now finding new life in LLM-agent systems.
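A thread-safe sketch of the shared workspace (key names are illustrative):

```python
import threading

class Blackboard:
    """Shared workspace: agents post partial results under keys and poll
    for the entries they depend on; a lock keeps concurrent writes safe."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)
```

Each specialist agent watches for keys it knows how to extend — one posts a draft plan, another posts evidence under `"evidence.*"`, a third synthesizes — with no agent calling another directly.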

Bolt.new: In-Browser Full-Stack Coding Agent

Bolt.new by StackBlitz is a browser-based full-stack coding agent built on WebContainers — it runs Node.js, installs packages, edits files, and previews apps entirely in the browser, then deploys to Netlify.

Browser Use: Open-Source LLM Browser Agent Framework

Browser Use is an open-source Python library that gives LLM agents structured access to a real Playwright browser — they see the DOM, screenshots, and interactive elements, and act via a typed action space.

Claude Code Subagents

Claude Code's subagent pattern lets the main Claude agent spawn specialised sub-Claudes with their own prompts, tool allowlists, and contexts — a first-class multi-agent workflow in a coding CLI.

Claude Subagents in Production

Claude subagents have moved from coding-CLI curiosity to production pattern — powering Anthropic's own research agent and an increasing share of real-world agent deployments.

Cognition Devin: Autonomous Software Engineer Agent

Devin is Cognition's autonomous software-engineer agent that plans long-horizon coding tasks, browses documentation, executes shell commands, and ships pull requests — the prototype of the fully autonomous SWE-agent category.

CrewAI Hierarchical Process

CrewAI's hierarchical process puts a manager agent in charge of a crew — assigning tasks, reviewing outputs, and iterating — contrasting with its simpler sequential process.

Cursor Composer: Multi-File Agentic Editor

Cursor Composer (Agent mode) is the multi-file, multi-step coding agent inside the Cursor IDE — it plans edits across files, runs shell commands, and iterates on tests without leaving the editor.

Deep Research Agent Pattern

Deep research is a now-standard agent pattern — a lead agent plans a research question, dispatches parallel sub-agents to explore, synthesises findings, and cites sources.

Enterprise DevOps / SRE Agent

A DevOps/SRE agent triages alerts, investigates incidents, proposes (or executes) fixes, and writes postmortems — augmenting on-call engineers with always-on log/metric correlation.

Enterprise Finance Analyst Agent

A finance analyst agent pulls data from ERP, data warehouses, and market sources, builds models, and drafts variance and scenario analyses — augmenting FP&A and investment teams.

Enterprise HR / Recruiting Agent

A recruiting agent sources candidates, screens resumes, drafts outreach, schedules interviews, and summarizes feedback — managing the top of the hiring funnel with bias auditing built in.

Enterprise Legal Research Agent

A legal research agent searches case law, statutes, and firm documents, drafts memoranda with citations, and flags relevant precedents — augmenting associates on research-heavy workflows.

Enterprise Marketing Campaign Agent

A marketing campaign agent plans campaigns, drafts creative across channels, segments audiences, launches in ad platforms, and reports on performance — closing the loop on optimization.

Enterprise Sales Agent (SDR)

An enterprise SDR agent autonomously researches accounts, drafts personalized outreach, books meetings, and updates the CRM — replacing or augmenting the first-line sales-development role.

Enterprise Support Agent (Tier 1)

A Tier-1 support agent autonomously resolves the bulk of inbound customer issues — password resets, billing questions, order status, how-to queries — and cleanly escalates the rest to humans.

GAIA Benchmark for General AI Assistants

GAIA is a benchmark from Hugging Face and Meta that tests general AI assistants on real-world, multi-step questions requiring reasoning, tool use, and web browsing — designed to be easy for humans and hard for current agents.

Glean — Enterprise Work Agent

Glean is a work-assistant platform that indexes a company's SaaS stack — Google Drive, Slack, Jira, Notion, Salesforce — and provides search and agents grounded in that internal knowledge.

Google A2A (Agent-to-Agent) Protocol

Google's A2A is an open protocol for agent interoperability — how independently built agents discover each other, describe their capabilities, and exchange task state.

GPT Researcher: Autonomous Research Agent

GPT Researcher is an open-source autonomous research agent that drafts a plan, issues web queries across many sources, deduplicates, and writes a cited research report — all without a human in the loop.

HaluEval — Hallucination Evaluation Benchmark

HaluEval is a large-scale benchmark of hallucination examples across QA, dialogue, and summarization used to measure how often LLM agents invent facts versus ground them in retrieved sources.

Hierarchical Agent Pattern

The hierarchical pattern stacks orchestrator-worker layers vertically: a top-level planner delegates to mid-level coordinators, who in turn delegate to leaf worker agents — structured delegation for complex tasks.

IBM Agent Communication Protocol (ACP)

IBM's ACP is an open protocol for agent-to-agent messaging, discovery, and orchestration — developed under the BeeAI project and designed for enterprise-grade multi-agent systems.

Jules — Google's Asynchronous Coding Agent

Jules is Google's asynchronous coding agent, built on Gemini — it clones your repo, plans changes, runs in a cloud VM, and opens a pull request with tests and diffs for review.

LangGraph Supervisor Pattern

LangGraph's supervisor pattern uses a top-level supervisor agent that routes messages to specialised worker agents in a graph — the idiomatic LangGraph way to build multi-agent systems.

LongBench — Long-Horizon Agent Benchmark

LongBench evaluates agents on tasks that span many steps, long documents, and extended time horizons — where short-horizon benchmarks fail to capture the real difficulty of agent work.

Lovable: Chat-to-App Full-Stack Agent

Lovable is a chat-driven full-stack app builder that generates React + Tailwind frontends wired to Supabase backends — turning a natural-language brief into a working, deployable SaaS product.

MAgent / MAgentBench — Multi-Agent Benchmark

MAgent and its successors benchmark multi-agent systems on cooperative and competitive tasks — negotiation, resource allocation, team coding — where the failure mode is coordination, not individual agent skill.

Manus Agent Platform

Manus is a general-purpose agent platform from Monica that gained attention in early 2025 for running long, autonomous browser + compute workflows on behalf of users.

mem0 — Agent Memory Layer

mem0 is an open-source memory layer for AI agents that extracts, deduplicates, and retrieves user- and session-scoped facts across multi-turn conversations with a simple SDK.

Multi-Agent Debate Pattern

In the debate pattern, two or more agents argue different positions on a problem before a judge agent adjudicates — a technique shown to improve reasoning accuracy on hard problems.

MultiOn: Consumer Web-Action Agent

MultiOn is a consumer-facing web-action agent that turns natural-language goals into real browser actions — booking tables, filling forms, placing orders — across any public site.

OpenAI Agents Protocol and Agents SDK

OpenAI's Agents SDK and the underlying Responses API form an emerging de-facto agents protocol — typed tool calls, handoffs, tracing, and guardrails with portable concepts across providers.

OpenAI Evals for Agent Workflows

OpenAI's Evals framework and hosted Evals API let teams define graders, run LLM-as-judge and programmatic evaluations, and track agent quality across prompt, model, and tool changes.

OpenAI Swarm Framework

OpenAI Swarm was an educational multi-agent framework focused on lightweight, stateless, peer-to-peer handoffs — the conceptual precursor to the production OpenAI Agents SDK.

Orchestrator-Worker Pattern

The orchestrator-worker pattern assigns a lead agent to plan and route work, while specialised worker agents execute individual steps — the workhorse pattern for most production agent systems.
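A sketch of the control flow, where `plan_fn` stands in for the lead agent and `workers` maps names to worker callables (all names illustrative):

```python
def orchestrate(plan_fn, workers: dict, task: str) -> list:
    """Lead agent plans a list of (worker_name, subtask) steps, then routes
    each step to the named worker and collects the results in order."""
    results = []
    for worker_name, subtask in plan_fn(task):
        results.append(workers[worker_name](subtask))
    return results
```

In practice the orchestrator re-plans between steps based on worker output, and independent steps run in parallel; the fixed plan above keeps the sketch minimal.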

OSWorld: Real Operating System Agent Benchmark

OSWorld is a scalable benchmark that evaluates multi-modal agents on real computer tasks across Ubuntu, Windows, and macOS environments — clicking, typing, and navigating GUIs like a human user.

Perplexity Deep Research

Perplexity Deep Research is an autonomous multi-step research agent that browses the web for several minutes, synthesizes dozens of sources, and writes a cited long-form report for a single prompt.

Playwright for AI Agents

Playwright is Microsoft's cross-browser automation library — Chromium, Firefox, WebKit — widely used as the deterministic foundation underneath AI-powered browser agents like Stagehand and browser-use.

Reflection Agent Pattern

Reflection is the pattern where an agent critiques its own output — or has a reviewer agent critique it — before finalising, catching errors a single forward pass would otherwise let through.

Rod — Go Browser Automation for Agents

Rod is a Go-native Chrome DevTools Protocol library that provides high-performance browser automation without Node dependencies — popular for Go agent backends driving browsers at scale.

SafeBench — Agent Safety Benchmark

SafeBench is a benchmark suite that stress-tests autonomous agents on harmful-instruction compliance, indirect prompt injection, unsafe tool use, and jailbreak robustness across standardized scenarios.

Sakana AI Scientist: Fully Automated Research Pipeline

AI Scientist by Sakana AI is an end-to-end agent pipeline that proposes ML research ideas, writes experiment code, runs experiments, analyzes results, and drafts a LaTeX paper — one of the first demonstrations of fully autonomous ML research.

Selenium for AI Agents

Selenium is the veteran cross-browser automation framework — WebDriver-based, language-agnostic — still used by AI agents operating in enterprise or legacy environments where Playwright isn't an option.

Skyvern: LLM-Driven Browser RPA Agent

Skyvern is an open-source RPA platform that uses LLMs and vision models to automate browser workflows — form fills, portal logins, document uploads — without writing brittle XPath selectors.

Stagehand — AI Browser Agent Framework

Stagehand is an open-source browser automation framework from Browserbase that combines deterministic Playwright code with AI-powered steps like act(), extract(), and observe() for resilient web agents.

SWE-bench for Agents: Evaluating Coding Agents

SWE-bench evaluates autonomous coding agents on real GitHub issues from popular Python projects — the agent must produce a patch that resolves the issue and passes the project's own tests.

tau-bench — Tool-Augmented Agent Benchmark

tau-bench is Sierra's benchmark for conversational agents that must use tools to complete real customer-support tasks like airline rebooking and retail returns, scored on policy compliance and task completion.

v0 by Vercel: UI-First Generative Agent

v0 by Vercel is a generative UI agent specialized in React, Next.js, Tailwind, and shadcn/ui — turning natural-language prompts and screenshots into production-ready components and deployable apps.

WebArena: Realistic Web-Agent Benchmark

WebArena is a reproducible, self-hosted benchmark from Carnegie Mellon featuring four fully-functional websites — e-commerce, forums, Gitea, content management — where agents must complete natural-language tasks end-to-end.

Writer — Enterprise Agent Platform

Writer is a full-stack generative AI platform for enterprises, combining its own Palmyra LLM family with an agent builder, knowledge graph, and strict brand-voice and compliance controls.

Zep — Agent Memory Platform

Zep is a memory platform for AI agents that combines a temporal knowledge graph (Graphiti) with vector search to give agents persistent, queryable memory with fact-level provenance.