
Agent Cache-and-Memoize Pattern

An agent that answers "What's the weather in Delhi?" by calling the weather API every single time is burning money. The cache-and-memoize pattern keys each tool result by its arguments and returns the cached value while a TTL is still valid, while LLM prompt caching (Anthropic, OpenAI) lets a repeated prompt prefix, such as a long system prompt, skip recomputation. In production, well-tuned caches cut cost by 50–90% on high-volume agents.

Protocol facts

Sponsor: Community pattern
Status: stable
Interop with: Redis, Anthropic prompt caching, OpenAI prompt caching, LangChain cache

Frequently asked questions

What should I cache?

Pure, idempotent tool calls with stable inputs: web fetches, database reads, documentation lookups, embeddings. Do NOT cache writes, payments, or anything with side effects.
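One way to enforce that rule is to mark each tool as idempotent or not in its registration, and route only idempotent tools through the cache. The registry below is a hypothetical sketch; the tool names and return strings are illustrative.

```python
# Hypothetical tool registry: only tools marked idempotent go through the cache.
TOOLS = {
    "fetch_docs":  {"fn": lambda query: f"docs for {query}", "idempotent": True},
    "charge_card": {"fn": lambda amount: f"charged {amount}", "idempotent": False},
}

def call_tool(name, cache, *args):
    tool = TOOLS[name]
    if not tool["idempotent"]:
        # Writes, payments, side effects: always execute, never cache.
        return tool["fn"](*args)
    key = (name, args)
    if key not in cache:
        cache[key] = tool["fn"](*args)
    return cache[key]
```

A `charge_card` call executes every time, while repeated `fetch_docs` lookups are served from the cache dict.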

What's prompt caching?

Frontier providers (Anthropic, OpenAI, Gemini) let you mark a prefix of your prompt as cacheable. Subsequent calls with the same prefix skip re-processing it, saving 90%+ on input tokens for long system prompts or document context.
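With Anthropic's Messages API, for example, the cacheable prefix is marked with a `cache_control` breakpoint on a system block. The sketch below only builds the request payload (no network call); the model name is a placeholder, and field names follow Anthropic's prompt-caching documentation at the time of writing.

```python
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Corp. " * 200  # stable prefix

def build_request(user_message):
    # The long system prefix is marked cacheable; only the user turn varies.
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

Each subsequent call that reuses the identical system prefix is billed at the provider's discounted cache-read rate rather than full input-token price.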

How do I invalidate?

Combine TTL (time-based) with versioning (e.g., include a data-version hash in the cache key). For user-facing caches, expose an explicit 'refresh' action.
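The versioning half can be as simple as folding a data-version string into the cache key, so bumping the version invalidates every entry at once without waiting for TTLs to expire. A minimal sketch, with illustrative tool and version names:

```python
import hashlib
import json

def cache_key(tool_name, args, data_version):
    # Bumping data_version changes every key, invalidating the whole
    # cache immediately instead of waiting for TTL expiry.
    raw = json.dumps([tool_name, args, data_version], sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()
```

An explicit "refresh" action can then be implemented as a per-user version bump rather than a cache-wide flush.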

Sources

  1. Anthropic — prompt caching — accessed 2026-04-20
  2. OpenAI — prompt caching — accessed 2026-04-20