Agent Cache-and-Memoize Pattern
An agent that answers "what's the weather in Delhi?" by calling the weather API on every request is burning money. The cache-and-memoize pattern keys tool results by their arguments and returns the cached value while a TTL is still fresh, while LLM prompt caching (Anthropic, OpenAI) lets identical prompt prefixes skip recomputation. In production, well-tuned caches cut cost 50–90% on high-volume agents.
Protocol facts
- Sponsor: Community pattern
- Status: stable
- Interop with: Redis, Anthropic prompt caching, OpenAI prompt caching, LangChain cache
Frequently asked questions
What should I cache?
Pure, idempotent tool calls with stable inputs: web fetches, database reads, documentation lookups, embeddings. Do NOT cache writes, payments, or anything with side effects.
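One way to enforce the read-only rule is a cacheability check in front of the memoizer. A minimal sketch, assuming a hypothetical tool registry where tool names flag their side effects:

```python
# Hypothetical allowlist: only pure, idempotent tools are ever memoized.
READ_ONLY_TOOLS = {"web_fetch", "db_read", "lookup_docs", "embed_text"}

def should_cache(tool_name: str) -> bool:
    """Writes, payments, and other side-effecting tools always bypass the cache."""
    return tool_name in READ_ONLY_TOOLS
```

The agent loop then consults `should_cache` before touching the cache, so a side-effecting tool like `send_payment` is executed every time.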
What's prompt caching?
Frontier providers (Anthropic, OpenAI, Google Gemini) let you mark a prefix of your prompt as cacheable. Subsequent calls with the same prefix skip re-processing it, saving 90%+ on input-token cost for long system prompts or document context.
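With Anthropic's Messages API, for example, the cacheable prefix is marked with a `cache_control` breakpoint on a system block. This sketch only builds the request body (no network call); the model name is illustrative, and the exact field shape should be checked against the current provider docs.

```python
# Imagine this is ~10k tokens of policy and documentation context.
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Weather. Policies: ..."

request_body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Cache breakpoint: everything up to and including this block
            # is cached and reused on subsequent identical-prefix calls.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "What's the weather in Delhi?"}
    ],
}
```

Only the user turn changes between calls, so the long system prefix is billed at the cached rate after the first request.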
How do I invalidate?
Combine TTL (time-based) with versioning (e.g., include a data-version hash in the cache key). For user-facing caches, expose an explicit 'refresh' action.
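The versioning half can be sketched by folding a data-version string (or a hash of the source data) into the cache key, so bumping the version orphans every stale entry at once. The version value here is illustrative:

```python
import hashlib
import json

# Bump this (or derive it by hashing the underlying dataset) to invalidate
# every cached entry in one step, without flushing the store.
DATA_VERSION = "2026-04-20"

def cache_key(tool_name, arguments, data_version=DATA_VERSION):
    """Key = tool name + arguments + data version."""
    payload = json.dumps(
        {"tool": tool_name, "args": arguments, "v": data_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k_old = cache_key("db_read", {"table": "cities"})
k_new = cache_key("db_read", {"table": "cities"}, data_version="2026-05-01")
# k_old != k_new: after a version bump, lookups miss and re-fetch fresh data,
# while TTL expiry garbage-collects the orphaned entries over time.
```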
Sources
- Anthropic — prompt caching — accessed 2026-04-20
- OpenAI — prompt caching — accessed 2026-04-20