Agent Streaming / Partial Results Pattern

A 30-second agent call feels broken if the UI just spins. The streaming pattern pushes partial results as they become available: LLM tokens as they are generated, tool-call announcements ('searching the web…'), and interim reasoning. The perceived-latency win is enormous — users tolerate 30 seconds of visible progress far better than 5 seconds of silence.
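A minimal sketch of the pattern, using a simulated token source in place of a real LLM stream (all names here are illustrative): instead of returning one final string, the run yields each chunk as soon as it exists, so the UI can render every intermediate state.

```typescript
// Simulated token source standing in for an LLM response stream.
async function* generateTokens(): AsyncGenerator<string> {
  for (const tok of ["Streaming ", "beats ", "spinners."]) {
    yield tok; // in a real agent, each chunk arrives from the model API
  }
}

// Consumer: append each token to the display as it arrives. We collect
// the partial states to show what the user would see over time.
async function renderProgressively(): Promise<string[]> {
  const frames: string[] = [];
  let shown = "";
  for await (const tok of generateTokens()) {
    shown += tok;
    frames.push(shown); // each frame is a visible intermediate state
  }
  return frames;
}
```

Each frame is usable the moment it exists — the user reads "Streaming beats…" while the rest is still being produced, which is the entire perceived-latency win.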

Protocol facts

Sponsor: Community pattern
Status: stable
Interop with: Server-Sent Events, WebSockets, Vercel AI SDK, LangGraph streaming

Frequently asked questions

What should I stream?

At minimum, LLM tokens. Better: also stream 'tool X invoked', 'tool X returned (n bytes)', and major plan transitions. This gives users insight into what the agent is doing, not just what it will eventually say.
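One way to model that richer event vocabulary is a tagged union (the event and field names below are hypothetical, not from any particular SDK), so the UI can render tokens, tool lifecycle, and plan transitions differently:

```typescript
// Tagged union covering the event kinds worth streaming.
type StreamEvent =
  | { type: "token"; text: string }                      // raw LLM output
  | { type: "tool_invoked"; tool: string }               // "searching the web…"
  | { type: "tool_returned"; tool: string; bytes: number }
  | { type: "plan"; step: string };                      // major plan transition

// Map each event to the short status line a UI might show for it.
function statusLine(ev: StreamEvent): string {
  switch (ev.type) {
    case "token":
      return ev.text;
    case "tool_invoked":
      return `tool ${ev.tool} invoked...`;
    case "tool_returned":
      return `tool ${ev.tool} returned (${ev.bytes} bytes)`;
    case "plan":
      return `plan: ${ev.step}`;
  }
}
```

Because the union is exhaustive, the `switch` forces a rendering decision for every event kind — adding a new variant later is a compile error until the UI handles it.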

SSE vs. WebSocket?

SSE is simpler (HTTP, unidirectional, auto-reconnect) and sufficient for most agent streaming. WebSocket adds bidirectionality for interactive use (cancel, interrupt, send follow-ups) at the cost of operational complexity.
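Part of SSE's simplicity is the wire format itself: frames are plain text, with an optional `event:` line, one or more `data:` lines, and a terminating blank line. A sketch of framing an event payload (the event names are illustrative):

```typescript
// Serialize one SSE frame. JSON keeps the payload on a single line,
// which the "data:" field requires.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

The server writes these frames on a response with `Content-Type: text/event-stream`; in the browser, `EventSource` parses them and reconnects automatically on drop, which is why SSE covers most unidirectional agent streaming with so little code.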

Can I cancel a streaming agent mid-run?

Yes — that's another reason to stream. A bidirectional channel (WebSocket) or an AbortController signal propagated from the client lets the server stop the agent and release the LLM connection, saving cost on abandoned runs.
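A sketch of the cancellation mechanics, assuming a simulated agent loop that checks an `AbortSignal` between steps (a real server would likewise stop its loop and close the upstream LLM connection when the signal fires):

```typescript
// Simulated agent loop: 100 billable steps, checking for cancellation
// between each one.
async function* agentSteps(signal: AbortSignal): AsyncGenerator<string> {
  for (let i = 1; i <= 100; i++) {
    if (signal.aborted) return; // stop doing billable work immediately
    yield `step ${i}`;
  }
}

// Client side: consume the stream, then abort partway through.
async function runUntilCancelled(): Promise<number> {
  const controller = new AbortController();
  let steps = 0;
  for await (const step of agentSteps(controller.signal)) {
    steps++;
    if (steps === 3) controller.abort(); // user clicked "stop"
  }
  return steps; // far fewer than 100: the remaining work never ran
}
```

The cost saving comes from checking the signal at every loop boundary, so an abandoned run stops within one step instead of running to completion.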

Sources

  1. Vercel AI SDK streaming — accessed 2026-04-20
  2. LangGraph streaming — accessed 2026-04-20