
Browser Use: Open-Source LLM Browser Agent Framework

Browser Use, released in 2024 and one of the fastest-growing agent frameworks on GitHub, wraps Playwright with an LLM-friendly action space. At each step the agent receives an annotated screenshot plus a compressed DOM and emits typed actions such as "click element 12", "type into input 7", or "scroll". It supports a wide range of LLMs through LangChain and ships with retries, task history, and vision support. The project has become one of the most widely used open-source stacks for browser-driving agents.
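To make the "typed actions" idea concrete, here is a minimal sketch of how a model's JSON output could be validated into typed actions before anything touches the browser. The class names (Click, TypeText, Scroll), the parse_action helper, and the JSON shape are hypothetical illustrations, not Browser Use's real API. The point is only that an unknown or malformed action raises an error the caller can feed back to the model instead of executing it.

```python
import json
from dataclasses import dataclass
from typing import Union

# Hypothetical typed actions modeled on the examples in the text
# (click element 12, type into input 7, scroll). These are NOT the
# real Browser Use classes; field names are illustrative only.


@dataclass(frozen=True)
class Click:
    index: int  # number of the element in the annotated list


@dataclass(frozen=True)
class TypeText:
    index: int
    text: str


@dataclass(frozen=True)
class Scroll:
    direction: str  # "up" or "down"


Action = Union[Click, TypeText, Scroll]


def parse_action(raw: str) -> Action:
    """Turn one JSON object emitted by the model into a typed action.

    Raises ValueError on invalid JSON, unknown action names, or missing /
    malformed fields, so the caller can re-prompt instead of executing.
    """
    try:
        data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        name = data.get("action")
        if name == "click":
            return Click(index=int(data["index"]))
        if name == "type":
            return TypeText(index=int(data["index"]), text=str(data["text"]))
        if name == "scroll":
            direction = data.get("direction", "down")
            if direction not in ("up", "down"):
                raise ValueError(f"bad scroll direction: {direction!r}")
            return Scroll(direction=direction)
    except (KeyError, TypeError) as exc:
        # Missing field or wrong type: normalize to ValueError.
        raise ValueError(f"malformed action: {raw!r}") from exc
    raise ValueError(f"unknown action: {name!r}")
```

Rejecting invalid actions at parse time keeps the executor simple: the code that drives the browser only ever sees well-formed Click, TypeText, or Scroll values.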

Protocol facts

Sponsor: Browser Use (open source)
Status: stable
Spec: https://github.com/browser-use/browser-use
Interop with: Playwright, LangChain, Anthropic, OpenAI, WebArena

Frequently asked questions

Why wrap Playwright instead of using raw Playwright from the LLM?

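Driving raw Playwright means the model has to write selector-based scripts, and LLMs struggle to reason over full-page HTML, which is long and noisy. Browser Use instead gives the model a compact, numbered list of elements plus a screenshot, so it only has to emit semantic actions (for example, "click element 12"). This is considerably more reliable in practice.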
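The idea of a compact, numbered element list can be sketched with Python's standard html.parser module. This is a simplified illustration under stated assumptions: only a, button, input, select, and textarea count as interactive, and each label comes from the element's visible text or its placeholder/aria-label attribute. Browser Use's own DOM processing is more involved; ElementIndexer and compress_dom are hypothetical names.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only these tags count as "interactive".
# Real frameworks use richer heuristics than a fixed tag set.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # interactive tags with no closing tag or inner text


class ElementIndexer(HTMLParser):
    """Collect interactive elements in document order with their visible text."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, attrs dict, list of text pieces)
        self._open = None   # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        self.elements.append((tag, dict(attrs), []))
        if tag not in VOID_TAGS:
            self._open = len(self.elements) - 1

    def handle_endtag(self, tag):
        # Stop collecting text when the open interactive element closes.
        if self._open is not None and self.elements[self._open][0] == tag:
            self._open = None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self.elements[self._open][2].append(data.strip())


def compress_dom(html):
    """Return a numbered element list such as '[0] <a> Sign in'."""
    parser = ElementIndexer()
    parser.feed(html)
    parser.close()
    lines = []
    for i, (tag, attrs, text) in enumerate(parser.elements):
        label = " ".join(text) or attrs.get("placeholder") or attrs.get("aria-label") or ""
        lines.append(f"[{i}] <{tag}> {label}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    page = ('<div><p>Welcome</p><a href="/login">Sign in</a>'
            '<input name="q" placeholder="Search"><button>Go</button></div>')
    print(compress_dom(page))
```

Running the example prints three lines, "[0] <a> Sign in", "[1] <input> Search", and "[2] <button> Go"; the non-interactive "Welcome" paragraph is dropped. The model can then answer with an index ("click 2") rather than a CSS selector.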

Does Browser Use support vision models?

Yes. Screenshots are first-class inputs. It works with vision-capable models such as Claude, GPT-4o and GPT-5, and Gemini, and ships hybrid modes that combine visual and DOM reasoning.

Is it production-ready?

It is stable and widely adopted for research, evals, and workflow automation. For RPA-style use at scale, teams still typically layer their own supervision, retries, and sandboxing on top.
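As one example of the supervision teams add on top, a generic retry wrapper with exponential backoff might look like the sketch below. with_retries is a hypothetical helper, not part of Browser Use; step stands in for whatever unit of agent work you choose to retry (a single action or a whole task run), and sleep is injectable so tests do not actually wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    step: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run step(), retrying on any exception with exponential backoff.

    Hypothetical helper, not part of Browser Use. Waits base_delay,
    2 * base_delay, 4 * base_delay, ... between attempts and re-raises
    the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the real error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Blind retries can repeat side effects (for instance, submitting a form twice), so a wrapper like this is best scoped to steps that are safe to run more than once.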

Sources

Browser Use GitHub — accessed 2026-04-20
Browser Use docs — accessed 2026-04-20
