Browser Use: Open-Source LLM Browser Agent Framework
Browser Use, released in 2024 and one of the fastest-growing agent frameworks on GitHub, wraps Playwright with an LLM-friendly action space. The agent receives an annotated screenshot plus a compressed DOM and emits typed actions (click index 12, type into input 7, scroll). It supports any LLM available through LangChain and ships with retries, task history, and vision support. The project has become the de facto open-source stack for browser-driving agents.
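To make the idea of typed actions concrete, here is a minimal sketch of validating one model-emitted action before it is executed. The JSON shape, the class names (`Click`, `TypeText`, `Scroll`), and `parse_action` are hypothetical and chosen for illustration; they are not Browser Use's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    index: int  # number of the target element in the annotated element list


@dataclass(frozen=True)
class TypeText:
    index: int  # number of the input element to type into
    text: str


@dataclass(frozen=True)
class Scroll:
    direction: str  # "up" or "down"


Action = Union[Click, TypeText, Scroll]


def parse_action(raw: str) -> Action:
    """Validate one JSON action from the model and return a typed object.

    Hypothetical schema: {"action": "click" | "type" | "scroll", ...}.
    """
    data = json.loads(raw)
    kind = data.get("action")
    if kind == "click":
        return Click(index=int(data["index"]))
    if kind == "type":
        return TypeText(index=int(data["index"]), text=str(data["text"]))
    if kind == "scroll":
        direction = data.get("direction", "down")
        if direction not in ("up", "down"):
            raise ValueError(f"unsupported scroll direction: {direction!r}")
        return Scroll(direction=direction)
    # Anything outside the small action vocabulary is rejected, not guessed at.
    raise ValueError(f"unknown action: {kind!r}")
```

With this sketch, `parse_action('{"action": "click", "index": 12}')` yields `Click(index=12)`, while an unrecognized action raises `ValueError`. Keeping the action vocabulary small and validated is what lets a controller reject malformed model output before touching the browser.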
Protocol facts
- Sponsor: Browser Use (open source)
- Status: stable
- Spec: https://github.com/browser-use/browser-use
- Interop with: Playwright, LangChain, Anthropic, OpenAI, WebArena
Frequently asked questions
Why wrap Playwright instead of using raw Playwright from the LLM?
Raw Playwright requires writing scripts, and LLMs struggle with full HTML. Browser Use gives the model a compact, numbered element list plus a screenshot, so the model only has to emit semantic actions that reference element numbers. That is more reliable than asking it to generate selectors or whole scripts.
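As a rough illustration of what a compact, numbered element list could look like, the sketch below reduces an HTML string to its interactive elements using only Python's standard-library `html.parser`. The set of tags, the label heuristics, and the output format are assumptions made for this example; Browser Use's real DOM compression is more involved and works on a live page.

```python
from html.parser import HTMLParser

# Assumed set of "interactive" tags for this sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # no closing tag, so no inner text to collect


class ElementIndexer(HTMLParser):
    """Collect interactive elements in document order with a short label."""

    def __init__(self):
        super().__init__()
        self.elements = []  # each entry is [tag, label]
        self._open = None   # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr = dict(attrs)
        # Prefer explicit labels; inner text is appended later if present.
        label = attr.get("aria-label") or attr.get("placeholder") or attr.get("name") or ""
        self.elements.append([tag, label])
        self._open = None if tag in VOID_TAGS else len(self.elements) - 1

    def handle_endtag(self, tag):
        if tag in INTERACTIVE_TAGS:
            self._open = None

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            entry = self.elements[self._open]
            entry[1] = f"{entry[1]} {text}".strip()


def numbered_elements(html: str) -> list[str]:
    """Return lines such as '[0] <a> Home' for each interactive element."""
    parser = ElementIndexer()
    parser.feed(html)
    parser.close()
    return [
        f"[{i}] <{tag}> {label}".rstrip()
        for i, (tag, label) in enumerate(parser.elements)
    ]
```

For the input `<a href="/">Home</a><input placeholder="Search"><button>Go</button>`, this produces three lines numbered 0 to 2, and the model can then answer with an index instead of a CSS selector.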
Does Browser Use support vision models?
Yes. Screenshots are first-class inputs. It works with Claude (vision), GPT-4o/5, and Gemini, and ships hybrid modes that mix visual and DOM reasoning.
Is it production-ready?
It's stable and widely adopted for research, evals, and workflow automation. For scaled RPA-style use, teams still layer their own supervision, retry, and sandboxing on top.
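One way a team might layer its own supervision on top is a wrapper that enforces a per-attempt timeout and retries with exponential backoff. The sketch below is framework-agnostic: `run_supervised` and its parameters are hypothetical names, and `make_run` stands in for any factory that starts a fresh agent run and returns its awaitable. Sandboxing is out of scope here.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def run_supervised(
    make_run: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    timeout_s: float = 120.0,
    backoff_s: float = 1.0,
) -> T:
    """Run an agent task with a per-attempt timeout and exponential backoff.

    make_run is called once per attempt so that each retry starts fresh.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the attempt and raises a timeout error
            # if it exceeds timeout_s.
            return await asyncio.wait_for(make_run(), timeout=timeout_s)
        except Exception as exc:  # includes timeouts; cancellation propagates
            last_error = exc
            if attempt < max_attempts:
                await asyncio.sleep(backoff_s * 2 ** (attempt - 1))
    raise RuntimeError(
        f"agent run failed after {max_attempts} attempts"
    ) from last_error
```

In practice `make_run` would construct a new agent and return its run coroutine, so that state from a failed attempt does not leak into the next one. A production setup would likely also log each failure and cap total wall-clock time.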
Sources
- Browser Use GitHub — accessed 2026-04-20
- Browser Use docs — accessed 2026-04-20