
browser-use

browser-use turns an LLM into a web-navigating agent by exposing Playwright-driven browser actions as tools. It flattens the DOM into a numbered list of elements that the model can reference by index, and it handles login flows, iframes, and shadow DOM. It is used both as a starting point for web automation research and in shipping agents.

Framework facts

Category: agents
Language: Python
License: MIT
Repository: https://github.com/browser-use/browser-use

Install

pip install browser-use
playwright install chromium --with-deps

Quickstart

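# Also requires: pip install langchain-anthropic
# ChatAnthropic reads the API key from the ANTHROPIC_API_KEY environment variable.
import asyncio
from browser_use import Agent
from langchain_anthropic import ChatAnthropic


async def main():
    # The agent pairs a natural-language task with the LLM that chooses each browser action.
    agent = Agent(
        task='Find the top Hacker News story and summarise it.',
        llm=ChatAnthropic(model='claude-opus-4-7'),
    )
    # Runs the agent loop against a browser until the task completes or the agent stops.
    await agent.run()


if __name__ == '__main__':
    asyncio.run(main())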

Alternatives

  • Skyvern — self-hosted browser agents with vision-first flows
  • LaVague — action engine with a world model
  • Stagehand — TypeScript browser agent by Browserbase
  • AgentQL — query-language layer over Playwright

Frequently asked questions

Does browser-use need vision models?

No. It works with text-only LLMs by flattening the DOM into a numbered list of interactive elements. Vision mode adds screenshots, which improves grounding on visually complex sites but increases the token cost of each step.
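
To make the idea of a numbered element list concrete, here is a minimal sketch using only the Python standard library. It is not browser-use's implementation: the set of tags treated as interactive, the `[index] <tag> label` output format, and the names `InteractiveElementIndexer` and `flatten` are all assumptions made for illustration. The real library works against a live browser rather than a static HTML string.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not browser-use's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Interactive tags that never have a closing tag, so they collect no inner text.
VOID_TAGS = {"input"}


class InteractiveElementIndexer(HTMLParser):
    """Collects interactive elements in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per interactive element: tag, attrs, text
        self._open = []     # stack of elements still collecting inner text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        entry = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(entry)
        if tag not in VOID_TAGS:
            self._open.append(entry)

    def handle_endtag(self, tag):
        # Close the innermost open interactive element if it matches.
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Attribute visible text to the innermost open interactive element.
        if self._open:
            self._open[-1]["text"] += data


def flatten(html: str) -> list[str]:
    """Return one '[index] <tag> label' line per interactive element."""
    parser = InteractiveElementIndexer()
    parser.feed(html)
    parser.close()
    lines = []
    for index, el in enumerate(parser.elements):
        text = " ".join(el["text"].split())  # collapse whitespace
        # Fall back to placeholder or name when the element has no visible text.
        label = text or el["attrs"].get("placeholder") or el["attrs"].get("name") or ""
        lines.append(f"[{index}] <{el['tag']}> {label}".rstrip())
    return lines
```

For a page fragment containing a link labelled "Top stories", a search input with the placeholder "Search", and a "Go" button, `flatten` returns `[0] <a> Top stories`, `[1] <input> Search`, and `[2] <button> Go`. A text-only model can then answer with an index such as 2 instead of a selector or screen coordinates.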

Is it production-safe?

For many workflows, yes, but login-gated or bot-protected sites may still require human handoff. Always add retries, human-in-the-loop checkpoints, and rate limits.
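
The three safeguards named above can be sketched as a small wrapper around any async task. This is an illustrative pattern, not part of browser-use: `run_with_safeguards`, `MinIntervalLimiter`, and all of their parameters are hypothetical names, and the limiter assumes attempts are issued sequentially from a single task.

```python
import asyncio
import random
import time


class MinIntervalLimiter:
    """Enforces a minimum gap between calls (sequential use only; hypothetical helper)."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    async def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                await asyncio.sleep(remaining)
        self._last = time.monotonic()


async def run_with_safeguards(make_attempt, *, max_attempts=3, base_delay=1.0,
                              limiter=None, approve=None):
    """Run an async task with retries, backoff, rate limiting, and an approval gate.

    make_attempt: zero-argument callable returning a fresh awaitable per attempt.
    limiter:      optional MinIntervalLimiter applied before every attempt.
    approve:      optional callable(result) -> bool, the human-in-the-loop checkpoint.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if limiter is not None:
            await limiter.wait()
        try:
            result = await make_attempt()
        except Exception as exc:  # broad on purpose: a sketch, narrow it in real code
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff with jitter before the next attempt.
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
                await asyncio.sleep(delay)
            continue
        # Checkpoint: let a person (or a policy function) veto the result.
        if approve is not None and not approve(result):
            raise PermissionError("result rejected at the approval checkpoint")
        return result
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

With the quickstart above, `make_attempt` could be a function that builds a new `Agent` and returns `agent.run()`, so that each retry starts from a fresh agent. `approve` could prompt an operator on the console before the result is used downstream. Whether a retry is safe depends on the task: a search can be repeated freely, but a form submission or purchase may need the checkpoint before the action rather than after it.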

Sources

  1. browser-use — GitHub — accessed 2026-04-20
  2. browser-use — docs — accessed 2026-04-20