
Crawl4AI

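Crawl4AI packages Playwright-based browser rendering, extraction strategies (CSS selectors, LLM-driven extraction, and cosine-similarity filtering), and semantic chunking behind a clean async API. It is designed as a free, locally run alternative to hosted crawlers like Firecrawl, for teams that want full control and no per-request crawling fees (LLM-driven extraction still calls whichever model provider you configure).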
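
The chunking and cosine-filtering ideas above can be sketched in plain Python. This is a minimal, stdlib-only illustration of the technique, not Crawl4AI's implementation: it assumes simple word-count vectors where a real pipeline would more likely use embeddings, and `chunk_sentences`, `bag_of_words`, `cosine`, and `filter_chunks` are hypothetical names.

```python
import math
import re
from collections import Counter


def chunk_sentences(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the limit
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_chunks(text: str, query: str, threshold: float = 0.2, max_words: int = 40) -> list[str]:
    """Keep only the chunks whose similarity to the query meets the threshold."""
    q = bag_of_words(query)
    return [c for c in chunk_sentences(text, max_words) if cosine(bag_of_words(c), q) >= threshold]
```

Raising `threshold` keeps fewer, more query-focused chunks. Because the chunker never splits a sentence, one very long sentence can exceed `max_words` on its own.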

Framework facts

Category
rag
Language
Python
License
Apache-2.0
Repository
https://github.com/unclecode/crawl4ai

Install

pip install -U crawl4ai
crawl4ai-setup  # installs Playwright browsers

Quickstart

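import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # The context manager starts a headless browser and closes it on exit
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url='https://engineering.vips.edu')
        if result.success:
            # result.markdown holds the rendered page converted to Markdown
            print(result.markdown[:500])
        else:
            print('Crawl failed:', result.error_message)

asyncio.run(main())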
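
The quickstart crawls a single page. A RAG ingest job usually needs many pages, and a common pattern is to cap how many crawls run at once. The sketch below is generic asyncio, not a Crawl4AI API: `crawl_all` is a hypothetical helper, and `fetch` stands in for any coroutine that turns a URL into text.

```python
import asyncio
from typing import Awaitable, Callable


async def crawl_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> dict[str, str]:
    """Run fetch(url) for every URL with at most `limit` in flight.

    A failed fetch is recorded as an "ERROR: ..." string instead of
    aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str) -> tuple[str, str]:
        async with semaphore:  # blocks while `limit` workers are active
            try:
                return url, await fetch(url)
            except Exception as exc:  # keep one bad page from sinking the batch
                return url, f"ERROR: {exc}"

    pairs = await asyncio.gather(*(worker(u) for u in urls))
    return dict(pairs)
```

With Crawl4AI, `fetch` could wrap `crawler.arun(url=url)` and return `str(result.markdown)` from inside a single `async with AsyncWebCrawler() as crawler:` block, so every page shares one browser. The library's own `arun_many` targets the same batch case and may be the simpler choice.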

Alternatives

  • Firecrawl — hosted equivalent
  • ScrapingAnt — commercial API
  • Playwright + Trafilatura — roll your own
  • Apify — enterprise crawler platform

Frequently asked questions

Crawl4AI or Firecrawl?

Choose Crawl4AI if you want a fully local, free crawler with more configuration knobs. Choose Firecrawl if you want zero operational overhead and a hosted API.

Does it support proxies and stealth?

Yes. It supports proxy rotation, custom user agents, and undetected Chromium profiles for bot-protected sites.
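
Round-robin proxy rotation can be sketched as a small iterator of per-crawl proxy settings. `proxy_configs` is a hypothetical helper, and the `server`/`username`/`password` keys are an assumption based on the proxy examples in Crawl4AI's docs, so check them against your installed version.

```python
from itertools import cycle
from typing import Iterator, Optional


def proxy_configs(
    proxies: list[str],
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Iterator[dict[str, str]]:
    """Return an endless round-robin iterator of proxy settings dicts.

    Hypothetical helper: the dict keys mirror the server/username/password
    shape assumed from Crawl4AI's proxy examples.
    """
    if not proxies:
        raise ValueError("need at least one proxy server")
    # Credentials are shared across all proxies; omitted for open proxies
    auth = {} if username is None else {"username": username, "password": password or ""}
    # cycle() repeats the list forever; each step builds a fresh dict
    return ({"server": server, **auth} for server in cycle(proxies))
```

Each yielded dict could then configure a fresh browser, for example `AsyncWebCrawler(config=BrowserConfig(proxy_config=next(rotation)))`, assuming your release exposes the `proxy_config` parameter. Since the library advertises proxy rotation itself, its built-in option may make a helper like this unnecessary.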

Sources

  1. Crawl4AI — GitHub — accessed 2026-04-20
  2. Crawl4AI — docs — accessed 2026-04-20