
Crawl4AI

Crawl4AI packages Playwright-based rendering, extraction strategies (CSS selectors, LLM-driven extraction, cosine-similarity filtering), and semantic chunking behind an async Python API. It is designed as a free, locally run alternative to hosted crawlers such as Firecrawl, for teams that want full control and no crawler API fees.

Framework facts

Category: rag
Language: Python
License: Apache-2.0
Repository: https://github.com/unclecode/crawl4ai

Install

pip install -U crawl4ai
crawl4ai-setup  # installs Playwright browsers

Quickstart

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url='https://engineering.vips.edu')
        print(result.markdown[:500])

asyncio.run(main())
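
For RAG ingestion, the Markdown returned by the quickstart is usually split into chunks before embedding. The following is a minimal sketch of paragraph-based chunking in plain Python. It is not Crawl4AI's built-in chunking, and the names chunk_markdown and max_chars are hypothetical, chosen only for illustration.

```python
def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than split,
    so individual chunks can exceed the limit.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in markdown.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + 2 + len(paragraph) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks
```

Inside the quickstart's main(), this could be called as chunk_markdown(str(result.markdown)). The str() call is a precaution, assuming result.markdown may be a string-like object rather than a plain str in some versions.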

Alternatives

Firecrawl — hosted equivalent
ScrapingAnt — commercial API
Playwright + Trafilatura — roll your own
Apify — enterprise crawler platform

Frequently asked questions

Crawl4AI or Firecrawl?

Choose Crawl4AI if you want a free, fully local crawler with more configuration options. Choose Firecrawl if you want a hosted API with no infrastructure to run.

Does it support proxies and stealth?

Yes. It supports proxy rotation, custom user agents, and undetected Chromium profiles for bot-protected sites.

Sources

Crawl4AI — GitHub — accessed 2026-04-20
Crawl4AI — docs — accessed 2026-04-20