Crawl4AI
Crawl4AI packages Playwright-based page rendering, extraction strategies (CSS selectors, LLM-driven extraction, cosine-similarity filtering), and semantic chunking behind an async Python API. It is designed as a free, local alternative to hosted crawlers such as Firecrawl, aimed at teams that want full control and no per-request API spend.
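To make "chunking" and "cosine filtering" concrete, here is a library-free sketch of the idea. It is not Crawl4AI's own implementation or API. It splits page text into overlapping word windows, then keeps only the chunks whose cosine similarity to a query clears a threshold. Bag-of-words term counts stand in for the embedding vectors a production filter would typically use. The function names and the default `window`, `step`, and `threshold` values are illustrative assumptions.

```python
import math
import re
from collections import Counter


def sliding_window_chunks(text: str, window: int = 50, step: int = 25) -> list[str]:
    """Split text into overlapping chunks of `window` words, advancing `step` words each time."""
    if step <= 0:
        raise ValueError("step must be positive")
    words = text.split()
    if not words:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):  # this chunk already reaches the end of the text
            break
        start += step
    return chunks


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_chunks(chunks: list[str], query: str, threshold: float = 0.2) -> list[str]:
    """Keep chunks whose similarity to the query meets the threshold, most similar first."""
    q = term_vector(query)
    scored = [(cosine(term_vector(c), q), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]


if __name__ == "__main__":
    chunks = ["playwright renders pages", "cookie banner privacy policy"]
    print(filter_chunks(chunks, "playwright rendering"))  # ['playwright renders pages']
```

The overlap between windows keeps a sentence that straddles a chunk boundary intact in at least one chunk. Filtering before indexing drops boilerplate such as cookie banners, so it never reaches a RAG store.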
Framework facts
- Category: rag
- Language: Python
- License: Apache-2.0
- Repository: https://github.com/unclecode/crawl4ai
Install
```bash
pip install -U crawl4ai
crawl4ai-setup  # installs Playwright browsers
```
Quickstart
```python
import asyncio
from crawl4ai import AsyncWebCrawler


async def main():
    # Start a browser session; the context manager shuts it down on exit
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url='https://engineering.vips.edu')
        # Print the first 500 characters of the page converted to Markdown
        print(result.markdown[:500])


asyncio.run(main())
```
Alternatives
- Firecrawl — hosted equivalent
- ScrapingAnt — commercial API
- Playwright + Trafilatura — roll your own
- Apify — enterprise crawler platform
Frequently asked questions
Crawl4AI or Firecrawl?
Choose Crawl4AI if you want a fully local, free crawler with more configuration options. Choose Firecrawl if you want a hosted API with no infrastructure to operate.
Does it support proxies and stealth?
Yes. It supports proxy rotation, custom user agents, and undetected Chromium profiles for bot-protected sites.
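Crawl4AI configures proxies and user agents through its own browser and crawler settings; check its documentation for the exact parameter names. The sketch below is library-free and only illustrates the rotation idea: each request takes the next proxy and the next user agent in round-robin order, so consecutive requests come from different identities. `RoundRobinRotator`, `RequestIdentity`, the proxy URLs, and the user-agent strings are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class RequestIdentity:
    """The proxy and user agent to use for one request (hypothetical type)."""
    proxy: str
    user_agent: str


class RoundRobinRotator:
    """Hand out proxies and user agents in a fixed round-robin order (hypothetical helper)."""

    def __init__(self, proxies: list[str], user_agents: list[str]) -> None:
        if not proxies or not user_agents:
            raise ValueError("need at least one proxy and one user agent")
        # cycle() repeats each list indefinitely, so rotation never runs out
        self._proxies = cycle(proxies)
        self._agents = cycle(user_agents)

    def next_identity(self) -> RequestIdentity:
        """Return the next proxy/user-agent pair; the two lists advance independently."""
        return RequestIdentity(next(self._proxies), next(self._agents))


if __name__ == "__main__":
    rotator = RoundRobinRotator(
        ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],  # hypothetical proxies
        ["agent-1", "agent-2", "agent-3"],  # placeholder user-agent strings
    )
    for _ in range(4):
        print(rotator.next_identity())
```

Because the two lists have different lengths and advance independently, the proxy/user-agent pairings also vary over time. Round-robin is the simplest policy; a production setup might additionally retire proxies that keep failing.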
Sources
- Crawl4AI — GitHub — accessed 2026-04-20
- Crawl4AI — docs — accessed 2026-04-20