Firecrawl
Firecrawl handles the grubby parts of web ingestion: JavaScript rendering, sitemap following, rate limiting, and content extraction. It emits LLM-ready Markdown (or JSON with a schema), making it a drop-in source for RAG corpora, agent browsing tools, and continuous knowledge sync.
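To make the "source for RAG corpora" point concrete, here is a minimal sketch of one way the Markdown output could be split into heading-aligned chunks before embedding. Nothing here is part of Firecrawl: `chunk_markdown`, the `max_chars` cap, and the sample `page` dict are hypothetical names chosen for illustration, and the page shape (a `markdown` string plus `metadata.url`) is assumed from the quickstart on this page.

```python
from typing import Dict, List


def chunk_markdown(markdown: str, source_url: str, max_chars: int = 1000) -> List[Dict[str, str]]:
    """Split a Markdown page into chunks that start at headings.

    A chunk is also closed early when adding the next line would push it
    past max_chars. A single line longer than max_chars is kept whole.
    """
    chunks: List[Dict[str, str]] = []
    current: List[str] = []
    size = 0

    def flush() -> None:
        # Emit the buffered lines as one chunk, skipping empty buffers.
        nonlocal size
        text = "\n".join(current).strip()
        if text:
            chunks.append({"source": source_url, "text": text})
        current.clear()
        size = 0

    for line in markdown.splitlines():
        # Simplification: any line starting with "#" is treated as a heading,
        # so "#" comments inside fenced code would also start a new chunk.
        if line.startswith("#") or size + len(line) + 1 > max_chars:
            flush()
        current.append(line)
        size += len(line) + 1
    flush()
    return chunks


# Hypothetical page, shaped like one item of the quickstart's job['data'].
page = {
    "markdown": "# Intro\nHello world.\n\n## Setup\nInstall the package.",
    "metadata": {"url": "https://example.com/docs"},
}
chunks = chunk_markdown(page["markdown"], page["metadata"]["url"])
for chunk in chunks:
    print(chunk["source"], repr(chunk["text"]))
```

Keeping the source URL on every chunk lets a retriever cite the page an answer came from. Splitting at headings is only one possible strategy; token-based or semantic splitters may suit a given corpus better.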
Framework facts
- Category: rag
- Language: TypeScript / Python
- License: AGPL-3.0
- Repository: https://github.com/mendableai/firecrawl
Install
pip install firecrawl-py
# or
npm install @mendable/firecrawl-js
Quickstart
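from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key='fc-...')  # your Firecrawl API key
# Crawl up to 50 pages starting from the given URL.
job = app.crawl_url('https://engineering.vips.edu', params={'limit': 50})
# Print each page's URL and the length of its extracted Markdown.
for page in job['data']:
    print(page['metadata']['url'], len(page['markdown']))
Alternatives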
- Jina Reader — single-URL markdownification
- Trafilatura — local HTML extraction
- Crawl4AI — async crawler for RAG
- ScrapingBee — commercial rendering API
Frequently asked questions
Cloud or self-host?
The cloud API is the fastest way to start and scales headless browsers for you. Self-host when you need private network access, data residency, or unlimited crawls.
Does Firecrawl respect robots.txt?
Yes, by default. You can override this behavior on self-hosted deployments, but you are then responsible for legal and ethical compliance.
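For readers unfamiliar with what a robots.txt check involves, the sketch below uses Python's standard-library `urllib.robotparser` to test URLs against a set of rules. It illustrates the general mechanism only and is not Firecrawl's implementation; the `ROBOTS_TXT` content, the `allowed` helper, and the `example-bot` user agent are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, user_agent: str = "example-bot") -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/docs/intro"))      # True
print(allowed("https://example.com/private/report"))  # False
```

A crawler that honors robots.txt runs a check like this before each request and skips any URL for which it returns False.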
Sources
- Firecrawl — GitHub — accessed 2026-04-20
- Firecrawl — docs — accessed 2026-04-20