LlamaParse
LlamaParse is the parsing engine behind many LlamaIndex RAG pipelines. It's optimised for the failure modes that matter in retrieval: multi-column PDFs, nested tables, scanned pages, and slide decks. Output is LLM-ready Markdown, optionally chunked with preserved layout hints.
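Downstream, that Markdown output is typically split on headings before embedding. A minimal sketch of such a chunker, assuming heading-delimited Markdown (this is a hypothetical helper, not part of the LlamaParse SDK):

```python
import re

def chunk_markdown(md: str, max_chars: int = 1500) -> list[str]:
    """Split LlamaParse-style Markdown on headings, then cap chunk size."""
    # Split just before each heading line (#, ##, ...), keeping the heading
    # with the section it introduces.
    sections = re.split(r"\n(?=#{1,6} )", md)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Oversized sections are cut on paragraph boundaries where possible.
        while len(section) > max_chars:
            cut = section.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            chunks.append(section[:cut].strip())
            section = section[cut:].strip()
        if section:
            chunks.append(section)
    return chunks

doc = "# Revenue\nTables here.\n\n## Risks\nLong text."
print(chunk_markdown(doc))  # two chunks, one per heading
```

Heading-based splits keep each chunk self-describing, which pays off at retrieval time because the heading travels with the text it labels.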
Framework facts
- Category: rag
- Language: Python / TypeScript
- License: Commercial SaaS (client SDK MIT)
- Repository: https://github.com/run-llama/llama_parse
Install
```shell
pip install llama-parse
export LLAMA_CLOUD_API_KEY=llx-...
```

Quickstart
```python
from llama_parse import LlamaParse

parser = LlamaParse(result_type='markdown', verbose=True)
docs = parser.load_data('./10k.pdf')
print(docs[0].text[:2000])
```

Alternatives
- Unstructured.io — open-source, self-hostable
- Docling — IBM's open document parser
- Azure Document Intelligence — enterprise-grade
- AWS Textract — strong OCR + form extraction
Frequently asked questions
Can I self-host LlamaParse?
No. LlamaParse is a hosted service. If you need a self-hostable alternative, use Unstructured, Docling, or Marker, depending on your document mix.
Does LlamaParse do OCR?
Yes, it auto-detects scanned pages and runs OCR, then merges the text with layout information from native digital pages.
Sources
- LlamaParse — docs — accessed 2026-04-20
- LlamaParse GitHub — accessed 2026-04-20