
LlamaParse

LlamaParse is the parsing engine behind many LlamaIndex RAG pipelines. It's optimised for the failure modes that matter in retrieval: multi-column PDFs, nested tables, scanned pages, and slide decks. Output is LLM-ready Markdown, optionally chunked with preserved layout hints.

Framework facts

  • Category: rag
  • Language: Python / TypeScript
  • License: Commercial SaaS (client SDK MIT)
  • Repository: https://github.com/run-llama/llama_parse

Install

pip install llama-parse
export LLAMA_CLOUD_API_KEY=llx-...
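
The SDK picks up the key from the LLAMA_CLOUD_API_KEY environment variable. A small stdlib-only sanity check can fail fast with a clear message before any parsing starts; the helper name below is ours, not part of the SDK:

```python
import os

def require_api_key(var: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, or raise with a helpful hint."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before constructing LlamaParse.")
    return key
```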

Quickstart

from llama_parse import LlamaParse

# result_type='markdown' returns LLM-ready Markdown; verbose=True logs progress
parser = LlamaParse(result_type='markdown', verbose=True)
docs = parser.load_data('./10k.pdf')  # uploads the file to the hosted parser
print(docs[0].text[:2000])            # preview the first parsed document
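
Because the result is Markdown, downstream chunking can key off headings rather than raw character offsets. A minimal stdlib sketch of that idea; this is post-processing you would write yourself, not a LlamaParse API, and `split_by_heading` is an illustrative name:

```python
import re

def split_by_heading(markdown: str, level: int = 2) -> list[str]:
    """Split Markdown into chunks at headings of the given level or shallower."""
    pattern = re.compile(rf"^#{{1,{level}}} ", flags=re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]
```

With level=2, `###` subsections stay inside their parent chunk, which tends to keep tables and their captions together.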

Alternatives

  • Unstructured.io — open-source, self-hostable
  • Docling — IBM's open document parser
  • Azure Document Intelligence — enterprise-grade
  • AWS Textract — strong OCR + form extraction

Frequently asked questions

Can I self-host LlamaParse?

No — LlamaParse is a hosted service. If you need a self-hostable alternative, use Unstructured, Docling, or Marker, depending on your document mix.

Does LlamaParse do OCR?

Yes, it auto-detects scanned pages and runs OCR, then merges the text with layout information from native digital pages.
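
If you ever need to route pages yourself, a common heuristic for the "is this page scanned?" decision is to check how much selectable text a native extraction yields. The helper name and threshold below are illustrative, not LlamaParse internals:

```python
def needs_ocr(extracted_text: str, min_chars: int = 25) -> bool:
    """Heuristic: a page whose native text layer yields almost no characters
    is probably a scan and should be routed through OCR."""
    stripped = "".join(extracted_text.split())  # ignore whitespace-only noise
    return len(stripped) < min_chars
```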
