
Chonkie

Chonkie is a focused alternative to the chunking utilities bundled inside LangChain and LlamaIndex. It is small (the default pip install pulls in few dependencies), fast (it uses `tokenizers` and Rust-backed splitters where it can), and pragmatic: it ships recursive, semantic, SDPM, and late-chunking strategies that you can mix and match per document type.
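To make the recursive strategy concrete, here is a minimal pure-Python sketch of the general idea (an illustration of the technique, not Chonkie's implementation, which counts tokens rather than characters): try coarse separators first, and only fall back to finer ones for pieces that still exceed the budget.

```python
def recursive_split(text, max_chars=200, seps=("\n\n", ". ", " ")):
    """Split text so each chunk fits max_chars, preferring coarse breaks."""
    if len(text) <= max_chars:
        return [text]
    if not seps:
        # No separators left: hard-cut at the budget.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = seps[0], seps[1:]
    chunks, buf = [], ""
    for part in text.split(sep):
        candidate = buf + sep + part if buf else part
        if len(candidate) <= max_chars:
            buf = candidate
        elif len(part) > max_chars:
            # This piece alone is over budget: recurse with finer separators.
            if buf:
                chunks.append(buf)
            chunks.extend(recursive_split(part, max_chars, finer))
            buf = ""
        else:
            if buf:
                chunks.append(buf)
            buf = part
    if buf:
        chunks.append(buf)
    return chunks
```

The payoff of this scheme is that chunk boundaries land on paragraph or sentence breaks whenever possible, and only degrade to word or character cuts for pathological inputs.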

Framework facts

Category
rag
Language
Python
License
MIT
Repository
https://github.com/chonkie-inc/chonkie

Install

pip install chonkie
# semantic extras:
pip install 'chonkie[semantic]'

Quickstart

from chonkie import RecursiveChunker

chunker = RecursiveChunker(chunk_size=512)

# Read the source document and chunk it in one pass.
with open("doc.txt", encoding="utf-8") as f:
    chunks = chunker.chunk(f.read())

for c in chunks[:3]:
    print(c.token_count, c.text[:60])

Alternatives

  • LangChain text splitters — bundled with LC
  • LlamaIndex NodeParser — bundled with LI
  • semchunk — another small semantic splitter
  • llm-text-splitter — lightweight alternative

Frequently asked questions

Why a separate chunking library?

Because chunking quality drives RAG quality more than any other ETL step, and most framework-bundled splitters are optimized for generality, not throughput. Chonkie's advantage shows when you process millions of documents and splitter speed becomes the bottleneck.

Does Chonkie do semantic chunking?

Yes — SemanticChunker and SDPMChunker are both included in the `[semantic]` extra. They use an embedding model to cut at topical shifts rather than fixed token counts.
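The idea behind cutting at topical shifts can be shown with a toy sketch: embed each sentence, then start a new chunk wherever similarity to the previous sentence drops below a threshold. This is an illustration of the technique only, not Chonkie's SemanticChunker; the bag-of-words "embedding" is a stand-in for a real embedding model.

```python
import math
from collections import Counter

def embed(sentence):
    # Stand-in embedding: a bag-of-words vector. A real pipeline
    # would use a sentence-embedding model here.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences; cut where similarity drops."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            # Topical shift detected: close the chunk and start a new one.
            chunks.append(" ".join(current))
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

With real embeddings the same loop cuts on meaning rather than shared vocabulary, which is why semantic chunking needs the `[semantic]` extra's embedding dependencies.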
