Capability · Framework — rag
Chonkie
Chonkie is a focused alternative to the chunking utilities bundled inside LangChain and LlamaIndex. It's small (the default pip install is tiny), fast (it uses `tokenizers` and Rust-backed splitters where it can), and pragmatic, offering recursive, semantic, SDPM, and late chunking strategies that you can mix and match per document type.
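To make the recursive strategy concrete, here is a library-free sketch of the core idea: split on the coarsest separator present, then recurse with finer separators on any piece that is still too long. This is an illustration only; Chonkie's `RecursiveChunker` works on tokens rather than characters and handles merging and edge cases this sketch omits.

```python
# Library-free sketch of recursive chunking: split on the coarsest separator
# present, then recurse with finer separators on pieces still over the limit.
# Chonkie's RecursiveChunker is token-aware and merges small pieces; this
# character-based version only shows the core idea.
def recursive_chunk(text, max_chars=200, separators=("\n\n", "\n", " ")):
    if len(text) <= max_chars:
        return [text]
    for i, sep in enumerate(separators):
        if sep in text:
            chunks = []
            for part in text.split(sep):
                if part:
                    # recurse with only the finer separators remaining
                    chunks.extend(recursive_chunk(part, max_chars, separators[i + 1:]))
            return chunks
    return [text]  # no separator applies; keep the oversized piece as-is
```

Paragraph breaks are tried first, so chunk boundaries fall at natural document structure whenever possible and only degrade to word splits for long unbroken runs.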
Framework facts
- Category
- rag
- Language
- Python
- License
- MIT
- Repository
- https://github.com/chonkie-inc/chonkie
Install
pip install chonkie
# semantic extras:
pip install 'chonkie[semantic]'
Quickstart
from chonkie import RecursiveChunker

chunker = RecursiveChunker(chunk_size=512)
with open('doc.txt') as f:
    chunks = chunker.chunk(f.read())
for c in chunks[:3]:
    print(c.token_count, c.text[:60])
Alternatives
- LangChain text splitters — bundled with LC
- LlamaIndex NodeParser — bundled with LI
- semchunk — another small semantic splitter
- llm-text-splitter — lightweight alternative
Frequently asked questions
Why a separate chunking library?
Because chunking quality often drives RAG quality more than any other preprocessing step, and most framework-bundled splitters are optimized for generality, not throughput. Chonkie pays off when you process millions of documents.
Does Chonkie do semantic chunking?
Yes — SemanticChunker and SDPMChunker are both included in the `[semantic]` extra. They use an embedding model to cut at topical shifts rather than fixed token counts.
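The idea of cutting at topical shifts can be sketched without any embedding model: embed each sentence, then start a new chunk wherever similarity to the previous sentence drops below a threshold. The bag-of-words "embedding" and `semantic_chunk` helper below are stand-ins to keep the sketch self-contained; Chonkie's SemanticChunker uses a real embedding model and a more robust boundary rule.

```python
import math
from collections import Counter

# Toy stand-in for a sentence embedding: a bag-of-words count vector.
# A real semantic chunker would use a trained embedding model here.
def embed(sentence):
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Cut wherever adjacent-sentence similarity falls below the threshold,
# i.e. where the topic appears to shift.
def semantic_chunk(sentences, threshold=0.2):
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

With real embeddings the same loop separates a passage about cats from one about stock prices even when they share surface vocabulary, which is exactly what fixed token counts cannot do.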
Sources
- Chonkie — docs — accessed 2026-04-20
- Chonkie GitHub — accessed 2026-04-20