Curiosity · Concept

Chunking Strategies (for RAG)

Chunking is the unglamorous but decisive step in any RAG system. Too large, and chunks dilute the signal and waste context. Too small, and they lose the surrounding cues the LLM needs to interpret them. The main strategy families are fixed-size token windows with overlap; structural chunking by sentence, paragraph, or heading; hierarchical / parent-document chunking (retrieve small, return big); and semantic chunking that splits where embedding similarity drops. In production, different content types usually get different chunkers.
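The fixed-size family can be sketched in a few lines. This is a minimal illustration, not any particular library's splitter: whitespace-split words stand in for tokens here, whereas a production pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size windows with overlap; words approximate tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap  # each window starts `step` words after the last
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(words):  # last window reached the end
            break
    return chunks
```

The overlap means each boundary sentence appears in two chunks, so a fact split across a window edge is still retrievable from at least one chunk.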

Quick reference

Proficiency
Intermediate
Also known as
document chunking, text splitting
Prerequisites
Retrieval-Augmented Generation, Embeddings

Frequently asked questions

What is a chunking strategy?

A chunking strategy is the rule for splitting source documents into units that you embed and index for retrieval. It decides chunk size, overlap, and where splits are allowed (character, token, sentence, paragraph, heading, semantic boundary).
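A structural variant of such a rule can be sketched with two regular expressions: split on markdown-style headings first, then on blank lines, so no chunk ever crosses a section boundary. The heading pattern is an assumption for illustration; real documents may need format-specific rules (HTML tags, LaTeX sections, code fences).

```python
import re

def split_structural(doc: str) -> list[str]:
    """Split on headings, then paragraphs, never mid-paragraph."""
    # Zero-width split: keep each heading attached to its own section.
    sections = re.split(r"(?m)^(?=#{1,6} )", doc)
    chunks = []
    for sec in sections:
        for para in re.split(r"\n\s*\n", sec):  # blank-line paragraph breaks
            if para.strip():
                chunks.append(para.strip())
    return chunks
```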

What chunk size should I use?

Start around 300-800 tokens for most text, with 10-20% overlap. Short, FAQ-style content benefits from smaller chunks; dense technical prose often needs larger ones. The best size depends on the embedding model and query granularity — benchmark on real queries.

What is hierarchical / parent-document chunking?

Index small, focused chunks for high-precision retrieval, but return the larger parent section (e.g., the whole paragraph or subsection) to the LLM. This combines strong retrieval signal with enough surrounding context.
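A sketch of the child-to-parent mapping, with a toy keyword-overlap score standing in for embedding similarity (all names and sizes here are illustrative, not from any specific framework):

```python
from dataclasses import dataclass

@dataclass
class Child:
    text: str
    parent_id: int  # index of the parent section this child came from

def build_index(sections: list[str], child_size: int = 50) -> list[Child]:
    """Split each parent section into small child chunks for retrieval."""
    children = []
    for pid, section in enumerate(sections):
        words = section.split()
        for i in range(0, len(words), child_size):
            children.append(Child(" ".join(words[i:i + child_size]), pid))
    return children

def retrieve_parents(query: str, children: list[Child],
                     sections: list[str], k: int = 2) -> list[str]:
    """Rank children, but hand the LLM the deduplicated parent sections."""
    q = set(query.lower().split())
    top = sorted(children,
                 key=lambda c: len(q & set(c.text.lower().split())),
                 reverse=True)[:k]
    seen, parents = set(), []
    for c in top:  # several top children may share one parent
        if c.parent_id not in seen:
            seen.add(c.parent_id)
            parents.append(sections[c.parent_id])
    return parents
```

The deduplication step matters: when two high-scoring children come from the same section, the parent should be returned once, not pasted into the prompt twice.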

What common mistakes should I avoid?

Splitting mid-sentence, ignoring document structure (headings, lists, code blocks), using one chunker across heterogeneous content, and never measuring. Build a small eval set and sweep chunk size first — it usually moves retrieval quality more than most other RAG tweaks.
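That sweep can be a short harness: re-chunk the corpus at each candidate size and measure recall@k on a handful of real queries. The keyword-overlap retriever and the substring hit criterion below are stand-ins for a real embedding search and a proper relevance label.

```python
def chunk(text: str, size: int) -> list[str]:
    """Naive fixed-size word chunks, no overlap, for the sweep."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def recall_at_k(corpus: list[str], evals: list[tuple[str, str]],
                size: int, k: int = 3) -> float:
    """Fraction of eval queries whose answer span lands in the top-k chunks."""
    chunks = [c for doc in corpus for c in chunk(doc, size)]
    hits = 0
    for query, answer_span in evals:
        q = set(query.lower().split())
        top = sorted(chunks,
                     key=lambda c: len(q & set(c.lower().split())),
                     reverse=True)[:k]
        hits += any(answer_span in c for c in top)
    return hits / len(evals)

# Typical usage: compare candidate sizes on the same eval set, e.g.
#   for size in (100, 300, 800):
#       print(size, recall_at_k(corpus, evals, size))
```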

Sources

  1. LlamaIndex — Chunking strategies documentation — accessed 2026-04-20
  2. Lewis et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — accessed 2026-04-20