What this means
When a user asks a question, the RAG system searches for chunks whose embeddings are most similar to the query embedding. The system can only retrieve what exists as a chunk. If the relevant information is split across two chunks by an arbitrary boundary, neither chunk may match the query well enough to be retrieved. If chunks are too small, they lack the context needed for the embedding to accurately represent their meaning. If chunks are too large, they may match the query loosely but return large volumes of irrelevant surrounding content.
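The retrieval step described above can be sketched as a cosine-similarity ranking over chunk embeddings. This is a minimal illustration, not a production retriever: the three-dimensional vectors and the chunk identifiers are made-up assumptions, and a real system would obtain embeddings from an embedding model and store them in a vector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, chunk_embeddings, k=2):
    """Return the ids of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), chunk_id)
        for chunk_id, emb in chunk_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy embeddings; real ones come from an embedding model.
chunk_embeddings = {
    "leave-policy": [0.9, 0.1, 0.0],
    "expense-policy": [0.1, 0.9, 0.0],
    "office-map": [0.0, 0.1, 0.9],
}
query_embedding = [0.8, 0.2, 0.0]
top_chunks = retrieve_top_k(query_embedding, chunk_embeddings, k=2)
```

Only the chunks that exist in `chunk_embeddings` can ever be returned, which is why the way documents are split determines what the ranking has to work with.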
The chunking decision is upstream of every other retrieval quality concern. A better embedding model or a more sophisticated re-ranker can rarely compensate for it in full, because neither can retrieve a coherent chunk that was never created. Chunking must be designed deliberately for each document type and query pattern.
Why it matters for business
Enterprise organisations accumulate documents with very different structures: dense legal contracts, tabular financial reports, narrative policy manuals, structured FAQ pages, technical specifications with mixed prose and tables, and scanned legacy documents. A single chunking strategy applied uniformly to all of these will perform acceptably on some and poorly on others.
When retrieval quality is poor, the consequences are direct and visible: the AI assistant gives incorrect or incomplete answers, users lose confidence in the system, and the promised productivity benefits fail to materialise. In regulated contexts such as compliance documentation, HR policy and medical protocols, a system that retrieves and presents outdated or partially relevant content may create genuine risk, not merely inconvenience.
The organisational cost of poor chunking is frequently misattributed. When a RAG system produces inaccurate answers, teams often suspect the language model's capability rather than the retrieval pipeline. Diagnosing retrieval quality separately from generation quality is essential for identifying the true source of failure.
How it works technically
Chunking strategies range from simple to sophisticated:
Fixed-size chunking: The document is split every N tokens or characters, with optional overlap. This is the simplest approach and the default in many frameworks. It is fast and consistent but ignores document structure — it may cut mid-sentence, mid-table or mid-list with no regard for semantic coherence.
Sentence-based chunking: Text is split at sentence boundaries. Sentences may then be grouped into fixed-size windows of adjacent sentences so that each chunk reaches a minimum meaningful size. This works better than character splitting for prose documents but is less effective for structured content.
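A rough sketch of sentence-based chunking with fixed-size windows follows. The regular expression is a deliberately naive assumption: it splits after `.`, `!` or `?` followed by whitespace and will mis-split abbreviations such as "e.g."; a dedicated sentence tokeniser would normally replace it. The function names and the `sentences_per_chunk` parameter are illustrative.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def sentence_window_chunks(text, sentences_per_chunk=3):
    """Group adjacent sentences into windows of a fixed sentence count."""
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


sample = (
    "Leave accrues monthly. Requests need approval. "
    "Carry-over is capped. Contact HR for help."
)
chunks = sentence_window_chunks(sample, sentences_per_chunk=2)
```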
Semantic chunking: An embedding model is used to detect where meaning shifts in the document — boundaries are placed where the semantic similarity between consecutive sentences drops below a threshold. More computationally expensive but better preserves thematic coherence within chunks.
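The boundary rule for semantic chunking can be sketched as below. To keep the example self-contained, `toy_embed` is a stand-in assumption that uses word counts instead of a real embedding model, and the `threshold` of 0.2 is arbitrary; in practice both the model and the threshold would be chosen and tuned against real documents.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences, embed=toy_embed, threshold=0.2):
    """Start a new chunk where similarity between consecutive sentences drops."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    previous = embed(sentences[0])
    for sentence in sentences[1:]:
        current = embed(sentence)
        if cosine(previous, current) < threshold:
            groups.append([sentence])  # meaning shifted: place a boundary
        else:
            groups[-1].append(sentence)
        previous = current
    return [" ".join(group) for group in groups]


sentences = [
    "Annual leave accrues each month.",
    "Unused annual leave carries over each year.",
    "Expense claims require receipts.",
    "Expense claims are paid monthly.",
]
chunks = semantic_chunks(sentences)
```

The extra cost mentioned above comes from the embedding call per sentence, which is trivial here but not with a real model.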
Structural chunking: Document structure is respected explicitly — chunks align with headings, sections, paragraphs, list items, table rows or other structural elements. For well-structured documents (policies, manuals, specifications), this typically produces the best retrieval results because the chunks match how the document's authors organised information.
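As one illustration of structural chunking, the sketch below splits a document at its headings. It assumes headings are marked Markdown-style with leading `#` characters, which will not hold for every source format; for PDFs or Word files the heading information would have to come from the parser instead. The function name and output shape are illustrative.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(text):
    """Return one chunk per heading, each holding the text beneath it."""
    chunks = []
    heading, lines = None, []

    def flush():
        # Keep a chunk if it has a heading or any non-blank body text.
        if heading is not None or any(line.strip() for line in lines):
            chunks.append({"heading": heading, "text": "\n".join(lines).strip()})

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


document = (
    "# Leave\n"
    "Staff accrue leave monthly.\n"
    "\n"
    "## Carry-over\n"
    "Unused leave carries over.\n"
)
chunks = chunk_by_heading(document)
```

Storing the heading alongside the text lets it be prepended to the chunk before embedding, which can help short sections carry enough context.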
Hierarchical chunking: Both a small chunk (for precise matching) and its containing larger parent chunk (for complete context) are stored. Retrieval uses the small chunk for similarity matching; the parent chunk is passed to the language model for context. This addresses the tension between retrieval precision and generation context.
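The parent-child arrangement can be sketched as follows. Matching here uses simple word overlap as a stand-in for embedding similarity, and `child_size`, the parent identifiers and the sample texts are all assumptions made for the example.

```python
def build_children(parents, child_size=6):
    """Split each parent text into small word windows that remember their parent."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            children.append((" ".join(words[i:i + child_size]), parent_id))
    return children


def overlap_score(query, text):
    """Stand-in for embedding similarity: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parent(query, children, parents):
    """Match on the small child chunk, but return the full parent for generation."""
    if not children:
        raise ValueError("no child chunks to search")
    child_text, parent_id = max(children, key=lambda c: overlap_score(query, c[0]))
    return child_text, parents[parent_id]


parents = {
    "leave": "Annual leave accrues monthly. Unused leave carries over "
             "to the next year up to a cap.",
    "expenses": "Expense claims require receipts. Claims are reimbursed "
                "in the following pay cycle.",
}
children = build_children(parents)
matched_child, context = retrieve_parent("when are claims reimbursed", children, parents)
```

In this toy run the best-matching child does not itself contain the answer, but the parent passed on for generation does, which is the point of keeping both levels.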
Overlap between adjacent chunks, typically 10–20% of chunk size (for example, 50 to 100 tokens for a 500-token chunk), preserves context at boundaries. The sentence that ends one chunk and the sentence that begins the next may together form a single thought; overlap makes it likely that both appear together in at least one chunk, so the boundary is less likely to leave an orphaned fragment.
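Fixed-size chunking with overlap, the simplest of the strategies above, can be sketched in a few lines. The example treats any list of items as "tokens"; a real pipeline would use the tokeniser that matches its embedding model, and the default `chunk_size` and `overlap` values are placeholders, not recommendations.

```python
def fixed_size_chunks(tokens, chunk_size=200, overlap=30):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


tokens = list(range(10))  # stand-in for real tokens
chunks = fixed_size_chunks(tokens, chunk_size=4, overlap=1)
```

Each window starts `chunk_size - overlap` tokens after the previous one, so the last token of one chunk is repeated as the first token of the next in this example.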
Practical implementation considerations
Different document types benefit from different chunking approaches. A useful framework:
| Document type | Recommended approach |
|---|---|
| Policy/procedure manuals | Structural chunking by section/heading |
| Contracts and legal documents | Structural chunking by clause; overlap at clause boundaries |
| Financial reports | Structural chunking preserving table integrity; separate table/prose handling |
| FAQ pages | One Q&A pair per chunk |
| Technical specifications | Structural + hierarchical; preserve context for code or formula references |
| Scanned or poorly structured PDFs | Fixed-size with generous overlap; invest in pre-processing first |
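One way to make the table operational is a lookup that routes each document to a strategy by type. The type keys, the configuration fields and the fallback to fixed-size chunking for unrecognised types are all assumptions for illustration; the table itself only recommends fixed-size chunking for scanned or poorly structured PDFs.

```python
# Hypothetical configuration keyed by document type, mirroring the table above.
STRATEGY_BY_DOC_TYPE = {
    "policy_manual": {"strategy": "structural", "unit": "section"},
    "contract": {"strategy": "structural", "unit": "clause", "overlap": True},
    "financial_report": {"strategy": "structural", "unit": "table_or_prose_block"},
    "faq": {"strategy": "structural", "unit": "qa_pair"},
    "technical_spec": {"strategy": "structural+hierarchical", "unit": "section"},
    "scanned_pdf": {"strategy": "fixed_size", "overlap": True, "preprocess": True},
}

# Assumed fallback for types not listed in the table.
DEFAULT_STRATEGY = {"strategy": "fixed_size", "overlap": True}


def choose_strategy(doc_type):
    """Return the chunking configuration for a document type."""
    return STRATEGY_BY_DOC_TYPE.get(doc_type, DEFAULT_STRATEGY)


faq_config = choose_strategy("faq")
unknown_config = choose_strategy("meeting_notes")
```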
The practical implementation process should include:
- Sample and categorise your document corpus before choosing a strategy. Different document types may require different pipelines.
- Set a baseline using fixed-size chunking, then measure retrieval quality on a test set of representative queries. This gives you something to improve against.
- Evaluate iteratively: change one chunking parameter at a time and measure the effect on retrieval metrics (precision at k, recall at k) before committing to a configuration.
- Invest in pre-processing for complex formats: PDFs with multi-column layouts, embedded tables and footnotes require layout-aware parsing (tools such as Unstructured, Azure Document Intelligence or AWS Textract) before chunking can be applied meaningfully.
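The retrieval metrics named in the steps above can be computed per query as in this sketch. It assumes each test query has a hand-labelled set of relevant chunk ids; the ids shown are hypothetical.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunks that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical result for one test query.
retrieved = ["c1", "c7", "c3", "c9"]   # ranked output of the retriever
relevant = {"c1", "c3", "c5"}          # hand-labelled ground truth
p_at_3 = precision_at_k(retrieved, relevant, k=3)
r_at_3 = recall_at_k(retrieved, relevant, k=3)
```

Averaging these values over the whole test set before and after a single parameter change gives the comparison the baseline step calls for.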
When designing document processing pipelines for RAG implementations, Edison AI's AI implementation team treats chunking strategy as a first-class design decision — not a framework default — because it is the layer most directly responsible for production retrieval quality.
Common mistakes
- Using framework defaults without validation. LangChain's RecursiveCharacterTextSplitter defaults and similar convenience defaults are starting points, not optimal configurations. Always validate against your actual documents and queries.
- Applying the same chunking strategy to all document types. A strategy optimised for policy prose will perform poorly on financial tables or structured data sheets. Build document-type-aware pipelines.
- Setting chunk size based on model context window rather than retrieval requirements. The context window constrains how many chunks can fit in a single prompt; it does not tell you the optimal chunk size for retrieval. These are separate decisions.
- Neglecting pre-processing quality. Chunking a corrupted or poorly extracted document produces corrupted chunks. Garbage in, garbage out — pre-processing is not optional.
- Not measuring retrieval quality independently. If you can only measure end-to-end answer quality, you cannot determine whether errors come from retrieval or generation. Build evaluation pipelines that measure retrieval precision and recall independently.
What leaders should do next
Before finalising the technical design of any RAG system, conduct a structured audit of the documents the system will process: How many distinct document types exist? What are their formats and structural characteristics? Are any scanned or poorly formatted? Use the answers to define a chunking strategy for each document category, not a single universal approach. Allocate evaluation effort — and time — to testing retrieval quality on representative queries before the system is presented to end users.