inline::docling
Description
Docling is a layout-aware, structure-preserving document parser for Llama Stack. Unlike simple text extraction, Docling understands document structure — headings, tables, lists, and sections — and produces Markdown-formatted output that preserves semantic boundaries. It supports PDF, DOCX, PPTX, HTML, and images.
Features
- Structure-aware chunking — splits at semantic boundaries (headings, sections) using Docling's HybridChunker
- Layout preservation — tables, lists, and nested structures are converted to Markdown
- Multi-format support — PDF, DOCX, PPTX, HTML, and images
- Better RAG quality — structured chunks with heading metadata produce more relevant retrieval results
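To make "splitting at semantic boundaries" concrete, here is a minimal sketch in plain Python. It is not Docling's HybridChunker and does not call Docling at all; the function name `split_by_headings` and the chunk dictionary layout are hypothetical, chosen only for illustration. It splits Markdown text at heading lines and attaches the trail of enclosing headings to each chunk, which is the kind of heading metadata the list above refers to.

```python
import re

# Matches ATX-style Markdown headings such as "## Install".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headings(markdown):
    """Split Markdown into chunks at heading lines (illustrative only).

    Returns a list of dicts: {"headings": [trail of headings], "text": body}.
    """
    chunks = []
    path = []  # current heading trail, e.g. ["Guide", "Install"]
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


sample = "# Guide\nIntro text.\n## Install\npip install docling\n## Usage\nRun it.\n"
for chunk in split_by_headings(sample):
    print(" > ".join(chunk["headings"]), "|", chunk["text"])
```

A production chunker also has to respect token limits and treat tables, lists, and fenced code as units; this sketch ignores all of that and only shows why heading-scoped chunks carry useful context.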
Usage
Start Llama Stack with the Docling file processor using the --providers flag:
OLLAMA_URL=http://localhost:11434/v1 llama stack run \
  --providers "file_processors=inline::docling,files=inline::localfs,vector_io=inline::faiss,inference=inline::sentence-transformers,inference=remote::ollama" \
  --port 8321
Or add it to a custom run.yaml:
file_processors:
- provider_id: docling
  provider_type: inline::docling
  config: {}
Installation
pip install docling
Documentation
See Docling's documentation for more details.
Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| default_chunk_size_tokens | int | No | 800 | Default chunk size in tokens when chunking_strategy type is 'auto' |
| default_chunk_overlap_tokens | int | No | 400 | Default chunk overlap in tokens when chunking_strategy type is 'auto' |
Sample Configuration
default_chunk_size_tokens: 800
default_chunk_overlap_tokens: 400
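As a rough illustration of how the two settings interact, the sketch below computes token spans for a plain sliding window. It assumes each chunk starts `chunk_size - overlap` tokens after the previous one; whether the provider's 'auto' strategy windows text exactly this way is an assumption, and the helper name `window_spans` is hypothetical.

```python
def window_spans(total_tokens, chunk_size=800, overlap=400):
    """Return (start, end) token offsets for a simple sliding window.

    Assumption: consecutive chunks share `overlap` tokens, so the window
    advances by `chunk_size - overlap` tokens each step.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap
    spans = []
    start = 0
    while start < total_tokens:
        end = min(start + chunk_size, total_tokens)
        spans.append((start, end))
        if end == total_tokens:
            break  # the last window already reaches the end of the document
        start += stride
    return spans


print(window_spans(2000))
# [(0, 800), (400, 1200), (800, 1600), (1200, 2000)]
```

Under this assumption the default stride is 400 tokens, so in a long document most tokens appear in two consecutive chunks. A smaller overlap reduces duplicated text in the index at the cost of less shared context between neighboring chunks.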