inline::docling

Description

Docling is a layout-aware, structure-preserving document parser for Llama Stack. Unlike simple text extraction, Docling understands document structure — headings, tables, lists, and sections — and produces Markdown-formatted output that preserves semantic boundaries. It supports PDF, DOCX, PPTX, HTML, and images.

Features

  • Structure-aware chunking — splits at semantic boundaries (headings, sections) using Docling's HybridChunker
  • Layout preservation — tables, lists, and nested structures are converted to Markdown
  • Multi-format support — PDF, DOCX, PPTX, HTML, and images
  • Better RAG quality — structured chunks with heading metadata produce more relevant retrieval results
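
As a rough illustration of what splitting at semantic boundaries means, the sketch below cuts Markdown text at heading lines and keeps each heading as chunk metadata. It is a simplified stand-in written for this page, not Docling's HybridChunker; the function name `split_by_headings` and the chunk dictionary layout are assumptions made for the example.

```python
import re

def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown at heading lines, keeping each heading as chunk metadata.

    Hypothetical helper for illustration only; Docling's HybridChunker is
    more sophisticated (it is also token-aware).
    """
    chunks: list[dict] = []
    heading = None          # heading that owns the lines collected so far
    body: list[str] = []    # lines collected under the current heading
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous chunk, if it holds any text.
            if any(s.strip() for s in body):
                chunks.append({"heading": heading, "text": "\n".join(body).strip()})
            heading = match.group(2).strip()
            body = []
        else:
            body.append(line)
    # Flush whatever follows the last heading.
    if any(s.strip() for s in body):
        chunks.append({"heading": heading, "text": "\n".join(body).strip()})
    return chunks

sample = "# Intro\nDocling parses documents.\n\n## Tables\n| a | b |\n|---|---|\n| 1 | 2 |\n"
chunks = split_by_headings(sample)
for chunk in chunks:
    print(chunk["heading"], "->", len(chunk["text"]), "characters")
```

Because the table stays inside the chunk for its own heading, a retriever can return it together with the heading that describes it, rather than as a fragment cut at an arbitrary offset.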

Usage

Start Llama Stack with the Docling file processor using the --providers flag:

OLLAMA_URL=http://localhost:11434/v1 llama stack run \
--providers "file_processors=inline::docling,files=inline::localfs,vector_io=inline::faiss,inference=inline::sentence-transformers,inference=remote::ollama" \
--port 8321

Or add it to a custom run.yaml:

file_processors:
- provider_id: docling
  provider_type: inline::docling
  config: {}

Installation

pip install docling

Documentation

See Docling's documentation for more details.

Configuration

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| default_chunk_size_tokens | int | No | 800 | Default chunk size in tokens when chunking_strategy type is 'auto' |
| default_chunk_overlap_tokens | int | No | 400 | Default chunk overlap in tokens when chunking_strategy type is 'auto' |

Sample Configuration

default_chunk_size_tokens: 800
default_chunk_overlap_tokens: 400
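
To see how these two values interact, the sketch below computes token windows under a plain sliding-window reading of "overlap": each chunk starts `size - overlap` tokens after the previous one. This is an assumption made for illustration; the provider chunks at semantic boundaries, so real chunk edges may not land on these offsets. The function `window_spans` is hypothetical and not part of Llama Stack or Docling.

```python
def window_spans(n_tokens: int, size: int = 800, overlap: int = 400) -> list[tuple[int, int]]:
    """Return (start, end) token offsets for fixed-size overlapping windows.

    Assumes simple sliding-window semantics; defaults mirror the sample
    configuration above.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if n_tokens <= 0:
        return []
    stride = size - overlap  # how far each window advances
    spans: list[tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans

# With the defaults, a 2000-token document yields windows that advance by 400 tokens.
print(window_spans(2000))
```

Under this reading, an overlap of 400 on a chunk size of 800 means every token away from the document edges appears in two chunks, which roughly doubles the number of stored chunks compared with no overlap.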