inline::docling

Description

The `inline::docling` provider brings Docling, a layout-aware, structure-preserving document parser, to Llama Stack. Unlike simple text extraction, Docling understands document structure (headings, tables, lists, and sections) and produces Markdown output that preserves semantic boundaries. It supports PDF, DOCX, PPTX, HTML, and image files.

Features

- Structure-aware chunking: splits at semantic boundaries (headings, sections) using Docling's `HybridChunker`
- Layout preservation: tables, lists, and nested structures are converted to Markdown
- Multi-format support: PDF, DOCX, PPTX, HTML, and images
- Better RAG quality: structured chunks that carry heading metadata can yield more relevant retrieval results
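
The heading-based part of structure-aware chunking can be illustrated with a short, stdlib-only Python sketch. It is not Docling's `HybridChunker`, which works on Docling's parsed document model and additionally applies tokenization-aware refinements. `Chunk` and `split_by_headings` are hypothetical names, and the input is assumed to be Markdown with `#`-style headings.

```python
import re
from dataclasses import dataclass, field

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    text: str
    headings: list[str] = field(default_factory=list)  # heading path, outermost first


def split_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown at heading lines, attaching the enclosing heading path to each chunk.

    Simplified illustration only; not Docling's HybridChunker.
    """
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # (heading level, heading text)
    body: list[str] = []

    def flush() -> None:
        # Emit the accumulated body (if any) under the current heading path.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(text=text, headings=[title for _, title in path]))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close any open sections at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun pip install docling.\n## Configure\nSet chunk sizes.\n"
for chunk in split_by_headings(doc):
    print(chunk.headings, "->", chunk.text)
# ['Guide'] -> Intro text.
# ['Guide', 'Install'] -> Run pip install docling.
# ['Guide', 'Configure'] -> Set chunk sizes.
```

Keeping the heading path on each chunk is what lets a retriever see that "Set chunk sizes." belongs under "Guide > Configure". A fuller version would also need to skip `#` lines inside fenced code blocks.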

Usage

Start Llama Stack with the Docling file processor using the `--providers` flag:

```bash
OLLAMA_URL=http://localhost:11434/v1 llama stack run \
  --providers "file_processors=inline::docling,files=inline::localfs,vector_io=inline::faiss,inference=inline::sentence-transformers,inference=remote::ollama" \
  --port 8321
```

Or add it to a custom `run.yaml`:

```yaml
file_processors:
  - provider_id: docling
    provider_type: inline::docling
    config: {}
```

Installation

```bash
pip install docling
```

Documentation

See Docling's documentation for more details.

Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `default_chunk_size_tokens` | `int` | No | 800 | Default chunk size in tokens when the `chunking_strategy` type is `auto` |
| `default_chunk_overlap_tokens` | `int` | No | 400 | Default chunk overlap in tokens when the `chunking_strategy` type is `auto` |
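
The table only specifies what happens when the `chunking_strategy` type is `auto`. The Python sketch below shows one way the effective chunk parameters could be resolved from these two fields. `DoclingDefaults` and `resolve_chunking` are hypothetical names; treating an omitted strategy as `auto`, and reading a `static` strategy from an OpenAI-style `{"type": "static", "static": {...}}` object, are assumptions rather than confirmed provider behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DoclingDefaults:
    """Hypothetical mirror of the two documented config fields."""
    default_chunk_size_tokens: int = 800
    default_chunk_overlap_tokens: int = 400


def resolve_chunking(strategy: Optional[dict[str, Any]], cfg: DoclingDefaults) -> tuple[int, int]:
    """Return (chunk_size_tokens, overlap_tokens) for a request's chunking_strategy."""
    # Documented: type "auto" uses the configured defaults.
    # Assumption: an omitted strategy behaves like {"type": "auto"}.
    if strategy is None or strategy.get("type") == "auto":
        return cfg.default_chunk_size_tokens, cfg.default_chunk_overlap_tokens
    # Assumption: "static" carries explicit values in an OpenAI-style nested object.
    if strategy.get("type") == "static":
        static = strategy["static"]
        return static["max_chunk_size_tokens"], static["chunk_overlap_tokens"]
    raise ValueError(f"unsupported chunking_strategy type: {strategy.get('type')!r}")


cfg = DoclingDefaults()
print(resolve_chunking({"type": "auto"}, cfg))  # (800, 400)
print(resolve_chunking(
    {"type": "static", "static": {"max_chunk_size_tokens": 512, "chunk_overlap_tokens": 64}},
    cfg,
))  # (512, 64)
```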

Sample Configuration

```yaml
default_chunk_size_tokens: 800
default_chunk_overlap_tokens: 400
```
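
With the sample values above (800-token chunks, 400-token overlap), consecutive chunks would start 800 - 400 = 400 tokens apart. The Python sketch below shows that arithmetic with a plain sliding window. It only illustrates how size and overlap interact; it is not how the provider splits documents (Docling chunks along document structure), `window_chunks` is a hypothetical name, and whitespace splitting stands in for a real tokenizer.

```python
def window_chunks(tokens: list[str], size: int = 800, overlap: int = 400) -> list[list[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # 800 - 400 = 400 with the sample values
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = ("lorem " * 2000).split()
windows = window_chunks(tokens)
print([len(w) for w in windows])  # [800, 800, 800, 800]
print(windows[1] == tokens[400:1200])  # True: the second window starts 400 tokens in
```

With a 50% overlap, most tokens land in two windows, so stored text approaches twice the input size for long documents; a smaller overlap trades some boundary context for less duplication.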
