
inline::pypdf

Description

File processor that uses the pypdf library to extract text content from PDF documents.

Configuration

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `default_chunk_size_tokens` | `int` | No | `800` | Default chunk size, in tokens, when the `chunking_strategy` type is `'auto'` |
| `default_chunk_overlap_tokens` | `int` | No | `400` | Default chunk overlap, in tokens, when the `chunking_strategy` type is `'auto'` |
| `extract_metadata` | `bool` | No | `True` | Whether to extract PDF metadata (title, author, etc.) |
| `clean_text` | `bool` | No | `True` | Whether to clean extracted text (remove extra whitespace, normalize line breaks) |
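The two chunking fields describe a sliding window: each chunk holds up to `default_chunk_size_tokens` tokens, and consecutive chunks share `default_chunk_overlap_tokens` tokens, so the window advances by the chunk size minus the overlap (400 tokens with the defaults). The sketch below illustrates that arithmetic, together with a simple text cleanup in the spirit of `clean_text`. It is not the provider's implementation: the function names `clean_text` and `chunk_tokens` are hypothetical, and the tokens are synthetic rather than produced by a real tokenizer.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleanup in the spirit of clean_text=True (not the provider's code)."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs inside each line, then trim the line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Allow at most one blank line between paragraphs.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def chunk_tokens(tokens: list[str], chunk_size: int = 800, overlap: int = 400) -> list[list[str]]:
    """Hypothetical helper: split tokens into windows of chunk_size sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # 400 tokens with the defaults
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk is
        # fully contained in the previous one.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    raw = "Page  one\r\n\r\n\r\n\tPage two "
    print(repr(clean_text(raw)))  # 'Page one\n\nPage two'
    # Synthetic tokens; a real implementation would use a tokenizer.
    tokens = [f"t{i}" for i in range(2000)]
    chunks = chunk_tokens(tokens)
    print(len(chunks))                  # 4
    print(chunks[1][0], chunks[1][-1])  # t400 t1199
```

With 2,000 tokens and the default values, the windows start at tokens 0, 400, 800 and 1200, so every token after the first 400 appears in two chunks. The overlap keeps text that straddles a chunk boundary intact in at least one chunk, at the cost of roughly doubling the stored text.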

Sample Configuration

All fields are optional, so an empty configuration uses the defaults listed above:

{}