inline::pypdf

Description

PyPDF-based file processor that extracts text content from PDF documents.

Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `default_chunk_size_tokens` | `int` | No | `800` | Default chunk size, in tokens, used when the `chunking_strategy` type is `'auto'` |
| `default_chunk_overlap_tokens` | `int` | No | `400` | Default overlap between consecutive chunks, in tokens, used when the `chunking_strategy` type is `'auto'` |
| `extract_metadata` | `bool` | No | `True` | Whether to extract PDF metadata (title, author, etc.) |
| `clean_text` | `bool` | No | `True` | Whether to clean the extracted text (remove extra whitespace, normalize line breaks) |
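The table does not specify how the processor tokenizes or cleans text, so the following is only a minimal sketch under stated assumptions: tokens are approximated as whitespace-separated words, and `clean_text` is approximated with a few regex rules. The helper names `clean_text` and `chunk_tokens` are hypothetical and not part of the provider's API. The sketch shows how the default 800-token window with a 400-token overlap advances 400 tokens at a time, so each chunk repeats the second half of the one before it.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaner: normalize line breaks and collapse extra whitespace."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # normalize line endings
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text) # keep at most one blank line
    return text.strip()


def chunk_tokens(tokens: list[str], size: int = 800, overlap: int = 400) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end
            break
    return chunks


cleaned = clean_text("Hello   world\r\n\r\n\r\nfoo\tbar")
print(repr(cleaned))  # 'Hello world\n\nfoo bar'

# Assumption: whitespace-split words stand in for real tokenizer tokens.
words = [f"w{i}" for i in range(2000)]
chunks = chunk_tokens(words)
print(len(chunks), [c[0] for c in chunks])  # 4 ['w0', 'w400', 'w800', 'w1200']
```

With 2,000 tokens and the defaults, windows start at tokens 0, 400, 800, and 1200. The overlap must stay below the chunk size, otherwise the window would never advance.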

Sample Configuration

{}
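Because every field in the table is optional and has a default, an empty mapping such as `{}` overrides nothing and yields the default configuration. As an illustration only, the sketch below uses a hypothetical stdlib dataclass, `PyPDFConfig`, as a stand-in for the provider's real config class, whose name and validation behavior are not documented here.

```python
from dataclasses import dataclass


@dataclass
class PyPDFConfig:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    default_chunk_size_tokens: int = 800
    default_chunk_overlap_tokens: int = 400
    extract_metadata: bool = True
    clean_text: bool = True


# The sample configuration `{}` supplies no overrides, so all defaults apply.
config = PyPDFConfig(**{})
print(config)

# Overriding a single field leaves the others at their defaults.
no_cleaning = PyPDFConfig(**{"clean_text": False})
print(no_cleaning.clean_text, no_cleaning.default_chunk_size_tokens)  # False 800
```

In this stand-in, an unrecognized key raises `TypeError`, since dataclass constructors reject unexpected keyword arguments; the real provider may validate differently.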