dllmforge.IE_agent_extractor_docling

Synchronous Information Extractor module for extracting structured information from documents using LLM with Docling.

Classes

| Class | Description |
| --- | --- |
| `DoclingDocumentProcessor`(config) | Document processor using Docling for advanced PDF processing. |
| `DoclingInfoExtractor`(config, output_schema) | Class for extracting information from documents using an LLM with Docling preprocessing. |
| `DoclingProcessedDocument`(content, content_type) | Class representing a document processed by Docling. |
| `DocumentChunk`(content, content_type[, ...]) | Class representing a chunk of document content. |
| `DocumentConverter`(*args, **kwargs) | Fallback stub used when the real docling package isn't available. |

class dllmforge.IE_agent_extractor_docling.DocumentConverter(*args, **kwargs)

Fallback stub used when the real docling package isn’t available.

The stub is intentionally minimal: it can be instantiated safely but its convert method raises a RuntimeError. Test suites can still patch DocumentConverter where they need to simulate conversions.
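The behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual source: the class body and the error message are assumptions, and only the documented contract (safe construction, `convert` raising `RuntimeError`, patchable in tests) is taken from this page.

```python
from unittest import mock


class DocumentConverter:
    """Illustrative stand-in that mirrors the documented stub behaviour."""

    def __init__(self, *args, **kwargs):
        # Accept and ignore any arguments so construction never fails.
        pass

    def convert(self, *args, **kwargs):
        # The message text is an assumption; only the exception type is documented.
        raise RuntimeError("docling is not installed; conversion is unavailable")


converter = DocumentConverter("anything", option=True)  # safe to instantiate

try:
    converter.convert("report.pdf")
    stub_raised = False
except RuntimeError:
    stub_raised = True

# In a test, patch convert to simulate a successful conversion.
with mock.patch.object(DocumentConverter, "convert", return_value="fake-result"):
    patched_result = DocumentConverter().convert("report.pdf")
```

Patching the method rather than the whole class keeps the rest of the code path under test unchanged.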

__init__(*args, **kwargs)

convert(*args, **kwargs)

class dllmforge.IE_agent_extractor_docling.DoclingProcessedDocument(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None, docling_result=None)

Class representing a document processed by Docling.

__init__(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None, docling_result=None)

class dllmforge.IE_agent_extractor_docling.DocumentChunk(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None, docling_elements: List | None = None)

Class representing a chunk of document content.

__init__(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None, docling_elements: List | None = None)

class dllmforge.IE_agent_extractor_docling.DoclingDocumentProcessor(config)

Document processor using Docling for advanced PDF processing.

__init__(config)

encode_image_base64(image_data: bytes) → str

Encode image data to a base64 string.
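A minimal sketch of what such a helper typically looks like, assuming it wraps the standard-library `base64` module and returns text rather than bytes. The method body is not shown in this reference, so the implementation below is an assumption, and it is written as a free-standing function for illustration whereas the documented version is a method on `DoclingDocumentProcessor`.

```python
import base64


def encode_image_base64(image_data: bytes) -> str:
    """Return the base64 text form of raw image bytes (illustrative stand-in)."""
    # b64encode returns bytes; decode so the result can be embedded in JSON
    # or in a data URL passed to a multimodal model.
    return base64.b64encode(image_data).decode("utf-8")


encoded = encode_image_base64(b"hello")
```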

process_document(file_path: Path) → DoclingProcessedDocument | None

Process a single document using Docling.

process_directory() → List[DoclingProcessedDocument]

Process all documents in the configured directory.
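The two signatures above suggest a loop in which per-file processing may return `None` while the directory-level call returns a plain list. The sketch below shows one way that could work; the function name `process_directory_sketch`, the suffix filter, and the choice to skip `None` results are all assumptions for illustration, not the library's documented behaviour.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def process_directory_sketch(
    directory: Path,
    process_one: Callable[[Path], Optional[T]],
    suffixes: Tuple[str, ...] = (".pdf",),  # hypothetical filter
) -> List[T]:
    """Apply process_one to each matching file and keep only successful results."""
    results: List[T] = []
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        doc = process_one(path)
        if doc is not None:  # assumed: failed documents are skipped, not appended
            results.append(doc)
    return results


# Usage with a fake per-file processor that "fails" on b.pdf.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.pdf", "c.txt"):
        (Path(tmp) / name).write_bytes(b"")
    processed = process_directory_sketch(
        Path(tmp), lambda p: None if p.name == "b.pdf" else p.name
    )
```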

class dllmforge.IE_agent_extractor_docling.DoclingInfoExtractor(config: IEAgentConfig, output_schema: type[BaseModel], llm_api: LangchainAPI | None = None)

Class for extracting information from documents using an LLM with Docling preprocessing.

__init__(config: IEAgentConfig, output_schema: type[BaseModel], llm_api: LangchainAPI | None = None)

Initialize the information extractor.

refine_system_prompt(task_description: str) → str

Use the LLM to refine the user's task description into a proper system prompt.

chunk_document(doc: DoclingProcessedDocument) → Generator[DocumentChunk, None, None]

Split the document into chunks based on Docling structure if needed.
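The "if needed" generator pattern can be sketched as below. This is an illustration under stated assumptions: `Chunk` is a hypothetical stand-in for `DocumentChunk`, and splitting on blank-line paragraphs with a `max_chars` budget is a simplification, since the real method is documented as following Docling's structure rather than raw text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Generator


@dataclass
class Chunk:
    """Hypothetical stand-in for DocumentChunk with the documented core fields."""
    content: str
    content_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_text(
    text: str, content_type: str, max_chars: int = 1000
) -> Generator[Chunk, None, None]:
    """Yield the whole text if it fits, otherwise pack paragraphs into chunks."""
    if len(text) <= max_chars:
        yield Chunk(text, content_type, {"chunk_index": 0})
        return
    buffer, index = "", 0
    for paragraph in text.split("\n\n"):
        # Flush the buffer when adding this paragraph would exceed the budget.
        if buffer and len(buffer) + len(paragraph) + 2 > max_chars:
            yield Chunk(buffer, content_type, {"chunk_index": index})
            buffer, index = "", index + 1
        buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
    if buffer:
        # A single paragraph longer than max_chars is emitted as-is.
        yield Chunk(buffer, content_type, {"chunk_index": index})


chunks = list(chunk_text("alpha\n\nbeta\n\ngamma", "text/plain", max_chars=11))
```

Using a generator means downstream processing can start on the first chunk before the rest of the document has been split.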

create_text_extraction_prompt() → ChatPromptTemplate

Create the prompt template for text-based information extraction with Docling awareness.

process_text_chunk(chunk: DocumentChunk) → Dict[str, Any] | None

Process a text document chunk with Docling enhancements.

create_multimodal_extraction_prompt() → ChatPromptTemplate

Create the prompt template for multimodal extraction with Docling structure.

process_multimodal_chunk(chunk: DocumentChunk, doc: DoclingProcessedDocument) → Dict[str, Any] | None

Process a chunk with access to the original Docling result for multimodal content.

process_chunk(chunk: DocumentChunk, doc: DoclingProcessedDocument) → Dict[str, Any] | None

Process a document chunk with Docling context.

process_document(doc: DoclingProcessedDocument | List[DoclingProcessedDocument]) → List[Dict[str, Any]]

Process a document, or a list of documents, and extract information.
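The signatures above imply a flow in which one document or a list is accepted, each document is chunked, and per-chunk results that may be `None` are gathered into a flat list. A minimal sketch of that flow, assuming `None` results are simply dropped; `extract_all`, `chunker`, and `keep_words` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Union


def extract_all(
    docs: Union[Any, List[Any]],
    chunker: Callable[[Any], Iterable[Any]],
    process_chunk: Callable[[Any, Any], Optional[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Run process_chunk over every chunk of every document."""
    # Accept either one document or a list, as the documented signature does.
    doc_list = docs if isinstance(docs, list) else [docs]
    results: List[Dict[str, Any]] = []
    for doc in doc_list:
        for chunk in chunker(doc):
            result = process_chunk(chunk, doc)
            if result is not None:  # assumed: chunks that yield nothing are dropped
                results.append(result)
    return results


def keep_words(chunk: str, doc: str) -> Optional[Dict[str, Any]]:
    # Toy extractor: only alphabetic chunks produce a result.
    return {"word": chunk} if chunk.isalpha() else None


single = extract_all("a 1 b", str.split, keep_words)
batch = extract_all(["a", "b c"], str.split, keep_words)
```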

save_results(results: List[Any], output_path: Path) → None

Save extraction results to a JSON file.
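Saving a list of results to JSON can be sketched with the standard library as follows. The directory creation, indentation, and `default=str` fallback are assumptions chosen for the example; the reference does not state how the real method serialises its results.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List


def save_results(results: List[Any], output_path: Path) -> None:
    """Write results to output_path as UTF-8 JSON (illustrative stand-in)."""
    # Create the parent directory if missing (an assumption, not documented behaviour).
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as handle:
        # default=str is a fallback for values json cannot serialise natively.
        json.dump(results, handle, indent=2, ensure_ascii=False, default=str)


# Round-trip check in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out" / "results.json"
    save_results([{"title": "Example"}], target)
    round_trip = json.loads(target.read_text(encoding="utf-8"))
```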

process_all() → None

Process all documents in the configured directory.
