dllmforge.IE_agent_extractor_docling¶
Synchronous Information Extractor module for extracting structured information from documents using LLM with Docling.
Classes

| DoclingDocumentProcessor | Document processor using Docling for advanced PDF processing |
| DoclingInfoExtractor | Class for extracting information from documents using LLM with Docling preprocessing |
| DoclingProcessedDocument | Class representing a document processed by Docling |
| DocumentChunk | Class representing a chunk of document content |
| DocumentConverter | Fallback stub used when the real docling package isn't available. |
- class dllmforge.IE_agent_extractor_docling.DocumentConverter(*args, **kwargs)[source]¶
Fallback stub used when the real docling package isn’t available.
The stub is intentionally minimal: it can be instantiated safely but its convert method raises a RuntimeError. Test suites can still patch DocumentConverter where they need to simulate conversions.
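The fallback behaviour described above can be sketched as follows; this is an illustrative reimplementation of the documented contract, not the module's actual source:

```python
class DocumentConverter:
    """Minimal stand-in for docling's DocumentConverter when the package is absent.

    It can be instantiated safely, but any attempt to convert raises, so
    callers fail loudly instead of silently producing empty results.
    """

    def __init__(self, *args, **kwargs):
        # Accept any arguments so call sites don't need to change.
        pass

    def convert(self, *args, **kwargs):
        raise RuntimeError(
            "docling is not installed; install it or patch DocumentConverter in tests"
        )
```

Test suites can monkeypatch `convert` (or replace the whole class) to return canned conversion results without installing docling.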
- class dllmforge.IE_agent_extractor_docling.DoclingProcessedDocument(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None, docling_result=None)[source]¶
Class representing a document processed by Docling
- class dllmforge.IE_agent_extractor_docling.DocumentChunk(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None, docling_elements: List | None = None)[source]¶
Class representing a chunk of document content
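Judging by the constructor signatures above, the two container classes can be approximated as plain dataclasses. The field names and defaults come from the signatures; the real classes may carry additional behaviour:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class DoclingProcessedDocument:
    content: Union[str, bytes]                # extracted document content
    content_type: str                         # content kind, e.g. "text" (assumed values)
    metadata: Optional[Dict[str, Any]] = None
    docling_result: Any = None                # original Docling conversion result, if any


@dataclass
class DocumentChunk:
    content: Union[str, bytes]                # content of this chunk
    content_type: str
    metadata: Optional[Dict[str, Any]] = None
    docling_elements: Optional[List] = None   # Docling layout elements in this chunk
```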
- class dllmforge.IE_agent_extractor_docling.DoclingDocumentProcessor(config)[source]¶
Document processor using Docling for advanced PDF processing
- process_document(file_path: Path) DoclingProcessedDocument | None[source]¶
Process a single document using Docling
- process_directory() List[DoclingProcessedDocument][source]¶
Process all documents in the configured directory
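A plausible shape for the directory walk, written as a standalone function for illustration; the glob patterns and the skip-on-failure policy (documents that fail to process come back as `None`) are assumptions, and the real method reads its directory from the processor's config:

```python
from pathlib import Path
from typing import Callable, List, Optional


def process_directory(
    directory: Path,
    process_document: Callable[[Path], Optional[object]],
    patterns: tuple = ("*.pdf",),  # hypothetical default; config presumably decides this
) -> List[object]:
    """Run process_document over every matching file, skipping failures (None)."""
    results = []
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            doc = process_document(path)
            if doc is not None:
                results.append(doc)
    return results
```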
- class dllmforge.IE_agent_extractor_docling.DoclingInfoExtractor(config: IEAgentConfig, output_schema: type[BaseModel], llm_api: LangchainAPI | None = None)[source]¶
Class for extracting information from documents using LLM with Docling preprocessing
Initialize the information extractor
- __init__(config: IEAgentConfig, output_schema: type[BaseModel], llm_api: LangchainAPI | None = None)[source]¶
Initialize the information extractor
- refine_system_prompt(task_description: str) str[source]¶
Use LLM to refine user’s task description into a proper system prompt
- chunk_document(doc: DoclingProcessedDocument) Generator[DocumentChunk, None, None][source]¶
Split document into chunks based on Docling structure if needed
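The chunking step might look like the following simplified generator, which packs paragraphs into chunks up to a size limit. This is a plain-text sketch only; the actual method chunks along Docling's document structure, not raw text:

```python
from typing import Generator


def chunk_text(text: str, max_chars: int = 1000) -> Generator[str, None, None]:
    """Yield chunks no longer than max_chars, breaking on blank lines.

    A single paragraph longer than max_chars is yielded whole rather than split.
    """
    buf = ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) + 2 > max_chars:
            yield buf
            buf = para
        else:
            buf = f"{buf}\n\n{para}" if buf else para
    if buf:
        yield buf
```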
- create_text_extraction_prompt() ChatPromptTemplate[source]¶
Create prompt template for text-based information extraction with Docling awareness
- process_text_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process a text document chunk with Docling enhancements
- create_multimodal_extraction_prompt() ChatPromptTemplate[source]¶
Create prompt template for multimodal extraction with Docling structure
- process_multimodal_chunk(chunk: DocumentChunk, doc: DoclingProcessedDocument) Dict[str, Any] | None[source]¶
Process chunk with access to original Docling result for multimodal content
- process_chunk(chunk: DocumentChunk, doc: DoclingProcessedDocument) Dict[str, Any] | None[source]¶
Process a document chunk with Docling context
- process_document(doc: DoclingProcessedDocument | List[DoclingProcessedDocument]) List[Dict[str, Any]][source]¶
Process document and extract information
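Putting the pieces together: document-level extraction presumably iterates the chunks, runs the per-chunk extractor on each, and keeps the non-`None` results. A minimal sketch of that aggregation loop (the real method also threads the Docling context through `process_chunk`):

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def extract_from_chunks(
    chunks: Iterable[Any],
    process_chunk: Callable[[Any], Optional[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Collect successful per-chunk extractions, dropping chunks that yield None."""
    return [
        result
        for chunk in chunks
        if (result := process_chunk(chunk)) is not None
    ]
```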