DLLMForge Documentation¶
Welcome to DLLMForge¶
DLLMForge is a repository of LLM (Large Language Model) tools developed at Deltares. It provides simple open- and closed-source tools for interacting with various LLMs. With DLLMForge you can:
Use a simple LLM to ask questions.
Build your own RAG pipeline with HuggingFace or Azure embeddings and vector stores.
Create agents that can use tools to answer complex questions.
Extract structured information from documents using LLMs.
Features¶
DLLMForge provides a modular toolkit for:
Multi-LLM Support: Integration with OpenAI, Anthropic, and open-source Deltares-hosted models
RAG Pipeline: Complete document ingestion, embedding, and retrieval system
Agent Framework: Simple but extensible agent architecture with tool support
Evaluation Tools: Comprehensive RAG system evaluation using various metrics
Flexible Backends: Support for both cloud (Azure, OpenAI) and local deployments
Repository Structure¶
DLLMForge is organized into several key components that work together to provide a comprehensive LLM toolkit:
Core Package (dllmforge/)¶
The main package contains the following modules:
- Core Agent Framework
agent_core.py - Simple agent infrastructure with tool support
  - SimpleAgent - Basic agentic workflows
  - create_basic_agent() - Agent factory function
  - create_basic_tools() - Tool creation utilities
- Information Extraction Framework
IE_agent_config.py - Configuration management for IE agents
  - IEAgentConfig - Main configuration class
  - SchemaConfig - Schema generation configuration
  - DocumentConfig - Document processing configuration
  - ExtractorConfig - Information extraction configuration
IE_agent_schema_generator.py - Automatic schema generation for structured extraction
  - SchemaGenerator - Generate Pydantic schemas from task descriptions
IE_agent_document_processor.py - Document processing for information extraction
  - DocumentProcessor - Convert documents to LLM-readable format
  - ProcessedDocument - Processed document container
IE_agent_extractor.py - Main information extraction orchestrator
  - InfoExtractor - Extract structured information from documents
  - DocumentChunk - Document chunk container
IE_agent_extractor_docling.py - Enhanced extraction with Docling preprocessing
  - DoclingInfoExtractor - Advanced document structure-aware extraction
- LLM API Integrations
openai_api.py - OpenAI API integration
  - OpenAIAPI - OpenAI API wrapper
anthropic_api.py - Anthropic Claude API integration
  - AnthropicAPI - Anthropic API wrapper
langchain_api.py - LangChain framework integration
llamaindex_api.py - LlamaIndex framework integration
  - LlamaIndexAPI - LlamaIndex API wrapper
- RAG (Retrieval-Augmented Generation) Components
rag_preprocess_documents.py - Document loading and chunking
  - DocumentLoader - Abstract document loader
  - PDFLoader - Load PDF documents
  - TextChunker - Split text into manageable chunks with overlap
rag_embedding.py - Azure OpenAI embedding models
  - AzureOpenAIEmbeddingModel - Generate embeddings for text
rag_embedding_open_source.py - Open-source embedding models
  - LangchainHFEmbeddingModel - HuggingFace embeddings via LangChain
rag_search_and_response.py - Search and response generation
  - IndexManager - Manage vector indices
  - Retriever - Retrieve relevant documents
  - LLMResponder - Generate responses using LLMs
rag_evaluation.py - RAG system evaluation
  - RAGEvaluator - Evaluate RAG system performance
  - EvaluationResult - Store individual evaluation metrics
  - RAGEvaluationResult - Store comprehensive RAG evaluation results
- Specialized Components
LLMs/Deltares_LLMs.py - Deltares-specific LLM implementations
utils/ - Utility functions and helpers
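The chunk-with-overlap idea behind TextChunker can be sketched as follows. This is an illustrative re-implementation, not the code in rag_preprocess_documents.py; the function name and defaults are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap`
    characters so sentences cut at a boundary still appear whole in one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# Example: 1000 characters, 400-char chunks, 100-char overlap -> 4 chunks.
sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(sample, chunk_size=400, overlap=100)
```

Overlap trades a little storage for retrieval robustness: a fact that straddles a chunk boundary is still retrievable from at least one chunk.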
Workflows (workflows/)¶
open_source_RAG.py - Example workflow for an open-source RAG implementation
Example Streamlit-based Applications (streamlit_apps/)¶
app.py - Streamlit-based RAG application
streamlit_water_management_app.py - Streamlit-based water management application
Quick Start¶
Installation¶
To install DLLMForge, you can use pip:
pip install git+https://github.com/Deltares-research/DLLMForge
Tutorials¶
The following tutorials are available:
- Tutorial LLM capabilities of DLLMForge
- Open Source RAG Pipeline Tutorial
- Building a Simple Agent with DLLMForge
- Tutorial: Advanced Water Management Agent
  - Learning Objectives
  - Core Concepts Demonstrated
  - Workflow Overview
  - Water Calculation Tools
  - Information Retrieval Tools
  - Conditional Edge Implementation
  - Workflow Assembly
  - Testing the Workflow
  - Additional Examples
  - Running the Tutorial
  - Testing with Custom Queries
  - Key Benefits for Water Professionals
  - Next Steps
- Information Extraction with LLMs Tutorial
Background Information¶
For more information on LLMs and RAG systems, see:
API Reference¶
DLLMForge - Deltares LLM Forge Toolkit
Modules¶
Simple agent core for DLLMForge - Clean LangGraph utilities.
This module provides simple, elegant utilities for creating LangGraph agents following the pattern established in water_management_agent_simple.py.
- dllmforge.agent_core.tool(func)[source]¶
DLLMForge wrapper around LangChain’s @tool decorator.
This decorator provides a consistent interface for creating tools within the DLLMForge ecosystem while maintaining compatibility with LangChain’s tool system.
- Parameters:
func – Function to be converted into a tool
- Returns:
Tool function that can be used with SimpleAgent
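In practice, a tool is just a plain function with type hints and a docstring, which the decorator exposes to the agent. A minimal sketch; the function itself is hypothetical, and only the @tool decorator comes from dllmforge.agent_core:

```python
# from dllmforge.agent_core import tool  # the decorator documented above

# @tool  # uncomment once dllmforge is installed; wraps the function for SimpleAgent
def discharge_to_litres_per_second(discharge_m3s: float) -> float:
    """Convert a river discharge from cubic metres per second to litres per second."""
    return discharge_m3s * 1000.0
```

The type hints and docstring matter: LangChain-style tool systems use them to tell the LLM what the tool does and what arguments it expects.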
- class dllmforge.agent_core.SimpleAgent(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai', llm=None, enable_text_tool_routing: bool = False, max_tool_iterations: int = 3)[source]¶
Bases: object
Simple agent class for LangGraph workflows.
Initialize a simple LangGraph agent.
- Parameters:
system_message – System message for the agent
temperature – LLM temperature setting
model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)
- __init__(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai', llm=None, enable_text_tool_routing: bool = False, max_tool_iterations: int = 3)[source]¶
Initialize a simple LangGraph agent.
- Parameters:
system_message – System message for the agent
temperature – LLM temperature setting
model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)
- add_tool(tool_func: Callable) None[source]¶
Add a tool to the agent.
- Parameters:
tool_func – Function decorated with @tool
- add_node(name: str, func: Callable) None[source]¶
Add a node to the workflow.
- Parameters:
name – Node name
func – Node function
- add_edge(from_node: str, to_node: str) None[source]¶
Add a simple edge between nodes.
- Parameters:
from_node – Source node
to_node – Target node
- add_conditional_edge(from_node: str, condition_func: Callable) None[source]¶
Add a conditional edge.
- Parameters:
from_node – Source node
condition_func – Function that determines routing
- create_simple_workflow() None[source]¶
Create a simple agent -> tools workflow with optional text-based tool routing.
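A condition function for add_conditional_edge simply inspects the workflow state and returns the name of the next node. A hedged sketch, assuming a state dict with a "messages" list whose last entry may carry tool calls (the exact state shape used by SimpleAgent is an assumption here):

```python
def route_after_agent(state: dict) -> str:
    """Return the next node name: run tools if the last message requested any,
    otherwise end the workflow."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return "end"
```

Such a function would be registered with `agent.add_conditional_edge("agent", route_after_agent)` so the graph loops through the tools node only when the LLM asks for a tool.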
- dllmforge.agent_core.create_basic_agent(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai') SimpleAgent[source]¶
Create a basic agent with standard setup.
- Parameters:
system_message – System message for the agent
temperature – LLM temperature
model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)
- Returns:
Configured agent instance
- Return type:
SimpleAgent
- dllmforge.agent_core.create_basic_tools() List[Callable][source]¶
Create basic utility tools for testing.
- Returns:
List of tool functions
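To make the node/edge/conditional-edge vocabulary above concrete, here is a toy graph runner in plain Python. It is purely illustrative: SimpleAgent delegates this machinery to LangGraph, and none of these names exist in dllmforge:

```python
def run_graph(nodes, edges, conditional_edges, state, start="agent", end="end"):
    """Walk a tiny node graph: run each node on the state, then follow either a
    fixed edge or a condition function that picks the next node from the state."""
    current = start
    while current != end:
        state = nodes[current](state)
        if current in conditional_edges:
            current = conditional_edges[current](state)  # routing decided at runtime
        else:
            current = edges[current]                     # fixed edge
    return state

# A two-node workflow: "agent" doubles a number, then routes to "tools" once.
nodes = {
    "agent": lambda s: {**s, "value": s["value"] * 2},
    "tools": lambda s: {**s, "tooled": True},
}
edges = {"tools": "end"}
conditional_edges = {"agent": lambda s: "tools" if not s.get("tooled") else "end"}

result = run_graph(nodes, edges, conditional_edges, {"value": 3})
```

The real LangGraph workflow built by create_simple_workflow() follows the same shape: an agent node, a tools node, and a conditional edge deciding whether to call tools or stop.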
Schema Generator module for automatically generating Pydantic models based on user descriptions and example documents using LLM.
- class dllmforge.IE_agent_schema_generator.PythonCodeOutputParser(*args: Any, name: str | None = None)[source]¶
Bases: BaseOutputParser[str]
Parse Python code from LLM responses that may contain markdown.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'protected_namespaces': ()}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: str | None¶
The name of the Runnable. Used for debugging and tracing.
- class dllmforge.IE_agent_schema_generator.SchemaGenerator(config: SchemaConfig | None = None, llm_api: LangchainAPI | None = None, task_description: str | None = None, example_doc: str | None = None, user_schema_path: Path | None = None, output_path: str | Path | None = None)[source]¶
Bases: object
Class for generating Pydantic schemas using LLM
This class supports two usage modes:
CONFIG MODE: Pass a SchemaConfig object

```python
config = SchemaConfig(
    task_description="Extract person info",
    output_path="schema.py",
)
generator = SchemaGenerator(config=config)
```

DIRECT MODE: Pass arguments directly (no config object)

```python
generator = SchemaGenerator(
    task_description="Extract person info",
    output_path="schema.py",
)
```

Both modes support all parameters:
- task_description (REQUIRED in direct mode)
- example_doc (optional: text or file path)
- user_schema_path (optional: load existing schema)
- output_path (optional: where to save generated schema)
- llm_api (optional: custom LLM configuration)
Initialize the schema generator.
You can use either config (SchemaConfig), or pass the individual parameters directly.
- Parameters:
config – Schema generation configuration (if provided, individual params are ignored)
llm_api – Optional pre-configured LangchainAPI instance
task_description – Description of the information extraction task (direct mode)
example_doc – Example document to help with schema generation (direct mode)
user_schema_path – Path to user-provided schema Python file (direct mode)
output_path – Path to save generated schema (direct mode)
- __init__(config: SchemaConfig | None = None, llm_api: LangchainAPI | None = None, task_description: str | None = None, example_doc: str | None = None, user_schema_path: Path | None = None, output_path: str | Path | None = None)[source]¶
Initialize the schema generator.
You can use either config (SchemaConfig), or pass the individual parameters directly.
- Parameters:
config – Schema generation configuration (if provided, individual params are ignored)
llm_api – Optional pre-configured LangchainAPI instance
task_description – Description of the information extraction task (direct mode)
example_doc – Example document to help with schema generation (direct mode)
user_schema_path – Path to user-provided schema Python file (direct mode)
output_path – Path to save generated schema (direct mode)
- create_schema_generation_prompt() ChatPromptTemplate[source]¶
Create prompt template for generating Pydantic schema
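The generated schema is an ordinary Pydantic model. For a task like "Extract person info", the output of SchemaGenerator might resemble the following; this is an illustrative hand-written example, not actual generator output, and the field names are assumptions:

```python
from typing import Optional
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    """Structured record for a hypothetical 'Extract person info' task."""
    name: str = Field(description="Full name of the person")
    age: Optional[int] = Field(default=None, description="Age in years, if stated")
    organisation: Optional[str] = Field(default=None, description="Affiliated organisation")

example = PersonInfo(name="Ada Lovelace", age=36)
```

The field descriptions are not decoration: structured-output LLM calls pass them to the model as instructions for what to put in each field.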
Document Processor module for preprocessing documents into text or images for LLM processing.
- class dllmforge.IE_agent_document_processor.ProcessedDocument(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]¶
Bases: object
Class representing processed document content
Initialize processed document
- Parameters:
content – The document content (text string or image bytes)
content_type – Type of content (‘text’ or ‘image’)
metadata – Additional metadata about the document
- __init__(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]¶
Initialize processed document
- Parameters:
content – The document content (text string or image bytes)
content_type – Type of content (‘text’ or ‘image’)
metadata – Additional metadata about the document
- class dllmforge.IE_agent_document_processor.DocumentProcessor(config: DocumentConfig | None = None, input_dir: str | Path | None = None, file_pattern: str | None = None, output_type: str | None = None, output_dir: str | Path | None = None)[source]¶
Bases: object
Class for preprocessing documents into text or images
Initialize document processor
- Parameters:
config – Document processing configuration (DocumentConfig)
input_dir – Input directory (overrides config if given)
file_pattern – File pattern (overrides config if given)
output_type – Processing type (overrides config if given)
output_dir – Output directory (overrides config if given)
- __init__(config: DocumentConfig | None = None, input_dir: str | Path | None = None, file_pattern: str | None = None, output_type: str | None = None, output_dir: str | Path | None = None)[source]¶
Initialize document processor
- Parameters:
config – Document processing configuration (DocumentConfig)
input_dir – Input directory (overrides config if given)
file_pattern – File pattern (overrides config if given)
output_type – Processing type (overrides config if given)
output_dir – Output directory (overrides config if given)
- process_to_text(file_path: str | Path) ProcessedDocument[source]¶
Process document to text using DocumentLoader
- process_to_image(file_path: str | Path) List[ProcessedDocument][source]¶
Process document to list of page images
- process_file(file_path: str | Path) ProcessedDocument | List[ProcessedDocument][source]¶
Process a single file based on configuration (text/image)
- Parameters:
file_path – Path to document
- Returns:
Single ProcessedDocument for text or list of ProcessedDocument for images
- process_directory() List[ProcessedDocument | List[ProcessedDocument]][source]¶
Process all matching files in the configured directory
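The input_dir/file_pattern pair behaves like a standard glob. A minimal sketch of how such matching can work, using only the standard library (illustrative only; DocumentProcessor's actual logic may differ):

```python
import tempfile
from pathlib import Path

def matching_files(input_dir, file_pattern="*.pdf"):
    """Return the files under input_dir whose names match the glob pattern, sorted."""
    return sorted(Path(input_dir).glob(file_pattern))

# Demonstrate with a throwaway directory containing two PDFs and one text file.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        (Path(d) / name).write_text("stub")
    pdfs = [p.name for p in matching_files(d, "*.pdf")]
```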
Synchronous Information Extractor module for extracting structured information from documents using LLM.
- class dllmforge.IE_agent_extractor.DocumentChunk(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]¶
Bases: object
Class representing a chunk of document content
- class dllmforge.IE_agent_extractor.InfoExtractor(config: IEAgentConfig | None = None, output_schema: type[BaseModel] | None = None, llm_api: LangchainAPI | None = None, system_prompt: str | None = None, chunk_size: int | None = None, chunk_overlap: int | None = None, doc_processor: DocumentProcessor | None = None, document_output_type: str = 'text')[source]¶
Bases: object
Class for extracting information from documents using LLM
Initialize the information extractor.
You can use either config (IEAgentConfig), or pass the individual parameters directly.
- __init__(config: IEAgentConfig | None = None, output_schema: type[BaseModel] | None = None, llm_api: LangchainAPI | None = None, system_prompt: str | None = None, chunk_size: int | None = None, chunk_overlap: int | None = None, doc_processor: DocumentProcessor | None = None, document_output_type: str = 'text')[source]¶
Initialize the information extractor.
You can use either config (IEAgentConfig), or pass the individual parameters directly.
- refine_system_prompt(task_description: str) str[source]¶
Use LLM to refine user’s task description into a proper system prompt
- chunk_document(doc: ProcessedDocument) Generator[DocumentChunk, None, None][source]¶
Split document into chunks if needed based on thresholds
- create_text_extraction_prompt() ChatPromptTemplate[source]¶
Create prompt template for text-based information extraction
- process_text_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process a text document chunk
- create_image_extraction_prompt() ChatPromptTemplate[source]¶
Create prompt template for image-based information extraction
- process_image_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process an image document chunk
- process_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process a document chunk based on its type
- process_document(doc: ProcessedDocument | List[ProcessedDocument]) List[Dict[str, Any]][source]¶
Process document and extract information, merging in chunk metadata.
- save_results(results: List[Any], output_path: str | Path) None[source]¶
Save extraction results to JSON file
- process_all(save_individual: bool = False, combined_output_name: str = 'all_extracted.json') None[source]¶
Process all documents in configured directory
- Parameters:
save_individual – If True, save each document to a separate JSON file (old behavior)
combined_output_name – Name of the combined output file (default: “all_extracted.json”)
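The combined-output behaviour of process_all can be pictured as collecting each document's chunk results into one list and writing a single JSON file. A hedged sketch of that bookkeeping, using only the standard library (illustrative; not the actual implementation):

```python
import json
import tempfile
from pathlib import Path

def save_combined(results_per_document, output_dir,
                  combined_output_name="all_extracted.json"):
    """Flatten per-document chunk results into one list and write it as one JSON file."""
    combined = [record for doc_results in results_per_document for record in doc_results]
    out_path = Path(output_dir) / combined_output_name
    out_path.write_text(json.dumps(combined, indent=2))
    return out_path

# Two documents: the first yielded one record, the second two (one per chunk).
with tempfile.TemporaryDirectory() as d:
    path = save_combined(
        [[{"name": "Ada"}], [{"name": "Grace"}, {"name": "Edsger"}]], d
    )
    loaded = json.loads(path.read_text())
```

With save_individual=True the per-document lists would instead each go to their own JSON file, mirroring the older behaviour described above.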