DLLMForge Documentation

Welcome to DLLMForge

DLLMForge is a repository of LLM (Large Language Model) tools developed at Deltares. It provides simple open- and closed-source tools for interacting with various LLMs. With DLLMForge you can:

  • Use a simple LLM to ask questions.

  • Build your own RAG pipeline with HuggingFace or Azure embeddings and vector stores.

  • Create agents that can use tools to answer complex questions.

  • Extract structured information from documents using LLMs.

Features

DLLMForge provides a modular toolkit for:

  • Multi-LLM Support: Integration with OpenAI, Anthropic, and open-source models hosted at Deltares

  • RAG Pipeline: Complete document ingestion, embedding, and retrieval system

  • Agent Framework: Simple but extensible agent architecture with tool support

  • Evaluation Tools: Comprehensive RAG system evaluation using various metrics

  • Flexible Backends: Support for both cloud (Azure, OpenAI) and local deployments

Repository Structure

DLLMForge is organized into several key components that work together to provide a comprehensive LLM toolkit:

Core Package (dllmforge/)

The main package contains the following modules:

Core Agent Framework
Information Extraction Framework
  • IE_agent_config.py - Configuration management for IE agents

  • IE_agent_schema_generator.py - Automatic schema generation for structured extraction

  • IE_agent_document_processor.py - Document processing for information extraction

  • IE_agent_extractor.py - Main information extraction orchestrator

  • IE_agent_extractor_docling.py - Enhanced extraction with Docling preprocessing

LLM API Integrations
  • openai_api.py - OpenAI API integration

  • anthropic_api.py - Anthropic Claude API integration

  • langchain_api.py - LangChain framework integration

  • llamaindex_api.py - LlamaIndex framework integration

RAG (Retrieval-Augmented Generation) Components
Specialized Components
  • LLMs/Deltares_LLMs.py - Deltares-specific LLM implementations

  • utils/ - Utility functions and helpers

Workflows (workflows/)

  • open_source_RAG.py - Example workflow for open-source RAG implementation

Example Streamlit-based Applications (streamlit_apps/)

  • app.py - Streamlit-based RAG application

  • streamlit_water_management_app.py - Streamlit-based water management application

Quick Start

Installation

To install DLLMForge, you can use pip:

pip install git+https://github.com/Deltares-research/DLLMForge

Tutorials

The following tutorials are available:

Background Information

For more information on LLMs and RAG systems, see:

API Reference

dllmforge

DLLMForge - Deltares LLM Forge Toolkit

Modules

Simple agent core for DLLMForge - Clean LangGraph utilities.

This module provides simple, elegant utilities for creating LangGraph agents following the pattern established in water_management_agent_simple.py.

dllmforge.agent_core.tool(func)[source]

DLLMForge wrapper around LangChain’s @tool decorator.

This decorator provides a consistent interface for creating tools within the DLLMForge ecosystem while maintaining compatibility with LangChain’s tool system.

Parameters:

func – Function to be converted into a tool

Returns:

Tool function that can be used with SimpleAgent
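The decorator's mechanics can be sketched without the library. The following is an illustrative stand-in (an assumption, not DLLMForge's actual implementation) showing the kind of wrapping involved: preserving the function's name and docstring, which tool-calling frameworks typically use to describe the tool to the LLM.

```python
import functools

def tool(func):
    """Illustrative stand-in for dllmforge.agent_core.tool (not the real
    implementation): wrap a function while keeping its name and docstring,
    which an agent framework uses to describe the tool to the LLM."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper.is_tool = True  # hypothetical marker an agent could check in add_tool()
    return wrapper

@tool
def add_numbers(a: int, b: int) -> int:
    """Add two integers."""
    return a + b
```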

class dllmforge.agent_core.SimpleAgent(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai', llm=None, enable_text_tool_routing: bool = False, max_tool_iterations: int = 3)[source]

Bases: object

Simple agent class for LangGraph workflows.

Initialize a simple LangGraph agent.

Parameters:
  • system_message – System message for the agent

  • temperature – LLM temperature setting

  • model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)

__init__(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai', llm=None, enable_text_tool_routing: bool = False, max_tool_iterations: int = 3)[source]

Initialize a simple LangGraph agent.

Parameters:
  • system_message – System message for the agent

  • temperature – LLM temperature setting

  • model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)

add_tool(tool_func: Callable) None[source]

Add a tool to the agent.

Parameters:

tool_func – Function decorated with @tool

add_node(name: str, func: Callable) None[source]

Add a node to the workflow.

Parameters:
  • name – Node name

  • func – Node function

add_edge(from_node: str, to_node: str) None[source]

Add a simple edge between nodes.

Parameters:
  • from_node – Source node

  • to_node – Target node

add_conditional_edge(from_node: str, condition_func: Callable) None[source]

Add a conditional edge.

Parameters:
  • from_node – Source node

  • condition_func – Function that determines routing

create_simple_workflow() None[source]

Create a simple agent -> tools workflow with optional text-based tool routing.
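The resulting control flow can be sketched in plain Python. This is an illustrative model of an agent -> tools loop bounded by max_tool_iterations (all names below are hypothetical; the real workflow is built on LangGraph):

```python
def run_workflow(agent_step, tools, query, max_tool_iterations=3):
    """Illustrative sketch (not the real implementation) of the agent -> tools
    loop: the agent node produces a message, a conditional edge routes to the
    tool node while the message requests a tool, and max_tool_iterations
    bounds the loop."""
    message = agent_step(query)          # first agent turn
    iterations = 0
    while message.get("tool") and iterations < max_tool_iterations:
        result = tools[message["tool"]](message["args"])  # tools node
        message = agent_step(result)                      # back to the agent node
        iterations += 1
    return message["content"]
```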

compile(checkpointer=None) None[source]

Compile the workflow.

process_query(query: str, stream: bool = True) None[source]

Process a query with the agent.

Parameters:
  • query – User query

  • stream – Whether to stream the response

run_interactive() None[source]

Run the agent in interactive mode.

dllmforge.agent_core.create_basic_agent(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai') SimpleAgent[source]

Create a basic agent with standard setup.

Parameters:
  • system_message – System message for the agent

  • temperature – LLM temperature

  • model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)

Returns:

Configured agent instance

Return type:

SimpleAgent

dllmforge.agent_core.create_echo_tool()[source]

Create a simple echo tool for testing.
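An echo tool of this kind can be as small as the following sketch (illustrative, not the library's code):

```python
def create_echo_tool():
    """Illustrative sketch: build an echo tool for smoke-testing an agent's
    tool-calling path without any external dependencies."""
    def echo(text: str) -> str:
        """Return the input text unchanged."""
        return text
    return echo
```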

dllmforge.agent_core.create_basic_tools() List[Callable][source]

Create basic utility tools for testing.

Returns:

List of tool functions

Schema Generator module for automatically generating Pydantic models based on user descriptions and example documents using LLM.

class dllmforge.IE_agent_schema_generator.PythonCodeOutputParser(*args: Any, name: str | None = None)[source]

Bases: BaseOutputParser[str]

Parse Python code from LLM responses that may contain markdown.

parse(text: str) str[source]

Parse the output of an LLM call to extract Python code.

get_format_instructions() str[source]

Instructions on how the LLM output should be formatted.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str | None

The name of the Runnable. Used for debugging and tracing.
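The parsing behavior can be illustrated with a self-contained sketch. This is an assumed re-implementation, not the class's actual code: strip a markdown code fence, if present, and return the bare Python source.

```python
import re

def parse_python_code(text: str) -> str:
    """Illustrative sketch of what a parser like PythonCodeOutputParser does:
    if the LLM wrapped its answer in a markdown code fence, return only the
    code inside; otherwise return the text unchanged."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()
```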

class dllmforge.IE_agent_schema_generator.SchemaGenerator(config: SchemaConfig | None = None, llm_api: LangchainAPI | None = None, task_description: str | None = None, example_doc: str | None = None, user_schema_path: Path | None = None, output_path: str | Path | None = None)[source]

Bases: object

Class for generating Pydantic schemas using LLM

This class supports two usage modes:

  1. CONFIG MODE: Pass a SchemaConfig object

```python
config = SchemaConfig(
    task_description="Extract person info",
    output_path="schema.py",
)
generator = SchemaGenerator(config=config)
```

  2. DIRECT MODE: Pass arguments directly (no config object)

```python
generator = SchemaGenerator(
    task_description="Extract person info",
    output_path="schema.py",
)
```

Both modes support all parameters:

  • task_description (required in direct mode)

  • example_doc (optional: text or file path)

  • user_schema_path (optional: load existing schema)

  • output_path (optional: where to save generated schema)

  • llm_api (optional: custom LLM configuration)

Initialize the schema generator.

You can use either config (SchemaConfig), or pass the individual parameters directly.

Parameters:
  • config – Schema generation configuration (if provided, individual params are ignored)

  • llm_api – Optional pre-configured LangchainAPI instance

  • task_description – Description of the information extraction task (direct mode)

  • example_doc – Example document to help with schema generation (direct mode)

  • user_schema_path – Path to user-provided schema Python file (direct mode)

  • output_path – Path to save generated schema (direct mode)

__init__(config: SchemaConfig | None = None, llm_api: LangchainAPI | None = None, task_description: str | None = None, example_doc: str | None = None, user_schema_path: Path | None = None, output_path: str | Path | None = None)[source]

Initialize the schema generator.

You can use either config (SchemaConfig), or pass the individual parameters directly.

Parameters:
  • config – Schema generation configuration (if provided, individual params are ignored)

  • llm_api – Optional pre-configured LangchainAPI instance

  • task_description – Description of the information extraction task (direct mode)

  • example_doc – Example document to help with schema generation (direct mode)

  • user_schema_path – Path to user-provided schema Python file (direct mode)

  • output_path – Path to save generated schema (direct mode)

setup_parser()[source]

Setup the Pydantic output parser for structured verification results

create_schema_generation_prompt() ChatPromptTemplate[source]

Create prompt template for generating Pydantic schema

generate_schema() str[source]

Generate Pydantic schema based on task description and optional example document

save_schema(schema_code: str) None[source]

Save generated schema to a Python file

Document Processor module for preprocessing documents into text or images for LLM processing.

class dllmforge.IE_agent_document_processor.ProcessedDocument(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]

Bases: object

Class representing processed document content

Initialize processed document

Parameters:
  • content – The document content (text string or image bytes)

  • content_type – Type of content (‘text’ or ‘image’)

  • metadata – Additional metadata about the document

__init__(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]

Initialize processed document

Parameters:
  • content – The document content (text string or image bytes)

  • content_type – Type of content (‘text’ or ‘image’)

  • metadata – Additional metadata about the document

class dllmforge.IE_agent_document_processor.DocumentProcessor(config: DocumentConfig | None = None, input_dir: str | Path | None = None, file_pattern: str | None = None, output_type: str | None = None, output_dir: str | Path | None = None)[source]

Bases: object

Class for preprocessing documents into text or images

Initialize document processor

Parameters:
  • config – Document processing configuration (DocumentConfig)

  • input_dir – Input directory (overrides config if given)

  • file_pattern – File pattern (overrides config if given)

  • output_type – Processing type (overrides config if given)

  • output_dir – Output directory (overrides config if given)

__init__(config: DocumentConfig | None = None, input_dir: str | Path | None = None, file_pattern: str | None = None, output_type: str | None = None, output_dir: str | Path | None = None)[source]

Initialize document processor

Parameters:
  • config – Document processing configuration (DocumentConfig)

  • input_dir – Input directory (overrides config if given)

  • file_pattern – File pattern (overrides config if given)

  • output_type – Processing type (overrides config if given)

  • output_dir – Output directory (overrides config if given)

process_to_text(file_path: str | Path) ProcessedDocument[source]

Process document to text using DocumentLoader

process_to_image(file_path: str | Path) List[ProcessedDocument][source]

Process document to list of page images

encode_image_base64(image_bytes: bytes) str[source]

Encode image bytes to base64 string
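This helper's behavior is standard-library work; an illustrative sketch (not necessarily the library's exact code):

```python
import base64

def encode_image_base64(image_bytes: bytes) -> str:
    """Illustrative sketch: encode raw image bytes as an ASCII base64 string,
    the form multimodal LLM APIs typically accept for page images."""
    return base64.b64encode(image_bytes).decode("ascii")
```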

process_file(file_path: str | Path) ProcessedDocument | List[ProcessedDocument][source]

Process a single file based on configuration (text/image)

Parameters:

file_path – Path to document

Returns:

Single ProcessedDocument for text or list of ProcessedDocument for images

process_directory() List[ProcessedDocument | List[ProcessedDocument]][source]

Process all matching files in the configured directory

Synchronous Information Extractor module for extracting structured information from documents using LLM.

class dllmforge.IE_agent_extractor.DocumentChunk(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]

Bases: object

Class representing a chunk of document content

__init__(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]

class dllmforge.IE_agent_extractor.InfoExtractor(config: IEAgentConfig | None = None, output_schema: type[BaseModel] | None = None, llm_api: LangchainAPI | None = None, system_prompt: str | None = None, chunk_size: int | None = None, chunk_overlap: int | None = None, doc_processor: DocumentProcessor | None = None, document_output_type: str = 'text')[source]

Bases: object

Class for extracting information from documents using LLM

Initialize the information extractor.

You can use either config (IEAgentConfig), or pass the individual parameters directly.

__init__(config: IEAgentConfig | None = None, output_schema: type[BaseModel] | None = None, llm_api: LangchainAPI | None = None, system_prompt: str | None = None, chunk_size: int | None = None, chunk_overlap: int | None = None, doc_processor: DocumentProcessor | None = None, document_output_type: str = 'text')[source]

Initialize the information extractor.

You can use either config (IEAgentConfig), or pass the individual parameters directly.

refine_system_prompt(task_description: str) str[source]

Use LLM to refine user’s task description into a proper system prompt

chunk_document(doc: ProcessedDocument) Generator[DocumentChunk, None, None][source]

Split document into chunks if needed based on thresholds
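Overlap chunking of text, governed by the chunk_size and chunk_overlap parameters of InfoExtractor, can be sketched as a generator. This is an illustrative re-implementation under assumed character-based semantics, not the library's code:

```python
from typing import Generator

def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> Generator[str, None, None]:
    """Illustrative sketch of overlap chunking: yield windows of at most
    chunk_size characters, each starting chunk_size - chunk_overlap characters
    after the previous one, so context carries across chunk boundaries."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_size]
        if start + chunk_size >= len(text):
            break
```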

create_text_extraction_prompt() ChatPromptTemplate[source]

Create prompt template for text-based information extraction

process_text_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]

Process a text document chunk

create_image_extraction_prompt() ChatPromptTemplate[source]

Create prompt template for image-based information extraction

process_image_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]

Process an image document chunk

process_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]

Process a document chunk based on its type

process_document(doc: ProcessedDocument | List[ProcessedDocument]) List[Dict[str, Any]][source]

Process document and extract information, merging in chunk metadata.

save_results(results: List[Any], output_path: str | Path) None[source]

Save extraction results to JSON file
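Saving a list of extracted records to JSON is straightforward; an illustrative sketch of such a helper (assumed behavior, not the library's exact code):

```python
import json
from pathlib import Path

def save_results(results, output_path):
    """Illustrative sketch of saving extraction results: write the list of
    extracted records to a JSON file, creating parent directories first."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2, ensure_ascii=False))
```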

process_all(save_individual: bool = False, combined_output_name: str = 'all_extracted.json') None[source]

Process all documents in configured directory

Parameters:
  • save_individual – If True, save each document to a separate JSON file (old behavior)

  • combined_output_name – Name of the combined output file (default: “all_extracted.json”)
