DLLMForge Documentation¶
Welcome to DLLMForge¶
DLLMForge is a repository of LLM (Large Language Model) tools developed at Deltares. It provides simple open- and closed-source tools for interacting with various LLMs. With DLLMForge you can:
Use a simple LLM to ask questions.
Build your own RAG pipeline with HuggingFace or Azure embeddings and vector stores.
Create agents that can use tools to answer complex questions.
Extract structured information from documents using LLMs.
Features¶
DLLMForge provides a modular toolkit for:
Multi-LLM Support: Integration with OpenAI, Anthropic, and open-source Deltares-hosted models
RAG Pipeline: Complete document ingestion, embedding, and retrieval system
Agent Framework: Simple but extensible agent architecture with tool support
Evaluation Tools: Comprehensive RAG system evaluation using various metrics
Flexible Backends: Support for both cloud (Azure, OpenAI) and local deployments
Repository Structure¶
DLLMForge is organized into several key components that work together to provide a comprehensive LLM toolkit:
Core Package (dllmforge/)¶
The main package contains the following modules:
- Core Agent Framework
agent_core.py - Simple agent infrastructure with tool support
  - SimpleAgent - Basic agentic workflows
  - create_basic_agent() - Agent factory function
  - create_basic_tools() - Tool creation utilities
- Information Extraction Framework
IE_agent_config.py - Configuration management for IE agents
  - IEAgentConfig - Main configuration class
  - SchemaConfig - Schema generation configuration
  - DocumentConfig - Document processing configuration
  - ExtractorConfig - Information extraction configuration
IE_agent_schema_generator.py - Automatic schema generation for structured extraction
  - SchemaGenerator - Generate Pydantic schemas from task descriptions
IE_agent_document_processor.py - Document processing for information extraction
  - DocumentProcessor - Convert documents to LLM-readable format
  - ProcessedDocument - Processed document container
IE_agent_extractor.py - Main information extraction orchestrator
  - InfoExtractor - Extract structured information from documents
  - DocumentChunk - Document chunk container
IE_agent_extractor_docling.py - Enhanced extraction with Docling preprocessing
  - DoclingInfoExtractor - Advanced document structure-aware extraction
- LLM API Integrations
openai_api.py - OpenAI API integration
  - OpenAIAPI - OpenAI API wrapper
anthropic_api.py - Anthropic Claude API integration
  - AnthropicAPI - Anthropic API wrapper
langchain_api.py - LangChain framework integration
llamaindex_api.py - LlamaIndex framework integration
  - LlamaIndexAPI - LlamaIndex API wrapper
- RAG (Retrieval-Augmented Generation) Components
rag_preprocess_documents.py - Document loading and chunking
  - DocumentLoader - Abstract document loader
  - PDFLoader - Load PDF documents
  - TextChunker - Split text into manageable chunks with overlap
rag_embedding.py - Azure OpenAI embedding models
  - AzureOpenAIEmbeddingModel - Generate embeddings for text
rag_embedding_open_source.py - Open-source embedding models
  - LangchainHFEmbeddingModel - HuggingFace embeddings via LangChain
rag_search_and_response.py - Search and response generation
  - IndexManager - Manage vector indices
  - Retriever - Retrieve relevant documents
  - LLMResponder - Generate responses using LLMs
rag_evaluation.py - RAG system evaluation
  - RAGEvaluator - Evaluate RAG system performance
  - EvaluationResult - Store individual evaluation metrics
  - RAGEvaluationResult - Store comprehensive RAG evaluation results
- Specialized Components
LLMs/Deltares_LLMs.py - Deltares-specific LLM implementations
utils/ - Utility functions and helpers
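The chunk-with-overlap idea behind TextChunker can be sketched as follows. This is an illustrative re-implementation, not the code in rag_preprocess_documents.py; the function name and defaults are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap`
    characters so sentences cut at a boundary still appear whole in one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# Example: 1000 characters, 400-char chunks, 100-char overlap -> 4 chunks.
sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(sample, chunk_size=400, overlap=100)
```

Overlap trades a little storage for retrieval robustness: a fact that straddles a chunk boundary is still retrievable from at least one chunk.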
Workflows (workflows/)¶
open_source_RAG.py - Example workflow for an open-source RAG implementation
Example Streamlit-based Applications (streamlit_apps/)¶
app.py - Streamlit-based RAG application
streamlit_water_management_app.py - Streamlit-based water management application
Quick Start¶
Installation¶
To install DLLMForge, you can use pip:
pip install git+https://github.com/Deltares-research/DLLMForge
Tutorials¶
The following tutorials are available:
- Tutorial LLM capabilities of DLLMForge
- Open Source RAG Pipeline Tutorial
- Building a Simple Agent with DLLMForge
- Tutorial: Advanced Water Management Agent
  - Learning Objectives
  - Core Concepts Demonstrated
  - Workflow Overview
  - Water Calculation Tools
  - Information Retrieval Tools
  - Conditional Edge Implementation
  - Workflow Assembly
  - Testing the Workflow
  - Additional Examples
  - Running the Tutorial
  - Testing with Custom Queries
  - Key Benefits for Water Professionals
  - Next Steps
- Information Extraction with LLMs Tutorial
Background Information¶
For more information on LLMs and RAG systems, see:
API Reference¶
DLLMForge - Deltares LLM Forge Toolkit
Modules¶
Simple agent core for DLLMForge - Clean LangGraph utilities.
This module provides simple, elegant utilities for creating LangGraph agents following the pattern established in water_management_agent_simple.py.
- dllmforge.agent_core.tool(func)[source]¶
DLLMForge wrapper around LangChain’s @tool decorator.
This decorator provides a consistent interface for creating tools within the DLLMForge ecosystem while maintaining compatibility with LangChain’s tool system.
- Parameters:
func – Function to be converted into a tool
- Returns:
Tool function that can be used with SimpleAgent
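In practice, a tool is just a plain function with type hints and a docstring, which the decorator exposes to the agent. A minimal sketch; the function itself is hypothetical, and only the @tool decorator comes from dllmforge.agent_core:

```python
# from dllmforge.agent_core import tool  # the decorator documented above

# @tool  # uncomment once dllmforge is installed; wraps the function for SimpleAgent
def discharge_to_litres_per_second(discharge_m3s: float) -> float:
    """Convert a river discharge from cubic metres per second to litres per second."""
    return discharge_m3s * 1000.0
```

The type hints and docstring matter: LangChain-style tool systems use them to tell the LLM what the tool does and what arguments it expects.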
- class dllmforge.agent_core.SimpleAgent(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai', llm=None, enable_text_tool_routing: bool = False, max_tool_iterations: int = 3)[source]¶
Bases: object
Simple agent class for LangGraph workflows.
Initialize a simple LangGraph agent.
- Parameters:
system_message – System message for the agent
temperature – LLM temperature setting
model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)
- __init__(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai', llm=None, enable_text_tool_routing: bool = False, max_tool_iterations: int = 3)[source]¶
Initialize a simple LangGraph agent.
- Parameters:
system_message – System message for the agent
temperature – LLM temperature setting
model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)
- add_tool(tool_func: Callable) None[source]¶
Add a tool to the agent.
- Parameters:
tool_func – Function decorated with @tool
- add_node(name: str, func: Callable) None[source]¶
Add a node to the workflow.
- Parameters:
name – Node name
func – Node function
- add_edge(from_node: str, to_node: str) None[source]¶
Add a simple edge between nodes.
- Parameters:
from_node – Source node
to_node – Target node
- add_conditional_edge(from_node: str, condition_func: Callable) None[source]¶
Add a conditional edge.
- Parameters:
from_node – Source node
condition_func – Function that determines routing
- create_simple_workflow() None[source]¶
Create a simple agent -> tools workflow with optional text-based tool routing.
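A condition function for add_conditional_edge simply inspects the workflow state and returns the name of the next node. A hedged sketch, assuming a state dict with a "messages" list whose last entry may carry tool calls (the exact state shape used by SimpleAgent is an assumption here):

```python
def route_after_agent(state: dict) -> str:
    """Return the next node name: run tools if the last message requested any,
    otherwise end the workflow."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return "end"
```

Such a function would be registered with `agent.add_conditional_edge("agent", route_after_agent)` so the graph loops through the tools node only when the LLM asks for a tool.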
- dllmforge.agent_core.create_basic_agent(system_message: str = None, temperature: float = 0.1, model_provider: str = 'azure-openai') SimpleAgent[source]¶
Create a basic agent with standard setup.
- Parameters:
system_message – System message for the agent
temperature – LLM temperature
model_provider – LLM provider (“azure-openai”, “openai”, “mistral”)
- Returns:
Configured agent instance
- Return type:
SimpleAgent
- dllmforge.agent_core.create_basic_tools() List[Callable][source]¶
Create basic utility tools for testing.
- Returns:
List of tool functions
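To make the node/edge/conditional-edge vocabulary above concrete, here is a toy graph runner in plain Python. It is purely illustrative: SimpleAgent delegates this machinery to LangGraph, and none of these names exist in dllmforge:

```python
def run_graph(nodes, edges, conditional_edges, state, start="agent", end="end"):
    """Walk a tiny node graph: run each node on the state, then follow either a
    fixed edge or a condition function that picks the next node from the state."""
    current = start
    while current != end:
        state = nodes[current](state)
        if current in conditional_edges:
            current = conditional_edges[current](state)  # routing decided at runtime
        else:
            current = edges[current]                     # fixed edge
    return state

# A two-node workflow: "agent" doubles a number, then routes to "tools" once.
nodes = {
    "agent": lambda s: {**s, "value": s["value"] * 2},
    "tools": lambda s: {**s, "tooled": True},
}
edges = {"tools": "end"}
conditional_edges = {"agent": lambda s: "tools" if not s.get("tooled") else "end"}

result = run_graph(nodes, edges, conditional_edges, {"value": 3})
```

The real LangGraph workflow built by create_simple_workflow() follows the same shape: an agent node, a tools node, and a conditional edge deciding whether to call tools or stop.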
Schema Generator module for automatically generating Pydantic models based on user descriptions and example documents using LLM.
- class dllmforge.IE_agent_schema_generator.PythonCodeOutputParser(*args: Any, name: str | None = None)[source]¶
Bases: BaseOutputParser[str]
Parse Python code from LLM responses that may contain markdown.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'protected_namespaces': ()}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: str | None¶
The name of the Runnable. Used for debugging and tracing.
- class dllmforge.IE_agent_schema_generator.SchemaGenerator(config: SchemaConfig | None = None, llm_api: LangchainAPI | None = None, task_description: str | None = None, example_doc: str | None = None, user_schema_path: Path | None = None, output_path: str | Path | None = None)[source]¶
Bases: object
Class for generating Pydantic schemas using LLM
This class supports two usage modes:
CONFIG MODE: Pass a SchemaConfig object

```python
config = SchemaConfig(
    task_description="Extract person info",
    output_path="schema.py",
)
generator = SchemaGenerator(config=config)
```

DIRECT MODE: Pass arguments directly (no config object)

```python
generator = SchemaGenerator(
    task_description="Extract person info",
    output_path="schema.py",
)
```

Both modes support all parameters:
- task_description (REQUIRED in direct mode)
- example_doc (optional: text or file path)
- user_schema_path (optional: load existing schema)
- output_path (optional: where to save generated schema)
- llm_api (optional: custom LLM configuration)
Initialize the schema generator.
You can use either config (SchemaConfig), or pass the individual parameters directly.
- Parameters:
config – Schema generation configuration (if provided, individual params are ignored)
llm_api – Optional pre-configured LangchainAPI instance
task_description – Description of the information extraction task (direct mode)
example_doc – Example document to help with schema generation (direct mode)
user_schema_path – Path to user-provided schema Python file (direct mode)
output_path – Path to save generated schema (direct mode)
- __init__(config: SchemaConfig | None = None, llm_api: LangchainAPI | None = None, task_description: str | None = None, example_doc: str | None = None, user_schema_path: Path | None = None, output_path: str | Path | None = None)[source]¶
Initialize the schema generator.
You can use either config (SchemaConfig), or pass the individual parameters directly.
- Parameters:
config – Schema generation configuration (if provided, individual params are ignored)
llm_api – Optional pre-configured LangchainAPI instance
task_description – Description of the information extraction task (direct mode)
example_doc – Example document to help with schema generation (direct mode)
user_schema_path – Path to user-provided schema Python file (direct mode)
output_path – Path to save generated schema (direct mode)
- create_schema_generation_prompt() ChatPromptTemplate[source]¶
Create prompt template for generating Pydantic schema
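The generated schema is an ordinary Pydantic model. For a task like "Extract person info", the output of SchemaGenerator might resemble the following; this is an illustrative hand-written example, not actual generator output, and the field names are assumptions:

```python
from typing import Optional
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    """Structured record for a hypothetical 'Extract person info' task."""
    name: str = Field(description="Full name of the person")
    age: Optional[int] = Field(default=None, description="Age in years, if stated")
    organisation: Optional[str] = Field(default=None, description="Affiliated organisation")

example = PersonInfo(name="Ada Lovelace", age=36)
```

The field descriptions are not decoration: structured-output LLM calls pass them to the model as instructions for what to put in each field.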
Document Processor module for preprocessing documents into text or images for LLM processing.
- class dllmforge.IE_agent_document_processor.ProcessedDocument(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]¶
Bases: object
Class representing processed document content
Initialize processed document
- Parameters:
content – The document content (text string or image bytes)
content_type – Type of content (‘text’ or ‘image’)
metadata – Additional metadata about the document
- __init__(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]¶
Initialize processed document
- Parameters:
content – The document content (text string or image bytes)
content_type – Type of content (‘text’ or ‘image’)
metadata – Additional metadata about the document
- class dllmforge.IE_agent_document_processor.DocumentProcessor(config: DocumentConfig | None = None, input_dir: str | Path | None = None, file_pattern: str | None = None, output_type: str | None = None, output_dir: str | Path | None = None)[source]¶
Bases: object
Class for preprocessing documents into text or images
Initialize document processor
- Parameters:
config – Document processing configuration (DocumentConfig)
input_dir – Input directory (overrides config if given)
file_pattern – File pattern (overrides config if given)
output_type – Processing type (overrides config if given)
output_dir – Output directory (overrides config if given)
- __init__(config: DocumentConfig | None = None, input_dir: str | Path | None = None, file_pattern: str | None = None, output_type: str | None = None, output_dir: str | Path | None = None)[source]¶
Initialize document processor
- Parameters:
config – Document processing configuration (DocumentConfig)
input_dir – Input directory (overrides config if given)
file_pattern – File pattern (overrides config if given)
output_type – Processing type (overrides config if given)
output_dir – Output directory (overrides config if given)
- process_to_text(file_path: str | Path) ProcessedDocument[source]¶
Process document to text using DocumentLoader
- process_to_image(file_path: str | Path) List[ProcessedDocument][source]¶
Process document to list of page images
- process_file(file_path: str | Path) ProcessedDocument | List[ProcessedDocument][source]¶
Process a single file based on configuration (text/image)
- Parameters:
file_path – Path to document
- Returns:
Single ProcessedDocument for text or list of ProcessedDocument for images
- process_directory() List[ProcessedDocument | List[ProcessedDocument]][source]¶
Process all matching files in the configured directory
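The input_dir/file_pattern pair behaves like a standard glob. A minimal sketch of how such matching can work, using only the standard library (illustrative only; DocumentProcessor's actual logic may differ):

```python
import tempfile
from pathlib import Path

def matching_files(input_dir, file_pattern="*.pdf"):
    """Return the files under input_dir whose names match the glob pattern, sorted."""
    return sorted(Path(input_dir).glob(file_pattern))

# Demonstrate with a throwaway directory containing two PDFs and one text file.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        (Path(d) / name).write_text("stub")
    pdfs = [p.name for p in matching_files(d, "*.pdf")]
```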
Synchronous Information Extractor module for extracting structured information from documents using LLM.
- class dllmforge.IE_agent_extractor.DocumentChunk(content: str | bytes, content_type: str, metadata: Dict[str, Any] | None = None)[source]¶
Bases: object
Class representing a chunk of document content
- class dllmforge.IE_agent_extractor.InfoExtractor(config: IEAgentConfig | None = None, output_schema: type[BaseModel] | None = None, llm_api: LangchainAPI | None = None, system_prompt: str | None = None, chunk_size: int | None = None, chunk_overlap: int | None = None, doc_processor: DocumentProcessor | None = None, document_output_type: str = 'text')[source]¶
Bases: object
Class for extracting information from documents using LLM
Initialize the information extractor.
You can use either config (IEAgentConfig), or pass the individual parameters directly.
- __init__(config: IEAgentConfig | None = None, output_schema: type[BaseModel] | None = None, llm_api: LangchainAPI | None = None, system_prompt: str | None = None, chunk_size: int | None = None, chunk_overlap: int | None = None, doc_processor: DocumentProcessor | None = None, document_output_type: str = 'text')[source]¶
Initialize the information extractor.
You can use either config (IEAgentConfig), or pass the individual parameters directly.
- refine_system_prompt(task_description: str) str[source]¶
Use LLM to refine user’s task description into a proper system prompt
- chunk_document(doc: ProcessedDocument) Generator[DocumentChunk, None, None][source]¶
Split document into chunks if needed based on thresholds
- create_text_extraction_prompt() ChatPromptTemplate[source]¶
Create prompt template for text-based information extraction
- process_text_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process a text document chunk
- create_image_extraction_prompt() ChatPromptTemplate[source]¶
Create prompt template for image-based information extraction
- process_image_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process an image document chunk
- process_chunk(chunk: DocumentChunk) Dict[str, Any] | None[source]¶
Process a document chunk based on its type
- process_document(doc: ProcessedDocument | List[ProcessedDocument]) List[Dict[str, Any]][source]¶
Process document and extract information, merging in chunk metadata.
- save_results(results: List[Any], output_path: str | Path) None[source]¶
Save extraction results to JSON file
- process_all(save_individual: bool = False, combined_output_name: str = 'all_extracted.json') None[source]¶
Process all documents in configured directory
- Parameters:
save_individual – If True, save each document to a separate JSON file (old behavior)
combined_output_name – Name of the combined output file (default: “all_extracted.json”)
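The combined-output behaviour of process_all can be pictured as collecting each document's chunk results into one list and writing a single JSON file. A hedged sketch of that bookkeeping, using only the standard library (illustrative; not the actual implementation):

```python
import json
import tempfile
from pathlib import Path

def save_combined(results_per_document, output_dir,
                  combined_output_name="all_extracted.json"):
    """Flatten per-document chunk results into one list and write it as one JSON file."""
    combined = [record for doc_results in results_per_document for record in doc_results]
    out_path = Path(output_dir) / combined_output_name
    out_path.write_text(json.dumps(combined, indent=2))
    return out_path

# Two documents: the first yielded one record, the second two (one per chunk).
with tempfile.TemporaryDirectory() as d:
    path = save_combined(
        [[{"name": "Ada"}], [{"name": "Grace"}, {"name": "Edsger"}]], d
    )
    loaded = json.loads(path.read_text())
```

With save_individual=True the per-document lists would instead each go to their own JSON file, mirroring the older behaviour described above.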