dllmforge.rag_search_and_response

This module provides “create index/vector-database”, “search”, and “response” functionality for RAG (Retrieval-Augmented Generation) pipelines. Three steps are involved:

1. Create index/vector-database: create an index/vector database on the Azure AI Search service.
2. Search: use the Azure AI Search service to retrieve relevant chunks from the vector database.
3. Response: use an LLM to generate a response to the user query based on the retrieved chunks.

The module uses the Azure AI Search service and the Azure OpenAI service as examples of hosted search and LLM APIs. Note that you need an Azure AI Search service, an Azure OpenAI service, and an LLM model deployed on Azure to use this module.

The example demonstrates the whole RAG pipeline:

1. Preprocess the documents into chunks.
2. Vectorize the chunks.
3. Create the vector index and store the chunks in the vector database.
4. Search the vector database for relevant chunks.
5. Generate a response to the user query based on the retrieved chunks.
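The five steps above can be sketched end to end without any cloud services, using an in-memory list as the “vector database” and a toy bag-of-words embedding in place of an Azure OpenAI embedding model. All names below are illustrative, not the dllmforge API:

```python
from collections import Counter
from math import sqrt

def chunk_document(text, chunk_size=40):
    """Step 1: split a document into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def toy_embed(text):
    """Step 2: stand-in for an embedding model (bag-of-words counts)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 3: "index" the chunks together with their vectors.
docs = ["Azure AI Search stores vectors. It supports hybrid queries.",
        "LLMs generate answers from retrieved context."]
index = [(c, toy_embed(c)) for d in docs for c in chunk_document(d)]

# Step 4: retrieve the top-k chunks for a query by cosine similarity.
query = "vector search"
qvec = toy_embed(query)
top = sorted(index, key=lambda item: cosine(qvec, item[1]), reverse=True)[:2]

# Step 5: build the augmented prompt an LLM would answer from.
context = "\n".join(chunk for chunk, _ in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In the real module, steps 2 and 4 go through an Azure OpenAI embedding deployment and the Azure AI Search service, and step 5 is handled by LLMResponder; only the overall data flow is the same.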

Classes

IndexManager([search_client_endpoint, ...])

LLMResponder(llm)

Retriever(embedding_model[, index_name, ...])

class dllmforge.rag_search_and_response.IndexManager(search_client_endpoint=None, search_api_key=None, index_name=None, embedding_dim=None)[source]
__init__(search_client_endpoint=None, search_api_key=None, index_name=None, embedding_dim=None)[source]
create_index(api_base=None, deployment_name_embeddings=None, api_key=None)[source]
upload_documents(vectorized_chunks)[source]
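upload_documents presumably pushes the vectorized chunks into the index. Azure AI Search caps the size of a single upload request (commonly 1000 documents per batch), so large chunk sets are typically split first. A dependency-free sketch of that batching, independent of the dllmforge API (the chunk dictionaries here are hypothetical):

```python
def batched(items, batch_size=1000):
    """Yield successive batches no larger than batch_size.

    Azure AI Search limits how many documents one upload request may carry,
    so callers typically split a large chunk list like this and invoke
    something like IndexManager.upload_documents once per batch.
    """
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical vectorized chunks (id + content); real ones would also carry
# their embedding vectors.
vectorized_chunks = [{"id": str(i), "content": f"chunk {i}"} for i in range(2500)]
batches = list(batched(vectorized_chunks))
print([len(b) for b in batches])  # three batches: 1000, 1000, 500
```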
class dllmforge.rag_search_and_response.Retriever(embedding_model, index_name=None, search_client_endpoint=None, search_api_key=None)[source]
__init__(embedding_model, index_name=None, search_client_endpoint=None, search_api_key=None)[source]
get_embeddings(text)[source]
invoke(query_text, top_k=5)[source]
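invoke presumably embeds query_text via get_embeddings and asks the search service for the top_k nearest chunks. A self-contained sketch of that top-k selection with a stub embedding (every name here is hypothetical, not the Retriever implementation):

```python
import heapq

def get_embeddings_stub(text):
    # Stand-in for Retriever.get_embeddings: a real implementation would call
    # the Azure OpenAI embeddings deployment. Toy vector: word lengths.
    return [float(len(w)) for w in text.split()[:3]] + [0.0] * 3

def invoke_stub(query_text, indexed, top_k=5):
    """Return the top_k chunks whose vectors are closest to the query
    vector, using negative squared Euclidean distance as the score."""
    q = get_embeddings_stub(query_text)

    def score(item):
        return -sum((a - b) ** 2 for a, b in zip(q, item["vector"]))

    return heapq.nlargest(top_k, indexed, key=score)

indexed = [
    {"content": "short words here", "vector": [5.0, 5.0, 4.0, 0.0, 0.0, 0.0]},
    {"content": "a bb ccc", "vector": [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]},
]
hits = invoke_stub("x yy zzz", indexed, top_k=1)
print(hits[0]["content"])  # "a bb ccc" — its vector matches the query's
```

The real Retriever delegates the nearest-neighbor search to the Azure AI Search index rather than scoring locally; the sketch only shows the contract of the call.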
class dllmforge.rag_search_and_response.LLMResponder(llm)[source]
__init__(llm)[source]
augment_prompt_with_context(query_text, chunks)[source]
generate(query_text, retrieved_chunks)[source]
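augment_prompt_with_context presumably combines the retrieved chunks with the user query before generate sends the result to the LLM. A minimal sketch of that grounding pattern (the template and names are assumptions, not the module's actual prompt):

```python
def augment_prompt_with_context_stub(query_text, chunks):
    """Build a grounded prompt: numbered context passages, then the query."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query_text}\nAnswer:"
    )

prompt = augment_prompt_with_context_stub(
    "What does the retriever return?",
    ["Retriever.invoke returns the top_k chunks.", "Chunks carry their scores."],
)
print(prompt)
```

generate would then pass a prompt like this to the wrapped llm and return the model's completion as the final RAG answer.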