This article shows how to build an advanced agentic Retrieval-Augmented Generation (RAG) system. By deciding when to retrieve, selecting a retrieval strategy, and synthesizing responses, the system makes information retrieval smarter and more adaptive. It combines embeddings, a FAISS index, and a mock LLM to demonstrate how agentic decision-making can elevate a standard RAG pipeline into something more advanced.
💡 The system makes information retrieval smarter and more adaptive by simulating decision-making, choosing retrieval strategies, and synthesizing responses.
🔍 It combines embeddings, a FAISS index, and a mock LLM to show how agentic decision-making can lift the standard RAG pipeline to a more advanced form.
📊 It supports multiple retrieval strategies, including semantic, multi-query, temporal, and hybrid retrieval, to suit different types of questions.
🔄 Through agentic decision-making, the system dynamically selects the most appropriate retrieval strategy, improving retrieval accuracy and efficiency.
📈 The system demonstrates the potential of adding agency to RAG, making information retrieval smarter, more goal-directed, and closer to human adaptability.
In this tutorial, we walk through the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. We design it so that the agent does more than just retrieve documents; it actively decides when retrieval is needed, selects the best retrieval strategy, and synthesizes responses with contextual awareness. By combining embeddings, FAISS indexing, and a mock LLM, we create a practical demonstration of how agentic decision-making can elevate the standard RAG pipeline into something more adaptive and intelligent. Check out the FULL CODES here.
```python
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import json
import re
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum


class MockLLM:
    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        prompt_lower = prompt.lower()
        if "decide whether to retrieve" in prompt_lower:
            if any(word in prompt_lower for word in ["specific", "recent", "data", "facts", "when", "who", "what"]):
                return "RETRIEVE: The query requires specific factual information that needs to be retrieved."
            else:
                return "NO_RETRIEVE: This is a general question that can be answered with existing knowledge."
        elif "choose retrieval strategy" in prompt_lower:
            if "comparison" in prompt_lower or "versus" in prompt_lower:
                return "STRATEGY: multi_query - Need to retrieve information about multiple entities for comparison."
            elif "recent" in prompt_lower or "latest" in prompt_lower:
                return "STRATEGY: temporal - Focus on recent information."
            else:
                return "STRATEGY: semantic - Standard semantic similarity search."
        elif "synthesize" in prompt_lower and "context:" in prompt_lower:
            return "Based on the retrieved information, here's a comprehensive answer that combines multiple sources and provides specific details with proper context."
        return "This is a mock response. In practice, use a real LLM like OpenAI's GPT or similar."


class RetrievalStrategy(Enum):
    SEMANTIC = "semantic"
    MULTI_QUERY = "multi_query"
    TEMPORAL = "temporal"
    HYBRID = "hybrid"


@dataclass
class Document:
    id: str
    content: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = None
```
We set up the foundation of our Agentic RAG system. We define a mock LLM to simulate decision-making, create a retrieval strategy enum, and design a Document dataclass so we can structure and manage our knowledge base efficiently. Check out the FULL CODES here.
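As a quick sanity check, here is a minimal, hypothetical usage sketch (assuming the MockLLM and Document definitions above are in scope) that exercises the mock's keyword routing and the dataclass directly:

```python
# Minimal sanity check of the mock decision routing (hypothetical usage;
# assumes MockLLM and Document from the block above are already defined).
llm = MockLLM()

# A factual query containing "what" should hit the RETRIEVE branch.
print(llm.generate('Decide whether to retrieve for: "What is RAG?"'))

# A conversational query with none of the trigger words falls through to NO_RETRIEVE.
print(llm.generate('Decide whether to retrieve for: "How are you feeling?"'))

# The Document dataclass simply bundles content with metadata and an optional embedding.
doc = Document(id="demo", content="RAG combines retrieval with generation.", metadata={"topic": "RAG"})
print(doc.id, doc.metadata, doc.embedding)
```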
```python
class AgenticRAGSystem:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.llm = MockLLM()
        self.documents: List[Document] = []
        self.index: Optional[faiss.Index] = None

    def add_documents(self, documents: List[Dict[str, Any]]) -> None:
        print(f"Processing {len(documents)} documents...")
        for i, doc in enumerate(documents):
            doc_obj = Document(
                id=doc.get('id', str(i)),
                content=doc['content'],
                metadata=doc.get('metadata', {})
            )
            self.documents.append(doc_obj)
        contents = [doc.content for doc in self.documents]
        embeddings = self.encoder.encode(contents, show_progress_bar=True)
        for doc, embedding in zip(self.documents, embeddings):
            doc.embedding = embedding
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatIP(dimension)
        faiss.normalize_L2(embeddings)
        self.index.add(embeddings.astype('float32'))
        print(f"Knowledge base built with {len(self.documents)} documents")
```
We build the core of our Agentic RAG system. We initialize the embedding model, set up the FAISS index, and add documents by encoding their contents into vectors, enabling fast and accurate semantic retrieval from our knowledge base. Check out the FULL CODES here.
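To make the indexing step concrete, here is a small, self-contained sketch (using the same all-MiniLM-L6-v2 model, with example sentences of our own) showing why the embeddings are L2-normalized before being added to an inner-product index: on unit vectors, inner-product scores behave like cosine similarities.

```python
import faiss
from sentence_transformers import SentenceTransformer

# Standalone illustration of the indexing step: encode, L2-normalize, index, search.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
texts = [
    "RAG augments a language model with retrieved documents.",
    "FAISS provides fast vector similarity search.",
    "Bananas are rich in potassium.",
]

embeddings = encoder.encode(texts)            # float32 array, one row per sentence
faiss.normalize_L2(embeddings)                # unit vectors -> inner product == cosine
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings.astype("float32"))

query = encoder.encode(["How does retrieval-augmented generation work?"])
faiss.normalize_L2(query)
scores, ids = index.search(query.astype("float32"), 2)
for score, idx in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {texts[idx]}")       # the RAG sentence should score highest
```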
```python
    def decide_retrieval(self, query: str) -> bool:
        decision_prompt = f"""
        Analyze the following query and decide whether to retrieve information:

        Query: "{query}"

        Decide whether to retrieve information from the knowledge base.
        Consider if this needs specific facts, recent data, or can be answered generally.

        Respond with either:
        RETRIEVE: [reason]
        or
        NO_RETRIEVE: [reason]
        """
        response = self.llm.generate(decision_prompt)
        should_retrieve = response.startswith("RETRIEVE:")
        print(f" Agent Decision: {'Retrieve' if should_retrieve else 'Direct Answer'}")
        print(f" Reasoning: {response.split(':', 1)[1].strip() if ':' in response else response}")
        return should_retrieve

    def choose_strategy(self, query: str) -> RetrievalStrategy:
        strategy_prompt = f"""
        Choose the best retrieval strategy for this query:

        Query: "{query}"

        Available strategies:
        - semantic: Standard similarity search
        - multi_query: Multiple related queries (for comparisons)
        - temporal: Focus on recent information
        - hybrid: Combination approach

        Choose retrieval strategy and explain why.
        Respond with: STRATEGY: [strategy_name] - [reasoning]
        """
        response = self.llm.generate(strategy_prompt)
        if "multi_query" in response.lower():
            strategy = RetrievalStrategy.MULTI_QUERY
        elif "temporal" in response.lower():
            strategy = RetrievalStrategy.TEMPORAL
        elif "hybrid" in response.lower():
            strategy = RetrievalStrategy.HYBRID
        else:
            strategy = RetrievalStrategy.SEMANTIC
        print(f" Retrieval Strategy: {strategy.value}")
        print(f" Reasoning: {response.split('-', 1)[1].strip() if '-' in response else response}")
        return strategy
```
We give our agent the ability to think before it fetches. We first determine if a query truly requires retrieval, then we select the most suitable strategy: semantic, multi-query, temporal, or hybrid. This allows us to target the correct context with clear, printed reasoning for each step. Check out the FULL CODES here.
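As a short, hypothetical usage sketch (assuming the AgenticRAGSystem class defined above), the two decision steps can also be exercised on their own; the printed decision, strategy, and reasoning all come from the MockLLM's keyword rules:

```python
# Hypothetical usage of the decision layer by itself
# (assumes AgenticRAGSystem and MockLLM from the blocks above are defined).
rag = AgenticRAGSystem()

question = "What were the latest results reported on RAG benchmarks?"
if rag.decide_retrieval(question):            # prints the agent's decision and reasoning
    strategy = rag.choose_strategy(question)  # prints the chosen strategy and reasoning
    print(f"Selected strategy: {strategy.value}")
else:
    print("Answering directly, no retrieval needed.")
```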
```python
    def retrieve_documents(self, query: str, strategy: RetrievalStrategy, k: int = 3) -> List[Document]:
        if not self.index:
            print(" No knowledge base available")
            return []
        if strategy == RetrievalStrategy.MULTI_QUERY:
            queries = [query, f"advantages of {query}", f"disadvantages of {query}"]
            all_docs = []
            for q in queries:
                docs = self._semantic_search(q, k=2)
                all_docs.extend(docs)
            seen_ids = set()
            unique_docs = []
            for doc in all_docs:
                if doc.id not in seen_ids:
                    unique_docs.append(doc)
                    seen_ids.add(doc.id)
            return unique_docs[:k]
        elif strategy == RetrievalStrategy.TEMPORAL:
            docs = self._semantic_search(query, k=k*2)
            docs_with_dates = [(doc, doc.metadata.get('date', '1900-01-01')) for doc in docs]
            docs_with_dates.sort(key=lambda x: x[1], reverse=True)
            return [doc for doc, _ in docs_with_dates[:k]]
        else:
            return self._semantic_search(query, k=k)

    def _semantic_search(self, query: str, k: int) -> List[Document]:
        query_embedding = self.encoder.encode([query])
        faiss.normalize_L2(query_embedding)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx < len(self.documents):
                results.append(self.documents[idx])
        return results

    def synthesize_response(self, query: str, retrieved_docs: List[Document]) -> str:
        if not retrieved_docs:
            return self.llm.generate(f"Answer this query: {query}")
        context = "\n\n".join([f"Document {i+1}: {doc.content}" for i, doc in enumerate(retrieved_docs)])
        synthesis_prompt = f"""
        Query: {query}

        Context:
        {context}

        Synthesize a comprehensive answer using the provided context.
        Be specific and reference the information sources when relevant.
        """
        return self.llm.generate(synthesis_prompt, max_tokens=200)
```
We implement how we actually fetch and use knowledge. We perform semantic search, branch into multi-query or temporal re-ranking when needed, deduplicate results, and then synthesize a focused answer from the retrieved context. In doing so, we keep retrieval efficient, transparent, and tightly aligned with the query. Check out the FULL CODES here.
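The temporal branch relies on the fact that ISO-formatted date strings sort lexicographically in chronological order. Here is a tiny standalone sketch of that re-ranking idea, using made-up candidate documents purely for illustration:

```python
# Standalone sketch of the temporal re-ranking step: sort candidates by an
# ISO date string in their metadata and keep the k most recent ones.
candidates = [
    {"id": "a", "date": "2024-01-15"},
    {"id": "b", "date": "2024-03-05"},
    {"id": "c", "date": "2023-11-30"},
]
k = 2
newest = sorted(candidates, key=lambda d: d.get("date", "1900-01-01"), reverse=True)[:k]
print([d["id"] for d in newest])   # ['b', 'a']
```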
```python
    def query(self, query: str) -> str:
        print(f"\n Processing Query: '{query}'")
        print("=" * 50)
        if not self.decide_retrieval(query):
            print("\n Generating direct response...")
            return self.llm.generate(f"Answer this query: {query}")
        strategy = self.choose_strategy(query)
        print(f"\n Retrieving documents using {strategy.value} strategy...")
        retrieved_docs = self.retrieve_documents(query, strategy)
        print(f" Retrieved {len(retrieved_docs)} documents")
        print("\n Synthesizing response...")
        response = self.synthesize_response(query, retrieved_docs)
        if retrieved_docs:
            print("\n Retrieved Context:")
            for i, doc in enumerate(retrieved_docs[:2], 1):
                print(f" {i}. {doc.content[:100]}...")
        return response
```
We bring all the parts together into a single pipeline. When we run a query, we first determine if retrieval is necessary, then select the appropriate strategy, fetch documents accordingly, and finally synthesize a response while also displaying the retrieved context for transparency. This makes the system feel more agentic and explainable. Check out the FULL CODES here.
```python
def create_sample_knowledge_base():
    return [
        {
            "id": "ai_1",
            "content": "Artificial Intelligence (AI) refers to computer systems that can perform tasks requiring human intelligence",
            "metadata": {"topic": "AI basics", "date": "2024-01-15"}
        },
        {
            "id": "ml_1",
            "content": "ML is a subset of AI.",
            "metadata": {"topic": "Machine Learning", "date": "2024-02-10"}
        },
        {
            "id": "rag_1",
            "content": "Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval to provide more accurate and up-to-date responses.",
            "metadata": {"topic": "RAG", "date": "2024-03-05"}
        },
        {
            "id": "agents_1",
            "content": "AI agents",
            "metadata": {"topic": "AI Agents", "date": "2024-03-20"}
        }
    ]


if __name__ == "__main__":
    print(" Initializing Agentic RAG System...")
    rag_system = AgenticRAGSystem()
    docs = create_sample_knowledge_base()
    rag_system.add_documents(docs)
    demo_queries = [
        "What is artificial intelligence?",
        "How are you today?",
        "Compare AI and Machine Learning",
    ]
    for query in demo_queries:
        response = rag_system.query(query)
        print(f"\n Final Response: {response}")
        print("\n" + "=" * 80)
    print("\n Agentic RAG Tutorial Complete!")
    print("\nKey Features Demonstrated:")
    print("• Agent-driven retrieval decisions")
    print("• Dynamic strategy selection")
    print("• Multi-modal retrieval approaches")
    print("• Transparent reasoning process")
```
We wrap everything into a runnable demo. We create a small knowledge base of AI-related documents, initialize the Agentic RAG system, and run sample queries that highlight various behaviors, including retrieval, direct answering, and comparison. This final block ties the whole tutorial together and showcases the agent’s reasoning in action.
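The MockLLM itself notes that in practice you would use a real LLM. One possible drop-in replacement is sketched below; it is illustrative only, assumes the `openai` Python package (v1+ chat completions interface) with an API key in the environment, and uses a placeholder model name. Any object exposing a `generate(prompt, max_tokens)` method can replace the mock.

```python
# Illustrative sketch of a real-LLM adapter (not part of the tutorial code).
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment;
# the model name is a placeholder you would choose yourself.
from openai import OpenAI


class OpenAILLM:
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model

    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content.strip()


# rag_system.llm = OpenAILLM()   # drop-in replacement for MockLLM
```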
In conclusion, we see how agent-driven retrieval decisions, dynamic strategy selection, and transparent reasoning come together to form an advanced Agentic RAG workflow. We now have a working system that highlights the potential of adding agency to RAG, making information retrieval smarter, more targeted, and more human-like in its adaptability. This foundation allows us to extend the system with real LLMs, larger knowledge bases, and more sophisticated strategies in future iterations.