Skip to main content

Pattern 1: GraphRAG (Knowledge Graphs + RAG)

Problem: Simple RAG struggles with multi-hop reasoning. Example query: “Which employees who worked on Project X are now working on Project Y?”
  • Requires connecting: Employee → Project X → Employee → Project Y
  • Basic RAG might find docs about each project separately but miss the connection
Solution: Use knowledge graph to find connected entities, then retrieve documents.
# Simplified GraphRAG pattern (using NetworkX)
import networkx as nx
from typing import List, Set

class GraphRAG:
    """RAG pipeline that uses a knowledge graph to guide document retrieval.

    Seed entities from the query are expanded into a subgraph (bounded by
    ``max_hops``), and documents attached to any node in that subgraph are
    retrieved as context for answer generation.
    """

    def __init__(self, knowledge_graph: nx.Graph, document_store):
        self.kg = knowledge_graph
        self.docs = document_store

    def graph_guided_retrieval(
        self,
        entities: List[str],
        max_hops: int = 2
    ) -> List[str]:
        """
        Find documents for entities and their connections.

        Args:
            entities: Starting entities (e.g., ["Project X", "Project Y"])
            max_hops: How many relationships to traverse

        Returns:
            Document IDs relevant to entity subgraph (deduplicated,
            unordered).
        """
        # Step 1: Collect every node within max_hops of any seed entity.
        relevant_nodes: Set[str] = set(entities)

        for entity in entities:
            if entity in self.kg:
                # BFS distances from the seed, truncated at max_hops.
                neighbors = nx.single_source_shortest_path_length(
                    self.kg, entity, cutoff=max_hops
                )
                relevant_nodes.update(neighbors.keys())

        # Step 2: Retrieve documents mentioning any relevant node.
        doc_ids = set()
        for node in relevant_nodes:
            # Guard: seed entities may not exist in the graph at all, and
            # `self.kg.nodes[node]` raises KeyError for unknown nodes.
            if node in self.kg:
                # Fixed: the original checked `node in attrs.get('document_ids', [])`,
                # i.e. whether the node's *name* appears in its own doc-id
                # list — so attached doc ids were almost never collected.
                doc_ids.update(self.kg.nodes[node].get('document_ids', []))

        return list(doc_ids)

    def query(self, question: str, entities: List[str]) -> str:
        """GraphRAG query: graph traversal → document retrieval → generation."""
        # Step 1: Graph-guided retrieval
        doc_ids = self.graph_guided_retrieval(entities, max_hops=2)

        # Step 2: Fetch actual documents. Skip ids the store cannot resolve
        # so a single missing document doesn't break the join below.
        documents = [doc for doc in (self.docs.get(doc_id) for doc_id in doc_ids)
                     if doc is not None]
        context = "\n\n".join(documents)

        # Step 3: Generate answer.
        # NOTE(review): `openai` is never imported in this file — assumed to
        # be an already-configured client in the surrounding project.
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }]
        )

        return response.choices[0].message.content

# Example usage
# NOTE(review): `document_store` (and the `openai` client used inside
# GraphRAG.query) are not defined in this file — this snippet is
# illustrative, not runnable as-is.
kg = nx.Graph()
kg.add_edge("Alice", "Project X", type="works_on")
kg.add_edge("Bob", "Project X", type="works_on")
kg.add_edge("Alice", "Project Y", type="works_on")

# NOTE(review): no node here carries a 'document_ids' attribute, so
# graph_guided_retrieval would find no documents for this toy graph;
# a real setup must attach document_ids to nodes.
graph_rag = GraphRAG(kg, document_store)
answer = graph_rag.query(
    "Who worked on both Project X and Y?",
    entities=["Project X", "Project Y"]
)
# Returns: "Alice worked on both projects"
When to use:
  • Multi-hop reasoning required
  • Entity-centric queries (people, companies, products)
  • Relationship-heavy domains (org charts, supply chains)
Cost: 2–3x more complex than basic RAG. Benefit: often improves multi-hop queries when relationships are explicit in a knowledge graph (magnitude varies).

Pattern 2: Iterative RAG (Query Refinement)

Problem: User queries are often vague or poorly formed. Example: User asks “Tell me about the outage”
  • Which outage? (Multiple incidents in database)
  • What aspect? (Cause, impact, resolution, timeline)
Solution: Use an LLM to refine the query based on initial results. A full runnable example of Iterative RAG follows.
class IterativeRAG:
    """RAG pipeline that retries retrieval with LLM-refined queries.

    If the best hit for the current query scores below a threshold, an LLM
    rewrites the query (up to ``max_iterations`` rounds) before the final
    answer is generated from everything retrieved along the way.
    """

    def __init__(self, retriever):
        self.retriever = retriever
        self.max_iterations = 3

    def refine_query(
        self,
        original_query: str,
        retrieved_docs: List[str],
        iteration: int
    ) -> str:
        """Use LLM to refine query based on initial results.

        Only the first 2 documents are shown to keep the prompt small.
        """
        # Fixed: the original embedded a literal "# Show first 2 docs"
        # comment INSIDE the f-string, so it was sent to the model as
        # part of the prompt text.
        refinement_prompt = f"""
        Original query: {original_query}

        Initial search returned these documents:
        {retrieved_docs[:2]}

        The documents don't directly answer the query. Generate a refined,
        more specific query that might work better. Consider:
        - Adding key terms from the documents
        - Clarifying ambiguous parts
        - Breaking down complex questions

        Refined query:
        """

        response = openai.chat.completions.create(
            model="gpt-4o-mini",  # Cheap model fine for query refinement
            messages=[{"role": "user", "content": refinement_prompt}],
            temperature=0.3,
            max_tokens=100
        )

        return response.choices[0].message.content.strip()

    def query(self, question: str) -> dict:
        """Iterative retrieval with query refinement.

        Returns:
            dict with keys "answer", "iterations" (rounds actually run)
            and "final_query" (last query sent to the retriever).
        """
        current_query = question
        all_docs = []
        iteration = -1  # Fixed: defined even if max_iterations == 0.

        for iteration in range(self.max_iterations):
            # Retrieve with current query
            results = self.retriever.search(current_query, top_k=5)
            docs = [r['document'] for r in results]
            all_docs.extend(docs)

            # Stop when the top hit looks good (simple score-threshold
            # heuristic). Fixed: guard against an empty result list,
            # which previously raised IndexError on results[0].
            if results and results[0]['score'] > 0.8:
                break  # Good enough, stop iterating

            # Refine query for next iteration
            if iteration < self.max_iterations - 1:
                current_query = self.refine_query(question, docs, iteration)
                print(f"Iteration {iteration + 1}: Refined to '{current_query}'")

        # Generate final answer from all retrieved docs
        context = "\n\n".join(all_docs)
        answer = self.generate_answer(question, context)

        return {
            "answer": answer,
            "iterations": iteration + 1,
            "final_query": current_query
        }

    def generate_answer(self, question: str, context: str) -> str:
        """Generate final answer."""
        # NOTE(review): `openai` is assumed to be imported/configured elsewhere.
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }],
            temperature=0
        )
        return response.choices[0].message.content

# Example
# NOTE(review): `retriever` (and the `openai` client used inside
# IterativeRAG) are not defined in this file — illustrative only.
iterative = IterativeRAG(retriever)
result = iterative.query("Tell me about the outage")

# Output might show:
# Iteration 1: Refined to 'database outage January 2025 cause'
# Iteration 2: Refined to 'PostgreSQL outage January 15 root cause analysis'
# Final answer includes specific details from refined search
When to use:
  • Vague user queries common
  • Large document corpus (many potential matches)
  • Quality > speed (each iteration adds 200-500ms)
Cost: 2–4x basic RAG (multiple retrieval rounds + refinement calls). Benefit: can improve performance on ambiguous queries by refining intent and terms (magnitude varies).

Pattern 3: Agentic RAG (LLM-Driven Retrieval)

Problem: Users don’t know the right keywords or structure. Solution: Let the LLM decide HOW to search based on the question.
from typing import Literal

class AgenticRAG:
    """RAG pipeline where an LLM plans the retrieval strategy per query.

    Step 1 asks a model to classify the query and extract entities/filters;
    step 2 turns that analysis into a search string plus metadata filters;
    step 3 retrieves and generates.
    """

    def __init__(self, retriever):
        self.retriever = retriever

    def analyze_query(self, question: str) -> dict:
        """LLM analyzes query and decides retrieval strategy.

        Returns:
            Parsed JSON dict with keys such as "query_type", "key_entities",
            "time_range", "search_strategy", "metadata_filters", "reasoning"
            (per the prompt below; the model may omit keys, so callers
            should use .get()).
        """
        import json

        analysis_prompt = f"""
        Analyze this query and determine the best retrieval strategy:
        Query: {question}
        
        Respond in JSON:
        {{
            "query_type": "factual | conceptual | multi_hop | temporal",
            "key_entities": ["entity1", "entity2"],
            "time_range": "optional ISO date range",
            "search_strategy": "keyword | semantic | hybrid",
            "metadata_filters": {{"document_type": "value"}},
            "reasoning": "brief explanation"
        }}
        """

        # NOTE(review): `openai` is assumed to be imported/configured elsewhere.
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": analysis_prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(response.choices[0].message.content)

    def construct_query(self, question: str, analysis: dict) -> tuple:
        """Build optimized query based on analysis.

        Args:
            question: Original user question (currently unused; kept for
                interface stability).
            analysis: Output of analyze_query().

        Returns:
            (search_query, metadata_filters) tuple.
        """
        # Fixed: tolerate a missing/absent "key_entities" key instead of
        # raising KeyError when the model omits it.
        search_query = " ".join(analysis.get('key_entities', []))

        # Copy the filters so we never mutate the caller's analysis dict
        # in place (the original added 'created_at' into it directly).
        filters = dict(analysis.get('metadata_filters') or {})

        # Add temporal filter if specified.
        # NOTE(review): the prompt asks for an ISO date *range*, but the
        # value is used as a single `$gte` lower bound — confirm the
        # intended format with the retriever's filter syntax.
        if analysis.get('time_range'):
            filters['created_at'] = {'$gte': analysis['time_range']}

        return search_query, filters

    def query(self, question: str) -> str:
        """Agentic RAG: LLM-guided retrieval."""
        # Step 1: Analyze query
        analysis = self.analyze_query(question)
        print(f"Analysis: {analysis.get('reasoning', '')}")

        # Step 2: Construct optimized query
        search_query, filters = self.construct_query(question, analysis)

        # Step 3: Retrieve with constructed query + filters
        results = self.retriever.search(
            search_query,
            metadata_filters=filters,
            top_k=5
        )

        # Step 4: Generate answer
        context = "\n\n".join([r['document'] for r in results])
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }]
        )

        return response.choices[0].message.content

# Example
# NOTE(review): `retriever` is not defined in this file — illustrative only.
agentic = AgenticRAG(retriever)

# Query 1: Factual
answer = agentic.query("What is our company's PTO policy?")
# Analysis might select: keyword search, filter to HR docs

# Query 2: Temporal
answer = agentic.query("What changed in our benefits since last year?")
# Analysis might select: semantic search, filter to docs after 2024-01-01

# Query 3: Multi-hop
answer = agentic.query("Which engineers on the data team have Python experience?")
# Analysis might select: entity search for ["data team", "Python"], multi-hop
When to use:
  • Complex, varied query types
  • Large metadata taxonomy (many filter options)
  • Power users who ask sophisticated questions
Cost: 30–50% more than basic RAG (the analysis call adds overhead). Benefit: can noticeably improve complex queries and metadata utilization (magnitude varies).

Pattern 4: Hybrid RAG (Structured + Unstructured)

Problem: Some questions need data from both databases AND documents. Example: “What’s our Q4 revenue compared to industry analysis reports?”
  • Revenue: Structured data (database query)
  • Industry analysis: Unstructured data (document retrieval)
class HybridDataRAG:
    """RAG over both structured (SQL) and unstructured (document) sources.

    An LLM first classifies whether the question needs database results,
    document retrieval, or both; the answer is generated from whichever
    context sources were gathered.
    """

    def __init__(self, sql_db, document_retriever):
        self.sql_db = sql_db
        self.docs = document_retriever

    def classify_query(self, question: str) -> dict:
        """Determine if query needs SQL, docs, or both.

        Returns:
            Parsed JSON dict; expected keys are "needs_sql",
            "needs_documents", "sql_query", "document_keywords" (the model
            may omit keys, so callers should use .get()).
        """
        import json

        classification_prompt = f"""
        Does this query require:
        - structured data (database)?
        - unstructured data (documents)?
        - both?
        
        Query: {question}
        
        Respond in JSON:
        {{
            "needs_sql": true/false,
            "needs_documents": true/false,
            "sql_query": "SELECT ... if needed",
            "document_keywords": ["keyword1", "keyword2"]
        }}
        """

        # NOTE(review): `openai` is assumed to be imported/configured elsewhere.
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": classification_prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(response.choices[0].message.content)

    def query(self, question: str) -> str:
        """Query both structured and unstructured data."""
        # Step 1: Classify query needs
        classification = self.classify_query(question)

        context_parts = []

        # Step 2a: Execute SQL if needed. Fixed: use .get() so a missing
        # "needs_sql" or "sql_query" key no longer raises KeyError.
        # SECURITY(review): this executes LLM-generated SQL verbatim —
        # run it on a read-only connection / allow-listed schema before
        # exposing to untrusted input.
        if classification.get('needs_sql') and classification.get('sql_query'):
            sql_results = self.sql_db.execute(classification['sql_query'])
            context_parts.append(f"Database results:\n{sql_results}")

        # Step 2b: Retrieve documents if needed
        if classification.get('needs_documents'):
            docs = self.docs.search(
                " ".join(classification.get('document_keywords', [])),
                top_k=3
            )
            context_parts.append(f"Documents:\n{docs}")

        # Step 3: Generate answer from combined context
        combined_context = "\n\n".join(context_parts)
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Context:\n{combined_context}\n\nQuestion: {question}"
            }]
        )

        return response.choices[0].message.content
When to use:
  • Mixed data sources (databases + documents)
  • Business intelligence + qualitative analysis
  • Compliance (regulations + internal policies)
Cost: variable (depends on SQL complexity + retrieval). Benefit: answers questions that pure document RAG can’t handle.