The retrieval strategy you choose determines what your LLM sees. This page compares lexical, semantic, and hybrid search, covering the trade-offs of each and when to use it.

Lexical Search (Keyword)

How it works: Statistical matching on exact terms and their frequencies (BM25).
Strengths:
  • Low-latency on well-indexed corpora (actual latency depends on engine, index, and hardware)
  • Excellent for exact matches and rare terms
  • No model training required
  • Interpretable results
Weaknesses:
  • Misses synonyms (“car” ≠ “automobile”)
  • Struggles with conceptual queries
  • Language-specific (requires stemming/lemmatization)
Best for:
  • Legal document search (exact statute numbers)
  • Code search (function names, error codes)
  • Product SKU lookup
  • Any domain with precise terminology
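
To make the term-and-frequency matching concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization, the commonly used defaults k1=1.5 and b=0.75, and a three-document toy corpus invented for illustration. A production engine would add stemming, stop-word handling, and an inverted index.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    Assumptions: lowercase whitespace tokenization, no stemming.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # only exact term matches contribute
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Toy corpus (hypothetical documents, for illustration only).
docs = [
    "error E4012 raised in parse_config",
    "how to reset your password",
    "the automobile would not start",
]
scores = bm25_scores("car E4012", docs)
```

In this toy run only the first document gets a non-zero score: the rare token "E4012" matches exactly, while "car" contributes nothing to the "automobile" document, which is the synonym weakness listed above.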

Semantic Search (Vector)

How it works: Converts text to embedding vectors in a semantic space; texts with similar meaning map to nearby vectors.
Strengths:
  • Handles synonyms and paraphrasing
  • Works across languages (multilingual models)
  • Captures conceptual similarity
  • No query engineering needed
Weaknesses:
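  • Typically slower than BM25 (on the order of 100-500ms for large collections; actual latency depends on index type and hardware)
  • Can miss exact matches on tokens that carry little semantic signal, such as IDs and codes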
  • Black box (hard to debug why something matched)
  • Large-scale indexing usually needs GPU acceleration to embed the corpus in reasonable time
Best for:
  • Customer support (intent-based)
  • Research papers (conceptual queries)
  • Multilingual search
  • FAQ matching
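
The "nearby vectors" idea can be sketched with cosine similarity. The three-dimensional vectors below are hand-assigned stand-ins, not the output of any real embedding model; in practice both the documents and the query would be embedded by the same model, and a vector index would replace the linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d "embeddings", hand-assigned for illustration only.
doc_vectors = {
    "car maintenance tips": [0.9, 0.1, 0.0],
    "automobile repair guide": [0.8, 0.2, 0.1],
    "chocolate cake recipe": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of a query such as "fix my car".
query_vector = [0.85, 0.15, 0.05]

def semantic_search(query_vec, vectors, top_k=2):
    """Return the top_k document keys ranked by cosine similarity."""
    ranked = sorted(
        vectors,
        key=lambda doc: cosine(query_vec, vectors[doc]),
        reverse=True,
    )
    return ranked[:top_k]

top = semantic_search(query_vector, doc_vectors)
```

Both the "car" and the "automobile" documents land in the top two even though they share no keyword, because their vectors point in nearly the same direction as the query vector.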

Hybrid Search (Lexical + Semantic)

The Production Standard: Combine lexical (keyword) and semantic (vector) search.
Why hybrid wins:
  • Catches the exact matches that lexical search excels at
  • Catches the semantic matches that vector search excels at
  • Often improves retrieval quality across diverse corpora (magnitude varies by dataset and metric)
  • Commonly used in production systems
Implementation approaches:
  1. Weighted fusion: Combine scores with learned weights
  2. Rank fusion: Merge ranked lists (Reciprocal Rank Fusion - RRF)
  3. Two-stage: Lexical first pass → semantic reranking
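
The first two approaches can be sketched in a few lines. The RRF function uses the commonly cited constant k=60. The weighted variant uses a fixed weight `alpha` as a stand-in for learned weights and min-max normalization as one possible way to make the two score scales comparable; both choices are assumptions, and the document IDs and scores are invented for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each doc earns 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_fusion(lexical_scores, semantic_scores, alpha=0.5):
    """Combine min-max normalized scores; alpha is the lexical weight.

    alpha is fixed here for illustration; a real system might learn it.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {doc: (s - lo) / span for doc, s in scores.items()}

    lex, sem = normalize(lexical_scores), normalize(semantic_scores)
    combined = {
        doc: alpha * lex.get(doc, 0.0) + (1 - alpha) * sem.get(doc, 0.0)
        for doc in set(lex) | set(sem)
    }
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical ranked lists from the two retrievers.
lexical_ranking = ["doc_sku", "doc_faq", "doc_blog"]
semantic_ranking = ["doc_faq", "doc_guide", "doc_sku"]
fused = reciprocal_rank_fusion([lexical_ranking, semantic_ranking])

# Hypothetical raw scores on different scales (BM25 vs. cosine).
weighted = weighted_fusion(
    {"a": 10.0, "b": 2.0},
    {"a": 0.2, "b": 0.9, "c": 0.5},
    alpha=0.7,
)
```

RRF only needs ranks, so it sidesteps the problem that BM25 and cosine scores live on different scales. Weighted fusion keeps the score magnitudes but depends on the normalization and on how the weight is chosen.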

When to Use Each Strategy

Query Type | Best Strategy | Example
Exact terminology | Lexical | “ICD-10 code M54.5” (medical)
Product codes/IDs | Lexical | “SKU-2847-B”
Conceptual question | Semantic | “How do I improve sleep?”
Paraphrased intent | Semantic | “Can’t sign in” → password reset
Mixed (most production) | Hybrid | “Latest Python security updates”

In Production

Cost Impact:
  • Lexical: ~$0.0001 per query (compute only, no API calls)
  • Semantic: ~$0.001-0.01 per query (embedding API + vector DB)
  • Hybrid: ~$0.002-0.015 per query (both methods)
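
To see what these per-query figures imply at volume, the arithmetic can be sketched as below. The per-query ranges are the approximate numbers from the list above; the traffic figure of 100,000 queries per day and the 30-day month are assumptions for illustration.

```python
# Approximate (low, high) cost per query in USD, taken from the list above.
COST_PER_QUERY = {
    "lexical": (0.0001, 0.0001),
    "semantic": (0.001, 0.01),
    "hybrid": (0.002, 0.015),
}

def monthly_cost(strategy, queries_per_day, days=30):
    """Return the (low, high) estimated monthly cost in USD."""
    low, high = COST_PER_QUERY[strategy]
    total_queries = queries_per_day * days
    return low * total_queries, high * total_queries

# Assumed traffic: 100,000 queries per day.
for strategy in COST_PER_QUERY:
    low, high = monthly_cost(strategy, 100_000)
    print(f"{strategy}: ${low:,.0f} to ${high:,.0f} per month")
```

At that assumed volume the estimates come to roughly $300 per month for lexical, $3,000 to $30,000 for semantic, and $6,000 to $45,000 for hybrid.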
Performance:
  • Lexical: typically lower latency at moderate scales
  • Semantic: generally higher latency than lexical; depends on index type and hardware
  • Hybrid: adds overhead; parallel execution helps
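
One way parallel execution can help is to issue both retrievals concurrently, so the hybrid latency approaches that of the slower retriever rather than the sum of the two. The sketch below uses the standard-library thread pool; `lexical_search` and `semantic_search` are hypothetical stand-ins that sleep to simulate I/O-bound calls to a search engine and a vector database.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def lexical_search(query):
    """Hypothetical stand-in for a keyword engine call."""
    time.sleep(0.05)  # simulated network/engine latency
    return ["doc_a", "doc_b"]

def semantic_search(query):
    """Hypothetical stand-in for an embedding + vector DB call."""
    time.sleep(0.10)  # simulated network/engine latency
    return ["doc_b", "doc_c"]

def hybrid_retrieve(query):
    """Run both retrievers concurrently and return both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lex_future = pool.submit(lexical_search, query)
        sem_future = pool.submit(semantic_search, query)
        return lex_future.result(), sem_future.result()

start = time.perf_counter()
lex_hits, sem_hits = hybrid_retrieve("latest python security updates")
elapsed = time.perf_counter() - start
print(f"hybrid retrieval took {elapsed:.3f}s")
```

Threads suit this case because the work is dominated by waiting on I/O. The fusion step still adds a small cost after both result lists arrive.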
Accuracy (illustrative; actual figures vary by dataset and metric):
  • Lexical alone: 70-75% relevant results
  • Semantic alone: 72-78% relevant results
  • Hybrid: 85-92% relevant results
Recommendation: Start with hybrid unless you have strict latency requirements (<50ms) or a very clear use case for lexical-only search.