
Hello World

After configuring your OpenAI API key, you can run this example by executing `npm run llm:sample1` in the terminal. Give it a try!

How LLMs Process Your Input

Before we write our first prompt, let’s understand what’s actually happening under the hood.

The Context Window: Your Working Memory

Think of an LLM’s context window like RAM on your computer. Everything you send - your instructions, conversation history, documents - gets loaded into this window. The model can only “see” what fits inside.

Current Context Windows (as of January 2025; verify latest on vendor pages below):
  • GPT-4: 128K tokens (~96K words)
  • Claude Sonnet 4.5: 200K tokens (~150K words)
  • Gemini 1.5 Pro: 2M tokens (~1.5M words)
See vendor references: OpenAI models, Anthropic pricing/models, Google Gemini models.

Why This Matters: A customer support agent handling a complex case might need:
  • 2K tokens: System instructions
  • 5K tokens: Company knowledge base excerpts
  • 10K tokens: Conversation history
  • 3K tokens: Customer account details
That’s 20K tokens before the LLM generates a single word. Multiply by thousands of requests, and you see why context management matters.

Token Economics (example: a 150K-token input and a 3K-token output at illustrative rates):

$$
input\,cost = 150K\,tokens \times (\$10\,/\,1M\,tokens) = \$1.50 \newline
output\,cost = 3K\,tokens \times (\$30\,/\,1M\,tokens) = \$0.09 \newline
total\,per\,request = \$1.59
$$
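To make that arithmetic concrete, here is a minimal sketch that counts tokens with tiktoken and applies the rates above. These are the illustrative figures from this section, not live vendor pricing - check your provider’s price list.

```python
# pip install tiktoken
import tiktoken

# Illustrative rates from the worked example above - not live vendor pricing.
INPUT_COST_PER_TOKEN = 10 / 1_000_000   # $10 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 30 / 1_000_000  # $30 per 1M output tokens

def estimate_cost(prompt: str, expected_output_tokens: int) -> float:
    """Tokenize the input and estimate the cost of one request."""
    enc = tiktoken.encoding_for_model("gpt-4")  # pick the encoding for your model
    input_tokens = len(enc.encode(prompt))
    return (input_tokens * INPUT_COST_PER_TOKEN
            + expected_output_tokens * OUTPUT_COST_PER_TOKEN)

# The worked example: 150K input tokens plus 3K output tokens
print(150_000 * INPUT_COST_PER_TOKEN + 3_000 * OUTPUT_COST_PER_TOKEN)  # 1.59
```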

LLM Limitations You Must Know

1. Hallucinations: Making Stuff Up

LLMs are trained to predict the next plausible token. They’re not fact-checking databases.

Famous Failure: Air Canada’s chatbot hallucinated a bereavement discount policy that didn’t exist. A tribunal ordered the airline to honor it. Cost: unknown, but a significant legal precedent. (BBC, 2024)

Why It Happens:
  • Missing information → fills gaps with plausible-sounding text
  • Conflicting instructions → makes judgment calls
  • Outdated training data → invents current information
What Works in Production:
  • Constrain to provided context: “Only use information from these documents” (a minimal sketch follows this list)
  • Validate outputs: Check facts against source data
  • Ground answers in retrieved knowledge (covered throughout this tutorial)
  • Add human review for high-stakes decisions
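Here is a minimal sketch of the first two mitigations: the prompt constrains the model to supplied documents, and the caller checks for an explicit escape hatch. The exact wording and the “I don't know” sentinel are assumptions to adapt, not a fixed recipe.

```python
# Hedged sketch: constrain answers to provided context, then validate the output.
CONTEXT_PROMPT = """Answer ONLY using the documents below.
If the answer is not in the documents, reply exactly: I don't know.

Documents:
{documents}

Question: {question}"""

def build_grounded_prompt(documents: str, question: str) -> str:
    return CONTEXT_PROMPT.format(documents=documents, question=question)

def is_grounded(answer: str) -> bool:
    # The sentinel means the model declined rather than guessed.
    return answer.strip() != "I don't know"
```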
2. Non-Determinism: Different Answers Every Time

Run the same prompt twice and you get different answers. That’s by design (temperature > 0). See the full runnable non-deterministic outputs notebook.
# Same prompt, different responses
prompt = "Summarize this customer complaint in 10 words"

response_1 = "Customer angry about delayed shipment, demanding refund immediately"
response_2 = "Shipment late, customer wants money back, very upset"
response_3 = "Delayed delivery complaint with refund request, customer frustrated"
Production Strategy: Control the model’s creativity with the temperature parameter (see the sketch after this list).
  • Temperature=0 for minimal creativity tasks (classification, extraction)
  • Temperature=0.3-0.7 for creative tasks (writing, brainstorming)
  • Run multiple times and vote (self-consistency, covered in 1.5)
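A sketch of the first strategy using the OpenAI Python SDK. The model name and prompts are placeholders; any chat model that accepts temperature works here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_ticket(ticket: str) -> str:
    # temperature=0 keeps classification as repeatable as the model allows
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use any model that accepts temperature
        temperature=0,
        messages=[
            {"role": "system", "content": "Classify the ticket as: billing, shipping, or other."},
            {"role": "user", "content": ticket},
        ],
    )
    return response.choices[0].message.content
```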
Note: GPT-5 models do not support the temperature parameter, and passing it raises an error - a break in backward compatibility with earlier OpenAI models. Instead, GPT-5 controls output variability through a new parameter, reasoning depth:

Reasoning depth: reasoning: { effort: "none" | "low" | "medium" | "high" }
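With a GPT-5 family model, the equivalent knob is reasoning depth. A minimal sketch using the OpenAI Responses API - the model name is a placeholder, and you should check the OpenAI docs for the effort values your model accepts:

```python
from openai import OpenAI

client = OpenAI()

# GPT-5 rejects temperature; control output variability via reasoning depth instead.
response = client.responses.create(
    model="gpt-5",  # placeholder for any GPT-5 family model
    reasoning={"effort": "low"},  # one of the effort levels listed above
    input="Summarize this customer complaint in 10 words: ...",
)
print(response.output_text)
```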
3. Recency Bias: Recent Context Matters More

LLMs pay more attention to text at the beginning and end of the prompt. Middle content gets “lost.” Heuristic: prompt structure and placement can affect results. Many teams place the specific query toward the end; validate with your use case. **Best Practice for Gemini:** See Gemini prompting strategies.
Critical instructions at START
[Large document content in MIDDLE]
Specific query at END
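A sketch of assembling a prompt in that order. The section labels are arbitrary markers, not an API requirement:

```python
def build_prompt(instructions: str, document: str, query: str) -> str:
    # Instructions first and query last, so both high-attention
    # positions carry the parts that matter most.
    return f"""{instructions}

--- DOCUMENT ---
{document}
--- END DOCUMENT ---

{query}"""
```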

Mental Model: LLMs as Completion Engines

Wrong Mental Model: “The AI understands my intent”
Right Mental Model: “The AI completes patterns it’s seen in training”
Example:
prompt = "The capital of France is"
result = "Paris"
# Not because it "knows" geography, but because it's seen this pattern millions of times

prompt = "The capital of Atlantis is"
result = "unknown" or makes something up
# It hasn't seen this pattern → hallucination risk
Practical Implication: When you want structured output, give the LLM a pattern to complete:
# ❌ Vague: "Extract the customer's name and email"
# LLM might respond in prose: "The customer's name is John and his email is..."

# ✅ Pattern to complete:
"""
Extract customer information:

Name: [the customer's name]
Email: [the customer's email address]
"""
# LLM completes the pattern: "Name: John Smith\nEmail: [email protected]"
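Because the completion follows the pattern you supplied, parsing it back is mechanical. A hedged sketch - it assumes the model echoed the Name:/Email: labels, and production code should handle misses:

```python
import re

completion = "Name: John Smith\nEmail: john@example.com"  # example model output

def parse_customer(text: str) -> dict:
    # Pull each labeled field; None if the model broke the pattern.
    name = re.search(r"^Name:\s*(.+)$", text, re.MULTILINE)
    email = re.search(r"^Email:\s*(.+)$", text, re.MULTILINE)
    return {
        "name": name.group(1).strip() if name else None,
        "email": email.group(1).strip() if email else None,
    }

print(parse_customer(completion))  # {'name': 'John Smith', 'email': 'john@example.com'}
```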