
Hello World

After configuring your OpenAI API key, you can run this example by executing `npm run llm:sample1` in the terminal. Give it a try!

How LLMs Process Your Input

Before we write our first prompt, let’s understand what’s actually happening under the hood.

The Context Window: Your Working Memory

Think of an LLM’s context window like RAM on your computer. Everything you send - your instructions, conversation history, documents - gets loaded into this window. The model can only “see” what fits inside.

Current Context Windows (as of January 2025; verify latest on vendor pages below):
  • GPT-4: 128K tokens (~96K words)
  • Claude Sonnet 4.5: 200K tokens (~150K words)
  • Gemini 1.5 Pro: 2M tokens (~1.5M words)
See vendor references: OpenAI models, Anthropic pricing/models, Google Gemini models.

Why This Matters: A customer support agent handling a complex case might need:
  • 2K tokens: System instructions
  • 5K tokens: Company knowledge base excerpts
  • 10K tokens: Conversation history
  • 3K tokens: Customer account details
That’s 20K tokens before the LLM generates a single word. Multiply by thousands of requests, and you see why context management matters.

Token Economics (example: a 150K-token input and a 3K-token output at illustrative rates):

$$
input\,cost = 150K\,tokens \times (\$10\,/\,1M\,tokens) = \$1.50 \newline
output\,cost = 3K\,tokens \times (\$30\,/\,1M\,tokens) = \$0.09 \newline
total\,per\,request = \$1.59
$$
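To make that arithmetic concrete, here is a minimal sketch that counts tokens with tiktoken and applies the rates above. These are the illustrative figures from this section, not live vendor pricing - check your provider’s price list.

```python
# pip install tiktoken
import tiktoken

# Illustrative rates from the worked example above - not live vendor pricing.
INPUT_COST_PER_TOKEN = 10 / 1_000_000   # $10 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 30 / 1_000_000  # $30 per 1M output tokens

def estimate_cost(prompt: str, expected_output_tokens: int) -> float:
    """Tokenize the input and estimate the cost of one request."""
    enc = tiktoken.encoding_for_model("gpt-4")  # pick the encoding for your model
    input_tokens = len(enc.encode(prompt))
    return (input_tokens * INPUT_COST_PER_TOKEN
            + expected_output_tokens * OUTPUT_COST_PER_TOKEN)

# The worked example: 150K input tokens plus 3K output tokens
print(150_000 * INPUT_COST_PER_TOKEN + 3_000 * OUTPUT_COST_PER_TOKEN)  # 1.59
```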

LLM Limitations You Must Know

1. Hallucinations: Making Stuff Up

LLMs are trained to predict the next plausible token. They’re not fact-checking databases.

Famous Failure: Air Canada’s chatbot hallucinated a bereavement discount policy that didn’t exist. A tribunal ordered the airline to honor it. Cost: unknown, but a significant legal precedent. (BBC, 2024)

Why It Happens:
  • Missing information → fills gaps with plausible-sounding text
  • Conflicting instructions → makes judgment calls
  • Outdated training data → invents current information
What Works in Production:
  • Constrain to provided context: “Only use information from these documents” (a minimal sketch follows this list)
  • Validate outputs: Check facts against source data
  • Ground answers in retrieved knowledge (covered throughout this tutorial)
  • Add human review for high-stakes decisions
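Here is a minimal sketch of the first two mitigations: the prompt constrains the model to supplied documents, and the caller checks for an explicit escape hatch. The exact wording and the “I don't know” sentinel are assumptions to adapt, not a fixed recipe.

```python
# Hedged sketch: constrain answers to provided context, then validate the output.
CONTEXT_PROMPT = """Answer ONLY using the documents below.
If the answer is not in the documents, reply exactly: I don't know.

Documents:
{documents}

Question: {question}"""

def build_grounded_prompt(documents: str, question: str) -> str:
    return CONTEXT_PROMPT.format(documents=documents, question=question)

def is_grounded(answer: str) -> bool:
    # The sentinel means the model declined rather than guessed.
    return answer.strip() != "I don't know"
```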
2. Non-Determinism: Different Answers Every Time

Run the same prompt twice and you get different answers. That’s by design (temperature > 0). See the full runnable non-deterministic outputs notebook.
# Same prompt, different responses
prompt = "Summarize this customer complaint in 10 words"

response_1 = "Customer angry about delayed shipment, demanding refund immediately"
response_2 = "Shipment late, customer wants money back, very upset"
response_3 = "Delayed delivery complaint with refund request, customer frustrated"
Production Strategy: Control the model’s creativity with the temperature parameter (see the sketch after this list).
  • Temperature=0 for minimal creativity tasks (classification, extraction)
  • Temperature=0.3-0.7 for creative tasks (writing, brainstorming)
  • Run multiple times and vote (self-consistency, covered in 1.5)
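A sketch of the first strategy using the OpenAI Python SDK. The model name and prompts are placeholders; any chat model that accepts temperature works here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_ticket(ticket: str) -> str:
    # temperature=0 keeps classification as repeatable as the model allows
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use any model that accepts temperature
        temperature=0,
        messages=[
            {"role": "system", "content": "Classify the ticket as: billing, shipping, or other."},
            {"role": "user", "content": ticket},
        ],
    )
    return response.choices[0].message.content
```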
Note: GPT-5 models do not support the temperature parameter, and passing it raises an error - a break in backward compatibility with earlier OpenAI models. Instead, GPT-5 controls output variability through a new parameter, reasoning depth:

Reasoning depth: reasoning: { effort: "none" | "low" | "medium" | "high" }
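With a GPT-5 family model, the equivalent knob is reasoning depth. A minimal sketch using the OpenAI Responses API - the model name is a placeholder, and you should check the OpenAI docs for the effort values your model accepts:

```python
from openai import OpenAI

client = OpenAI()

# GPT-5 rejects temperature; control output variability via reasoning depth instead.
response = client.responses.create(
    model="gpt-5",  # placeholder for any GPT-5 family model
    reasoning={"effort": "low"},  # one of the effort levels listed above
    input="Summarize this customer complaint in 10 words: ...",
)
print(response.output_text)
```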
3. Recency Bias: Recent Context Matters More

LLMs pay more attention to text at the beginning and end of the prompt. Middle content gets “lost.” Heuristic: prompt structure and placement can affect results. Many teams place the specific query toward the end; validate with your use case. **Best Practice for Gemini:** See Gemini prompting strategies.
Critical instructions at START
[Large document content in MIDDLE]
Specific query at END
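A sketch of assembling a prompt in that order. The section labels are arbitrary markers, not an API requirement:

```python
def build_prompt(instructions: str, document: str, query: str) -> str:
    # Instructions first and query last, so both high-attention
    # positions carry the parts that matter most.
    return f"""{instructions}

--- DOCUMENT ---
{document}
--- END DOCUMENT ---

{query}"""
```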

Mental Model: LLMs as Completion Engines

Wrong Mental Model: “The AI understands my intent”
Right Mental Model: “The AI completes patterns it’s seen in training”
Example:
prompt = "The capital of France is"
result = "Paris"
# Not because it "knows" geography, but because it's seen this pattern millions of times

prompt = "The capital of Atlantis is"
result = "unknown" or makes something up
# It hasn't seen this pattern → hallucination risk
Practical Implication: When you want structured output, give the LLM a pattern to complete:
# ❌ Vague: "Extract the customer's name and email"
# LLM might respond in prose: "The customer's name is John and his email is..."

# ✅ Pattern to complete:
"""
Extract customer information:

Name: [the customer's name]
Email: [the customer's email address]
"""
# LLM completes the pattern: "Name: John Smith\nEmail: [email protected]"
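Because the completion follows the pattern you supplied, parsing it back is mechanical. A hedged sketch - it assumes the model echoed the Name:/Email: labels, and production code should handle misses:

```python
import re

completion = "Name: John Smith\nEmail: john@example.com"  # example model output

def parse_customer(text: str) -> dict:
    # Pull each labeled field; None if the model broke the pattern.
    name = re.search(r"^Name:\s*(.+)$", text, re.MULTILINE)
    email = re.search(r"^Email:\s*(.+)$", text, re.MULTILINE)
    return {
        "name": name.group(1).strip() if name else None,
        "email": email.group(1).strip() if email else None,
    }

print(parse_customer(completion))  # {'name': 'John Smith', 'email': 'john@example.com'}
```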