From 4b4773ca36e4f28b1985377240caf1b537ebd72f Mon Sep 17 00:00:00 2001
From: Claude
Date: Thu, 5 Mar 2026 03:40:43 +0000
Subject: [PATCH] Replace first Challenge 5 question with off-topic elephant
 question

Tests whether the RAG system stays grounded when the query has nothing
to do with the corpus topic.

https://claude.ai/code/session_01Ev2pi7ijzrmjY8GW55PLnr
---
 episodes/07-Retrieval-augmented-generation.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/episodes/07-Retrieval-augmented-generation.md b/episodes/07-Retrieval-augmented-generation.md
index f18f5df2..29d8ee02 100644
--- a/episodes/07-Retrieval-augmented-generation.md
+++ b/episodes/07-Retrieval-augmented-generation.md
@@ -370,8 +370,8 @@ Lower `top_k` gives Gemini a tighter, more focused context — good when the ans
 The quality of a RAG system depends heavily on the questions you ask. Try these queries — each tests a different aspect of retrieval and generation:
 
 ```python
-# Broad factual question — answer should be well-supported by multiple papers
-print(ask("How much energy does it cost to train a large language model?"))
+# Off-topic question — not covered by the corpus at all
+print(ask("How much does an elephant weigh?"))
 
 print("\n" + "="*60 + "\n")
 
@@ -391,7 +391,7 @@ For each question, consider:
 
 :::::::::::::::::::::::: solution
 
-The energy-cost question should produce a strong answer because the corpus contains multiple papers with concrete training-energy figures. The cloud-vs-HPC question requires the model to compare across sources — look for whether it hedges appropriately when papers disagree. The "best cloud provider" question is deliberately tricky: the corpus is about environmental costs of AI, not cloud provider rankings, so a well-behaved RAG system should indicate that the context doesn't support a definitive answer rather than generating marketing-style claims.
+The elephant-weight question is deliberately off-topic — the corpus is about environmental costs of AI, not zoology, so a well-behaved RAG system should indicate that the context doesn't contain relevant information rather than answering from general knowledge. The cloud-vs-HPC question requires the model to compare across sources — look for whether it hedges appropriately when papers disagree. The "best cloud provider" question is deliberately tricky: the corpus says nothing about cloud provider rankings, so the system should indicate that the context doesn't support a definitive answer rather than generating marketing-style claims.
 
 :::::::::::::::::::::::::::::::::