Building a Hermeneutic of Suspicion: A Critical Evaluation Exercise - Answer Key

Modern chatbots like ChatGPT, Claude, and Gemini are complex hybrids that combine search, generation, and retrieval. Understanding what each component does well and poorly helps us evaluate outputs critically.

Important: These are tendencies, not absolutes. Any given prompt might still produce a good answer, because a hybrid system can cover one component's weakness with another. The goal is to understand underlying limitations, not to memorize rules.

Part 1: Prompts That Challenge Search Engines

Table 2

Part 2: Prompts That Challenge Pure LLMs

Table 3

Part 3: Prompts That Challenge RAG Systems

Table 4

Part 4: Prompts That Challenge Modern Hybrid Systems

Table 5

Developing Critical Evaluation Habits

When evaluating AI outputs, ask yourself:

What kind of information or capability would this task require?
Where might that information come from? (Training data, retrieval, nowhere?)
What could go wrong? (Hallucination, misinterpretation, outdated info?)
Does this require judgment that goes beyond pattern matching or retrieval?
Would I be able to verify this answer? How?

The Hybrid Complication

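Modern chatbots are unpredictable because they're hybrids: they may route similar queries to different components (search, retrieval, or generation alone), so two nearly identical prompts can be handled in very different ways. This unpredictability is why we need critical evaluation habits rather than simple rules about what works and what doesn't.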
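
To make the routing idea concrete, here is a deliberately simplified sketch. It is a toy illustration only: the names `Path`, `SEARCH_HINTS`, `RETRIEVAL_HINTS`, and `toy_route` are hypothetical, and the keyword lists are invented for this example. Real chatbots do not publish their routing logic, and it is almost certainly not keyword matching. The point is only that small changes in wording can send a prompt down a different path.

```python
from enum import Enum


class Path(Enum):
    SEARCH = "web search"
    RETRIEVAL = "document retrieval"
    GENERATION = "generation from training data only"


# Hypothetical trigger phrases, invented for this sketch.
# No real product is known to route this way.
SEARCH_HINTS = ("today", "latest", "current", "news")
RETRIEVAL_HINTS = ("according to", "in the attached", "in this document")


def toy_route(prompt: str) -> Path:
    """Pick a processing path from surface features of the prompt."""
    text = prompt.lower()
    if any(hint in text for hint in SEARCH_HINTS):
        return Path.SEARCH
    if any(hint in text for hint in RETRIEVAL_HINTS):
        return Path.RETRIEVAL
    # Nothing matched: fall back to answering from training data alone.
    return Path.GENERATION


if __name__ == "__main__":
    for prompt in (
        "Who is the prime minister of Canada?",
        "Who is the current prime minister of Canada?",
    ):
        print(f"{prompt!r} -> {toy_route(prompt).value}")
```

In this toy, adding the single word "current" moves the prompt from the generation path to the search path. A user has no direct view of which path was taken, which is one reason to fall back on the evaluation questions above rather than on fixed rules.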