Last week, I asked ChatGPT who won the 2023 Nobel Prize in Physics. It confidently told me it was Roger Penrose, along with a detailed explanation of his contributions. There's one problem: the prize went to Pierre Agostini, Ferenc Krausz, and Anne L'Huillier. ChatGPT didn't hedge its bets or say "I'm not sure." It lied with absolute conviction.
This phenomenon has a name in AI circles: hallucination. But that word makes it sound glitchy and accidental, like your phone occasionally autocorrecting "hello" to "jello." The reality is messier and more revealing about how we've built these systems.
The Confidence Problem We Designed Into AI
Here's what happens under the hood. Large language models like GPT-4 or Claude are trained using something called next-token prediction. Basically, you feed them billions of words from the internet, and they learn to predict what word comes next. Then the next word. Then the next one. They're playing an elaborate guessing game, and they got really, really good at it.
The problem? This training method doesn't actually teach the model the difference between true and false. It teaches the model what patterns show up most frequently in its training data. If your training data contains more confident-sounding sentences than uncertain ones—which it does, because humans tend to write with conviction—the model learns to be confident too.
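To see the core idea with everything else stripped away, here's a toy next-word predictor in Python. It "trains" by counting which word most often follows which in a handful of made-up sentences, then generates text by always picking the most frequent continuation. This is a deliberate oversimplification of how GPT-4 or Claude actually work (real models use neural networks over billions of examples, not raw counts), but notice what the code never does: check whether anything it produces is true.

```python
from collections import Counter, defaultdict

# A made-up toy "training corpus". Real models see billions of words,
# but the principle is the same: learn which words tend to follow which.
corpus = (
    "the nobel prize was awarded to einstein . "
    "the nobel prize was awarded to curie . "
    "the nobel prize was awarded to einstein . "
    "the committee said the prize was shared ."
).split()

# "Training": count which word most often follows each word.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    # Pick the single most frequent continuation.
    # Nothing here checks whether the result is true.
    return next_word_counts[word].most_common(1)[0][0]

# Generate a "confident" continuation one word at a time.
word = "the"
sentence = [word]
for _ in range(6):
    word = predict_next(word)
    sentence.append(word)

print(" ".join(sentence))
# -> "the nobel prize was awarded to einstein"
# That's the most frequent pattern in the corpus, not a fact about any given year.
```

The toy version always answers, always sounds fluent, and never says "I don't know," because nothing in its objective rewards it for doing so.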
Think of it this way: if I trained a model on Oscar acceptance speeches, it would learn to sound humble and emotional. If I trained it on academic papers, it would learn to sound careful and qualified. We trained these models on... the entire internet. Which sounds authoritative far more often than it actually is correct.
When researchers at Stanford tested GPT-3's knowledge across different topics, they found something fascinating. The model was more likely to hallucinate answers on obscure topics than common ones. Why? Because for obscure facts, there's less training data, so the model falls back on generating text that just *sounds* right. It's pattern matching without understanding.
Why We're Not Fixing This (Even Though We Could)
You might think that solving this would be priority number one. We could, theoretically, penalize models for making up information. We could train them to say "I don't know" more often. Some researchers are trying exactly this.
But here's where financial incentives and consumer expectations create a nasty trap. Users don't want an AI that says "I don't know" to half their questions. They want help. They want answers. A chatbot that hedges everything feels useless, even if it would be more honest.
Companies face a choice: make a cautious AI that few people want to use, or make a confident AI that millions will pay for, knowing it will sometimes fabricate. The market has spoken loudly about which version it prefers.
There's also a technical reason this is harder than it sounds. Teaching a model when to be uncertain requires labeled training data—human experts saying "this model should be confident here" and "not confident there." That's expensive. Really expensive. And for many niche topics, there's genuine disagreement about what's true, so even experts can't provide clean labels.
The Real-World Consequences of Sophisticated Bullshit
This isn't just an annoying quirk for trivia buffs. In 2023, a lawyer in New York cited fake cases generated by ChatGPT in an actual court filing. The AI had invented judicial decisions that never existed. The lawyer later said he didn't realize AI could hallucinate that badly.
Medical professionals have started using ChatGPT for diagnostic research, often without realizing it can confidently present invented symptoms and treatments as real. A radiologist in Toronto tested GPT-4 on medical scenarios and found it gave confidently incorrect diagnoses roughly 20% of the time—often with perfectly reasonable-sounding explanations.
The insidious part? These hallucinations don't *feel* like hallucinations. They sound natural. They're wrapped in the linguistic patterns of legitimate knowledge. An average user has no way to distinguish between a real fact and a plausible-sounding invention.
This is different from a Google search that returns no results. It's different from a human admitting they don't know something. This is a system presenting fiction with the full weight of apparent expertise behind it.
What Actually Needs to Change
Some researchers are making progress. OpenAI has started using reinforcement learning from human feedback (RLHF) to train models to be more truthful, though with limited success. Others are exploring retrieval-augmented generation—essentially giving AI the ability to look up information rather than relying purely on what's in its training data.
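Here's a minimal sketch of the retrieval idea, with a tiny hand-written document store and crude keyword overlap standing in for the embedding search and vector databases real systems use. The documents, the retrieve function, and the prompt wording are all invented for illustration; the point is just the shape of the approach: fetch relevant text first, then ask the model to answer from that text rather than from memory.

```python
# A toy retrieval-augmented generation (RAG) sketch. The document store and
# keyword-overlap retriever are stand-ins invented for illustration; real
# systems use embedding models and vector databases for the retrieval step.

documents = [
    "The 2023 Nobel Prize in Physics went to Pierre Agostini, Ferenc Krausz, "
    "and Anne L'Huillier for work on attosecond pulses of light.",
    "The 2020 Nobel Prize in Physics was shared by Roger Penrose, Reinhard Genzel, "
    "and Andrea Ghez for discoveries about black holes.",
]

def retrieve(question, docs, top_k=1):
    # Score each document by how many of the question's words it contains.
    question_words = set(question.lower().replace("?", "").split())
    def overlap(doc):
        return len(question_words & set(doc.lower().replace(".", "").replace(",", "").split()))
    return sorted(docs, key=overlap, reverse=True)[:top_k]

question = "Who won the 2023 Nobel Prize in Physics?"
context = "\n".join(retrieve(question, documents))

# The model is asked to answer only from the retrieved context,
# instead of free-associating from whatever patterns it memorized in training.
prompt = (
    "Answer using only the context below. "
    "If the answer isn't in the context, say you don't know.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\n"
    "Answer:"
)
print(prompt)  # This prompt is what would be sent to the language model.
```

Retrieval narrows the problem rather than eliminating it: the model can still ignore or misread the context it's given, which is why this counts as progress rather than a fix.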
But the deeper fix requires rethinking incentives. We need AI systems that are rewarded for admitting uncertainty. We need users who understand that a tool saying "I don't know" might be more valuable than one that confidently makes things up. We need regulation that distinguishes between AI used for entertainment and AI used for high-stakes decisions.
We also need more AI literacy. Not everyone needs to understand transformer architectures, but everyone using these tools should understand one simple fact: they're language prediction machines, not knowledge machines. They don't know things. They predict what words typically follow other words. Sometimes that prediction happens to be true. Sometimes it's convincing fiction.
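If you want to see what "predicting what words typically follow other words" literally looks like, here's a short sketch using the small, openly available GPT-2 model (it assumes the transformers and torch Python packages are installed; GPT-2 is far weaker than GPT-4, but the mechanism is the same). All the model produces is a probability distribution over possible next tokens; there's no lookup into a store of facts anywhere in the process.

```python
# Inspect the raw next-token probabilities from a small open model (GPT-2).
# Assumes the transformers and torch packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The 2023 Nobel Prize in Physics was awarded to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary for the very next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
# Whatever names come out, they're the statistically likely continuations,
# not verified facts about who actually won.
```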
Until we rebuild these systems with different incentives—or until users demand something different—expect to see more stories about AIs confidently getting things wrong. It's not a bug in the system. It's a feature of how we designed them.