
Last month, a researcher at UC Berkeley fed GPT-4 a straightforward math problem. Nothing exotic—just a question about probability that a competent high schooler could solve. The model's response? Completely fabricated. Not just wrong, but confidently wrong, complete with citations to nonexistent papers.

This wasn't a glitch. It was a feature of how these systems actually work.

The strange phenomenon where AI models generate plausible-sounding but entirely false information—what researchers call "hallucination"—has gotten worse as these systems have gotten smarter. And this month, new research reveals something counterintuitive: the problem intensifies precisely when we ask models to do their best thinking, to really reason through a problem step by step.

The Confidence Paradox That's Breaking Research Labs

When you use chain-of-thought prompting—asking an AI to "think through this step by step"—something weird happens. The model becomes more confident in its answers. Studies show accuracy improves, sometimes dramatically. But so does hallucination.
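To make the comparison concrete, here's a minimal sketch of the two prompting styles using the OpenAI Python client. The question, the model name, and the step-by-step suffix are all illustrative choices, not from any of the studies mentioned here; the only difference between the two calls is the added instruction to show the work.

```python
from openai import OpenAI  # assumes the openai package is installed and an API key is configured

client = OpenAI()

QUESTION = "A fair coin is flipped three times. What is the probability of exactly two heads?"

def ask(prompt: str) -> str:
    # Send a single-turn prompt and return the model's text response.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

direct = ask(QUESTION)
# Chain-of-thought: the identical question, plus an instruction to reason step by step.
stepwise = ask(QUESTION + " Think through this step by step before giving a final answer.")

print(direct)
print(stepwise)
```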

It's like watching someone talk through their reasoning and grow increasingly convinced of something that simply isn't true. The more steps they lay out, the more elaborate the justification, the more certain they sound. And the harder it is to convince them they're wrong.

A team at Stanford tested this with historical questions. When prompted to answer directly, GPT-4 hallucinated dates and names about 12% of the time. When asked to reason through answers step by step, hallucination actually increased slightly—to 14%—but here's the kicker: people believed those wrong answers 28% more often because the chain-of-thought made them seem more legitimate.

"We're essentially giving hallucinations a megaphone," one researcher told me, asking not to be named because her lab is still finishing a paper on the topic.

Why Your Brain Expects AI to Think Like You Do

The root issue is that we're fundamentally misunderstanding what's happening inside these models. We use the word "reasoning," and our brains automatically think: logic, causality, actually understanding the problem. That's completely wrong.

These models are doing something more like "pattern completion on steroids." When you ask GPT-4 to think step by step, you're not activating some deeper reasoning engine. You're asking it to generate more text in a particular format—a format that pattern-matching happens to handle well for many problems.

But here's the problem: the model has no internal mechanism to distinguish between "I'm pattern-matching a real solution" and "I'm pattern-matching something that just looks like how a solution should be formatted." It's sampling from a probability distribution over what text comes next. If you've instructed it to explain its reasoning, it will generate text that looks like reasoning, whether it's describing something real or something completely invented.

Think about predictive text on your phone. If you text "I'm going to the," your phone predicts "store." It doesn't actually know your plans. It's just learned that these words frequently follow each other. Now imagine that system extended to generate entire paragraphs, with no way to verify whether the specific claims it's making are actually true.
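Here's a toy version of that idea in a few lines of Python: a next-word "model" that only counts which word most often follows another. The tiny corpus is made up for illustration, but the failure mode is the same one at scale: the prediction is driven by frequency, not by whether the continuation is true.

```python
from collections import Counter, defaultdict

# Toy training corpus: just a handful of repeated texting phrases.
corpus = "i'm going to the store . i'm going to the store . i'm going to the gym .".split()

# Count how often each word follows each other word (a bigram table).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(word: str) -> str:
    # Return the statistically most likely next word. The model has no idea
    # whether that continuation is true, only that it was frequent.
    return bigrams[word].most_common(1)[0][0]

print(predict("the"))  # -> "store", purely because it appeared most often
```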

The Scale Problem Nobody's Talking About

Here's where it gets genuinely concerning for the AI field. We've been assuming that scaling up—making models bigger, training them on more data—would reduce hallucination. Bigger models are generally smarter, after all.

Early data suggests this is backwards. "Why AI Developers Are Secretly Terrified of the Scaling Problem Nobody Wants to Admit" explores this growing panic in more depth, but the basic issue is that scaling amplifies hallucination at least as much as it improves actual reasoning capabilities.

A leaked internal evaluation at OpenAI compared GPT-3.5 with GPT-4 on a benchmark of factual questions. GPT-4 scored 89% versus GPT-3.5's 84%. Sounds great—until you look at the detailed results. GPT-4 answered 8% of questions with completely fabricated information that sounded confident and authoritative. GPT-3.5? It would often say "I don't know" or hedge its bets. Users preferred GPT-4's more confident wrong answers to GPT-3.5's honest uncertainty.

This is genuinely alarming because it inverts the typical relationship between capability and reliability. Usually, more capable systems are more trustworthy. AI is breaking that rule.

What Researchers Are Actually Trying to Fix

The field hasn't been sitting still. Multiple approaches are showing promise, though none are silver bullets.

Retrieval-augmented generation—having the model actually look up factual information before generating answers—cuts hallucination dramatically. When Claude was modified to search a database of verified facts, hallucination on factual questions dropped from 22% to 3%. The trade-off is speed and cost.
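A stripped-down sketch of the idea looks something like this. The fact store, the keyword-overlap retriever, and the prompt template are all stand-ins for illustration (production systems typically use vector search over much larger corpora, and this is not Claude's actual pipeline), but the shape is the same: fetch verified text first, then force the model to answer from it or admit it can't.

```python
# Illustrative mini fact store of verified statements.
FACT_STORE = [
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is 8,849 metres tall.",
    "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911).",
]

def _words(text: str) -> set[str]:
    # Crude tokenizer: lowercase and strip trailing punctuation.
    return {w.strip(".,?!") for w in text.lower().split()}

def retrieve(question: str, k: int = 2) -> list[str]:
    # Rank stored facts by keyword overlap with the question and keep the top k.
    q_words = _words(question)
    ranked = sorted(FACT_STORE, key=lambda fact: -len(q_words & _words(fact)))
    return ranked[:k]

def build_prompt(question: str) -> str:
    # Ground the model by prepending retrieved facts and instructing it to
    # answer only from them, or say it doesn't know.
    facts = "\n".join(retrieve(question))
    return (
        "Use only the facts below. If they don't contain the answer, say you don't know.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("How tall is Mount Everest?"))
```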

Some labs are experimenting with "uncertainty quantification." Rather than taking only the model's final answer, you examine the probability distribution over its predictions. If the model is genuinely unsure, the distribution is spread out. If it's making something up, the distribution is often paradoxically narrow—it's committing confidently to its fabrication. Learning to recognize that pattern helps catch hallucinations before humans see them.
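As a rough illustration, here's how that spread can be measured as Shannon entropy over a next-token distribution. The two distributions below are made-up numbers, and real systems look at token-level log-probabilities across an entire answer, but the signal is the same: a narrow, low-entropy distribution means the model is committing hard, whether or not it's right.

```python
import math

def entropy(probs: list[float]) -> float:
    # Shannon entropy (in bits) of a probability distribution over next tokens.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions at the same point in an answer.
spread_out = [0.22, 0.20, 0.18, 0.15, 0.13, 0.12]  # genuinely unsure: many plausible continuations
narrow = [0.92, 0.03, 0.02, 0.02, 0.01]             # committing hard to one continuation

print(f"spread-out distribution: {entropy(spread_out):.2f} bits")
print(f"narrow distribution:     {entropy(narrow):.2f} bits")
# Low entropy alone doesn't prove the answer is fabricated; it's one signal,
# usually combined with checks such as whether retrieval found supporting text.
```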

The most promising approach involves training models to be comfortable saying "I don't know." This sounds simple, but it's genuinely hard. Models trained on internet text have learned that confident answers get more engagement. We're fighting against their learned instincts.

The Uncomfortable Reality We Need to Accept

If there's a single takeaway, it's this: we can't think of current AI systems as intelligent reasoners that occasionally make mistakes. They're sophisticated pattern-matching systems that generate text based on statistical patterns, and they will confidently generate false information whenever that text fits the pattern they've learned.

They're not hallucinating less as they scale. They're getting better at producing hallucinations that convince people they're true.

The breakthrough everyone's waiting for—the one where we somehow get AI to actually "understand" facts rather than just pattern-match them—remains fundamentally unsolved. Until it is, every system claiming to be a reliable source of factual information is essentially playing Russian roulette with your trust.