The Confidence Problem Nobody Talks About
Last month, I asked ChatGPT a straightforward question: "What's the capital of Australia?" It answered "Sydney" with complete certainty. The correct answer is Canberra. What bothered me wasn't the error—it was how confidently wrong it was, delivered in the same assured tone it would use to explain quantum mechanics or constitutional law.
This isn't a bug. It's a feature of how these models work, and it's becoming increasingly common as we build bigger, more capable AI systems. A study from UC Berkeley found that larger language models actually hallucinate more frequently than their smaller predecessors, despite performing better on most benchmarks. The counterintuitive result caught researchers off guard: shouldn't bigger be better?
The answer reveals something uncomfortable about artificial intelligence. Our models are phenomenal at pattern-matching and statistical prediction, but they're operating entirely in a realm of probability. When they encounter ambiguous data or statistical noise, they don't pause and admit uncertainty—they confidently generate the most plausible-sounding response. And here's the kicker: as these models get larger and more sophisticated, their "plausible-sounding" responses become increasingly convincing, even when they're completely fabricated.
Where the Rot Begins: Our Training Data
The real problem isn't artificial intelligence. The real problem is the natural intelligence that generated the training data in the first place.
Consider this: most large language models are trained on vast swaths of internet text, Wikipedia articles, books, and academic papers. But here's what we don't often acknowledge—the internet is broken. It's full of outdated information, urban legends presented as fact, conspiracy theories, contradictions, and honest mistakes that have been copied millions of times.
When researchers from Stanford University analyzed the training data used for popular language models, they found that roughly 3-5% of it was factually inaccurate information that had been amplified through repeated copying and sharing. That might sound small until you do the arithmetic: in a corpus of billions of tokens, it translates into millions of training examples that are simply wrong.
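To see why, run the back-of-the-envelope numbers. The corpus size, error rate, and average document length below are illustrative assumptions, not figures from the study:

```python
# Back-of-the-envelope: how much bad material a low error rate leaves behind.
# All three numbers are illustrative assumptions.
corpus_tokens = 100_000_000_000      # 100 billion training tokens
error_rate = 0.03                    # low end of the cited 3-5% estimate
tokens_per_example = 500             # rough length of one document or passage

bad_tokens = corpus_tokens * error_rate
bad_examples = bad_tokens / tokens_per_example

print(f"Inaccurate tokens:   {bad_tokens:,.0f}")    # 3,000,000,000
print(f"Inaccurate examples: {bad_examples:,.0f}")  # 6,000,000
```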
Worse, the internet amplifies certain types of errors systematically. Sensational claims get shared more frequently. Confident-sounding explanations (even incorrect ones) spread faster than hedged, uncertain truths. Medical misinformation, historical inaccuracies, and oversimplified explanations all become overrepresented in the training corpus because they're the ones people actually link to and quote.
The model doesn't know it's learning from a broken mirror. It just sees: "This explanation appears frequently and is phrased confidently." Lesson learned. The next time it encounters a similar question, it will generate something equally confident and equally convincing—and potentially equally false.
The Scaling Trap
Here's where it gets really interesting (and depressing). The obvious solution seems simple: "Just make the models more accurate." But the companies building these systems discovered something paradoxical during the scaling era—bigger models with more parameters tend to be more fluent at producing confident nonsense.
Why? Because scale rewards a different kind of learning. A smaller model might memorize that "the capital of Australia is Canberra" as a discrete fact. But as you scale up to billions of parameters, the model learns something subtler: "When someone asks about Australian capitals, responses that include specific place names, proper grammar, and confident assertion are typically well-received in my training data."
This is what researchers call a spurious correlation: the model has learned to identify and replicate the surface characteristics of authoritative speech without understanding the underlying truth. It's become a master of sounding right.
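Here's a toy illustration of that dynamic, built on entirely synthetic data and assuming scikit-learn is installed. A classifier asked to predict which answers were "well-received" never needs a single fact, because confident wording separates the classes on its own:

```python
# Toy spurious-correlation demo (synthetic data, requires scikit-learn).
# The label tracks tone, not truth, and the model happily learns the tone.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

answers = [
    "The capital of Australia is definitely Sydney",           # confident, wrong
    "The capital of Australia is probably Canberra, I think",  # hedged, right
    "The Great Wall is clearly visible from space",            # confident, wrong
    "The Great Wall is probably not visible from space",       # hedged, right
    "Humans definitely use only ten percent of the brain",     # confident, wrong
    "Humans probably use most of the brain, I think",          # hedged, right
]
well_received = [1, 0, 1, 0, 1, 0]  # confident answers get the upvotes

vec = CountVectorizer()
X = vec.fit_transform(answers)
clf = LogisticRegression().fit(X, well_received)

# Rank words by how strongly they predict "well-received": pure tone words
# like "definitely" sit at the top, and no factual content is needed.
weights = sorted(zip(clf.coef_[0], vec.get_feature_names_out()), reverse=True)
print(weights[:5])
```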
This phenomenon explains something you might have noticed: newer, more capable models sometimes seem worse at basic facts than older versions. That's because they've gotten much better at the nuanced dance of natural language, which means they've also gotten better at the nuanced dance of convincing lies.
The Feedback Loop Nobody Wants to Acknowledge
There's a darker implication here that keeps AI researchers up at night, though few talk about it publicly. As more people use these models and incorporate their outputs into the internet, we're potentially creating a feedback loop of degradation.
Imagine this scenario: An AI model generates a plausible-sounding but incorrect explanation of a niche scientific concept. A blogger reads it, finds it compelling, and writes about it. A journalist picks up the blog post. Academic Twitter amplifies it. Now this false explanation exists in multiple places across the internet, all seeming to corroborate each other.
When the next generation of AI models trains on this data, it sees reinforcement: multiple apparently independent sources saying the same thing. It has no way of knowing that those sources were all ultimately bootstrapped from a single AI hallucination. The false claim becomes even more deeply embedded in the statistical patterns the new model learns.
We've started seeing early signs of this. Research from OpenAI and others shows that when models are retrained on data recycled from earlier AI outputs, quality degrades measurably with each cycle. It's like xeroxing a xerox of a xerox: each generation loses fidelity.
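You can watch a cartoon version of this in a few lines of code. In the sketch below, a one-dimensional Gaussian stands in for a language model: each "generation" is fit only to samples drawn from the previous one, with the tails trimmed to mimic a model's preference for its most plausible outputs. It's an illustration of the dynamic, not a reproduction of any particular study's methodology.

```python
# Toy "model collapse" simulation. A 1-D Gaussian stands in for a model;
# each generation is trained only on data produced by the one before it.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from the true distribution.
data = rng.normal(loc=0.0, scale=1.0, size=1000)

for gen in range(8):
    mu, sigma = data.mean(), data.std()
    print(f"gen {gen}: std = {sigma:.3f}   (spread of what the model 'knows')")
    # The next generation trains on text generated by this one. Generated text
    # over-represents the model's most plausible outputs, so we sample and then
    # drop the 10% of samples farthest from the mean: the tails quietly vanish.
    samples = rng.normal(loc=mu, scale=sigma, size=1000)
    keep = np.abs(samples - mu) <= np.quantile(np.abs(samples - mu), 0.9)
    data = samples[keep]
```

Run it and the spread shrinks generation after generation. The rare, unusual, tail-of-the-distribution material is exactly what disappears first.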
What Actually Needs to Change
The solutions being proposed by major labs usually focus on fine-tuning, reinforcement learning from human feedback, and fact-checking layers. These help, but they're band-aids on a structural wound.
The harder, less popular truth is that we need to fundamentally rethink what we train these models on. Not just filtering for accuracy (though that helps), but actively restructuring training data to emphasize uncertainty and nuance. Models should learn that experts often disagree, that complex topics have multiple valid perspectives, and that "I don't know" is a valid, valuable response.
Some researchers are experimenting with training models explicitly to express uncertainty. Others are building systems that can cite their sources rather than generating answers from pure learned patterns. But these approaches require accepting something uncomfortable: smarter, more honest AI systems might sometimes be less convenient. They might hedge. They might admit limitations.
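One concrete (if simplified) version of the "express uncertainty" idea is a self-consistency check: ask the model the same question several times and abstain when the answers disagree. The sketch below is hypothetical end to end; the ask_model stub, the five samples, and the 0.6 threshold are illustrative choices, not anyone's production system.

```python
# Minimal "abstain when unsure" wrapper. Agreement across repeated samples is
# used as a crude confidence signal; real systems also use token probabilities,
# retrieval, or a learned verifier. All names and thresholds are illustrative.
import random
from collections import Counter
from typing import Callable

def answer_or_abstain(question: str,
                      ask_model: Callable[[str], str],
                      n_samples: int = 5,
                      threshold: float = 0.6) -> str:
    # Sample the model several times; if the answers disagree, that's a hint
    # it is pattern-matching rather than recalling a stable fact.
    answers = [ask_model(question) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    if count / n_samples < threshold:
        return f"I'm not sure. My attempts disagreed: {sorted(set(answers))}"
    return best

# Stand-in "model" that is unreliable on exactly this kind of question.
fake_model = lambda q: random.choice(["Sydney", "Canberra", "Melbourne"])
print(answer_or_abstain("What's the capital of Australia?", fake_model))
```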
For a technology industry built on confidence and disruption, that's a difficult pill to swallow.
The irony is that this problem isn't actually new to AI. It's just a digital expression of a very human problem. We've always been vulnerable to convincing falsehoods, especially ones that confirm what we already believe. The internet amplified this tendency. AI is amplifying it further. The question isn't whether AI is hallucinating more—it's whether we're prepared to admit that the problem was never really about artificial intelligence at all. It's about how easily we mistake eloquence for truth.
If you want to understand more about how these systems fail, consider reading about how AI learned to sound confident while being completely wrong—it explores similar themes with specific examples from real-world deployments.
