
Last month, a lawyer in New York got absolutely blindsided. He'd used ChatGPT to research case citations for a brief, and the AI provided what looked like legitimate court decisions. The citations were detailed, specific, and presented with complete confidence. They were also entirely fabricated. The lawyer didn't catch the error before submitting, and the court was not amused.

This wasn't a glitch or a typo. This was something far stranger: an AI system that had learned to sound authoritative while being completely wrong. And it's happening more often than most people realize.

The Confidence Problem Nobody Expected

When engineers first built large language models, they assumed the biggest challenge would be getting them to generate coherent text. That turned out to be the easy part. What nobody quite anticipated was that these systems would become virtuosos of false certainty.

Here's the uncomfortable truth: modern AI doesn't actually "know" anything. It's predicting the next word based on patterns in its training data. When you ask it a question, it's essentially playing word-prediction roulette at superhuman speed. The problem is that it predicts with exactly the same confidence whether it's right or wrong.
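
To make that concrete, here's a minimal sketch of next-token prediction using the open-source transformers library, with a small publicly available model standing in for the big commercial ones (the model name and prompts are purely illustrative). The model assigns a probability to every possible next token, and nothing in that number tells you whether the continuation is true:

```python
# A minimal sketch of next-token prediction using the Hugging Face
# transformers library. gpt2 is a small, public stand-in; any causal
# language model works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def top_next_tokens(prompt, k=5):
    """Return the model's k most probable next tokens for a prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # scores for the very next token
    probs = torch.softmax(logits, dim=-1)        # scores -> probabilities
    top = torch.topk(probs, k)
    return [(tokenizer.decode(int(i)), p.item()) for i, p in zip(top.indices, top.values)]

# The model reports a probability either way; that number measures how
# common a continuation is, not whether it is factually correct.
print(top_next_tokens("The capital of France is"))
print(top_next_tokens("The earth is shaped like a"))
```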

A study from researchers at Stanford found that when given factually incorrect prompts, GPT-3 would confidently agree and elaborate on false premises. An AI told that the earth was flat didn't hesitate or express uncertainty—it simply continued the conversation as though this were established fact. It had learned the pattern of confident agreement from its training data, and it replicated it perfectly.

Dr. Stuart Russell, a leading AI safety researcher, puts it bluntly: "These systems are optimized to produce plausible-sounding text, not true text." That distinction matters enormously.

Why Your Bank Account Should Worry You More Than It Does

Financial institutions are already tangling with this problem. Some have quietly deployed AI for customer service and internal research, only to discover it was producing false regulatory interpretations or making up interest rate histories.

JPMorgan Chase's COIN (Contract Intelligence) platform learned real patterns in legal documents but also learned to fill in gaps with plausible-sounding hallucinations when documents were ambiguous. The system wasn't lying intentionally—it simply didn't understand the difference between "I found this in the document" and "this is probably what should be in the document."

The stakes get higher when you zoom out. Insurance companies using AI to assess claims have caught the systems confidently denying coverage based on false policy interpretations. Medical AI systems have recommended treatments based on studies that don't exist. These aren't science fiction scenarios. They're happening right now in spreadsheets and databases you've probably interacted with.

The Training Data Time Bomb

Want to understand why AI has become such a confident bullshitter? Follow the money back to the training data.

Large language models are trained on internet text—billions of words scraped from websites, books, forums, and social media. That data includes truth, lies, speculation, rumor, propaganda, and everything in between. The AI doesn't learn to distinguish between them. It learns statistical patterns. And guess what? Confident-sounding wrong answers are everywhere on the internet.

An MIT researcher found that when you train AI on mixed-quality data, it actually learns to replicate the *confidence level* of its source material. Content written by conspiracy theorists with absolute certainty trains the model to produce similar absolute certainty on similar topics. Meanwhile, careful, hedged scientific writing trains it to be appropriately tentative. The AI becomes a mirror of the epistemic chaos it was trained on.

This is partly why your AI chatbot keeps saying confidently wrong things—it's been shaped by a training diet that rewards sounding sure.

The Band-Aids Engineers Are Building

So what are researchers actually doing about this? Several approaches are emerging, though none are perfect.

The first is uncertainty quantification. Rather than having AI output a simple answer, new systems are being trained to output a confidence score alongside their response. Google's Bard now explicitly flags when it's uncertain. It's not pretty—users often find hedging annoying—but it's honest.
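
One crude version of the idea, sketched below with the same small stand-in model: treat the entropy of the model's own next-token distribution as a rough uncertainty signal. Real systems layer far more sophisticated calibration on top of this, so read it as a toy illustration rather than anyone's production method:

```python
# A rough sketch of one uncertainty-quantification idea: use the entropy of
# the model's next-token distribution as a crude confidence signal.
# gpt2 is an illustrative stand-in, not what any commercial product uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_token_entropy(prompt):
    """Higher entropy = probability spread across many tokens = less certain."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum().item()

print(next_token_entropy("The capital of France is"))   # likely low entropy
print(next_token_entropy("The 19th digit of pi is"))    # likely higher entropy
```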

The second approach is retrieval-augmented generation. Instead of just generating text from its trained patterns, the AI searches external databases for actual facts before responding. It's like giving the system access to a library to fact-check itself in real time. Companies like Anthropic have shown this significantly reduces hallucinations.
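
Here's a bare-bones sketch of that pattern. The tiny document store and the `ask_llm` stub are hypothetical placeholders; real deployments use vector databases and an actual model API, but the shape of the pipeline is the same:

```python
# A bare-bones sketch of retrieval-augmented generation: look facts up in a
# document store first, then hand them to the model as context.
DOCUMENTS = [
    "Policy 4.2: claims filed within 30 days of the incident are eligible.",
    "Policy 7.1: flood damage is covered only under the premium plan.",
]

def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap (real systems use embeddings)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def ask_llm(prompt):
    # Placeholder for a real model call; returns the prompt so the sketch runs.
    return f"[model would answer using:]\n{prompt}"

def rag_answer(question):
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = (
        "Answer using ONLY the context below. If the context does not contain "
        f"the answer, say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

print(rag_answer("Is flood damage covered?"))
```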

The third is what researchers call "rejection training." Rather than rewarding confident answers, you reward the system when it says "I don't know." This feels wrong to people used to AI that always has an answer, but it's probably what we actually need.
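
In toy form, the scoring rule behind that idea looks something like the sketch below. The numbers are made up for illustration and aren't drawn from any published training recipe, but they capture the incentive shift: an honest "I don't know" beats a confident wrong answer.

```python
# A toy sketch of the reward shaping behind "rejection training":
# abstaining earns more than answering wrong. Values are illustrative only.
def reward(model_answer, correct_answer):
    if model_answer.strip().lower() == "i don't know":
        return 0.2    # small positive reward for honest abstention
    if model_answer.strip().lower() == correct_answer.lower():
        return 1.0    # full reward for a correct answer
    return -1.0       # confident wrong answers are penalized hardest

print(reward("Paris", "Paris"))         #  1.0
print(reward("I don't know", "Paris"))  #  0.2
print(reward("Lyon", "Paris"))          # -1.0
```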

OpenAI has been gradually incorporating these techniques, which is why newer versions of ChatGPT are somewhat more likely to admit uncertainty. But it's like asking a salesman to tell you why you shouldn't buy his product—theoretically possible, but fighting against the system's fundamental incentive structure.

What You Should Actually Do About This

Here's the practical truth: if you're using AI for anything that matters, you need a human in the loop. Not as a formality, but as an actual check.

Before that lawyer submitted his AI-generated citations, he needed to verify them. Before a doctor uses an AI diagnosis, they need to check the reasoning. Before your company deploys AI for customer service, someone needs to audit whether it's confidently wrong about your actual policies.

The uncomfortable reality is that confidence in an AI system is actually a bad sign. It should make you more suspicious, not more trusting. The best AI outputs come with caveats, citations, and clear admissions of uncertainty. If an AI system sounds too sure of itself, it probably is.

The technology is powerful. It's genuinely useful. But it's also fundamentally limited in ways we're still learning to navigate. The sooner we stop treating AI confidence as a feature and start treating it as a potential liability, the better off we'll all be.