Photo by Microsoft Copilot on Unsplash

Last month, a lawyer in New York got caught submitting fake case citations to a federal court. The citations looked perfect. The formatting was immaculate. The case names sounded legitimate. There was just one problem: none of them existed. The lawyer had used ChatGPT to research his brief, and the AI had confidently invented judicial precedents that had never seen the inside of a courtroom.

The lawyer wasn't careless. He wasn't incompetent. He made the same mistake millions of people make every day: he assumed that because an AI system stated something with absolute certainty, it must be true.

But here's what most people don't realize: the AI wasn't making a mistake at all.

The Confidence Paradox

Large language models like GPT-4, Claude, and Gemini operate on a principle that's almost comically simple: they predict the next word in a sequence based on probability patterns learned from training data. That's it. They're not reasoning. They're not checking facts. They're pattern-matching at superhuman scale.

When an AI generates a response, it's essentially saying, "Based on statistical patterns in text, this is the most likely next word." Repeat that process a few dozen times and you get a sentence; a few hundred times and you get a coherent response. But coherence and accuracy are completely different things.
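To make that concrete, here's the decoding loop in miniature. This is a toy sketch in Python, not any real model's code: the vocabulary is a handful of words and the probabilities are invented, but the structure, pick the next word by weighted chance and repeat, is the whole game.

```python
import random

# Toy next-token distributions keyed by the previous word. A real model
# conditions on the entire context with a neural network over a huge
# vocabulary; these numbers are made up purely for illustration.
NEXT_TOKEN_PROBS = {
    "court": [("held", 0.6), ("ruled", 0.3), ("adjourned", 0.1)],
    "held":  [("that", 0.9), ("firmly", 0.1)],
    "ruled": [("that", 0.8), ("against", 0.2)],
    "that":  [("the", 0.7), ("no", 0.3)],
}

def generate(prompt, steps=3):
    words = prompt.split()
    for _ in range(steps):
        candidates = NEXT_TOKEN_PROBS.get(words[-1])
        if candidates is None:
            break  # a real LLM always has a full distribution, so it never stops here
        tokens, probs = zip(*candidates)
        words.append(random.choices(tokens, weights=probs)[0])
    return " ".join(words)

print(generate("The court"))  # e.g. "The court held that the"
```

Notice what's missing: there is no step where the loop consults a fact.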

The strange part? Inside the model, making a mistake and sounding confident are the same operation.

Think about it. If you're a language model trained on billions of tokens, you've seen how humans write about topics they're uncertain about versus topics they're confident in. You've internalized the linguistic patterns of confidence. A statement without hedging language, without qualifiers, without uncertainty markers—that's how confident people sound. So when the model's probability distribution happens to favor a made-up citation or a false statistic, it generates that information with the exact same grammatical confidence as it would use for something true.

The model isn't lying. It's just following learned patterns about how language works.
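You can watch that truth-blindness directly. The sketch below, which assumes you have the transformers and torch packages installed, scores two sentences with the small public GPT-2 model: one true, one false. The number it prints measures how familiar the word patterns are, nothing more, so don't be surprised if the false sentence scores higher: "Sydney" is simply the more common completion in English text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_logprob(text: str) -> float:
    """Average log-probability per token: a fluency score, not a truth score."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # The returned loss is the mean cross-entropy of predicting each
        # next token, so its negation is the mean log-probability.
        loss = model(ids, labels=ids).loss
    return -loss.item()

print(avg_logprob("The capital of Australia is Canberra."))  # true
print(avg_logprob("The capital of Australia is Sydney."))    # false
```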

Why This Problem Gets Worse, Not Better

You might think that larger, more powerful AI models would hallucinate less. Intuition suggests that bigger training datasets and more parameters should mean better accuracy. Sometimes that's true. But for many types of hallucinations, the opposite happens.

A 2023 study from Microsoft researchers found that GPT-4, despite being more capable than GPT-3.5, actually generates more hallucinations in certain domains. Why? Because more powerful models are better at sounding plausible. They're better at filling in gaps with coherent-sounding fiction. A weaker model might say "I'm not sure about that." A stronger model will confidently generate an answer that fits grammatically and contextually, even if it's completely fabricated.

It's like the difference between a bad liar and a good liar. The bad liar stammers and contradicts himself. The good liar never wavers. The good liar is far more dangerous.

And here's the thing: we keep feeding these models more data, training them longer, making them more capable at generating fluent, coherent, confident-sounding text. We're not fixing the hallucination problem. We're making the hallucinations more convincing.

The Training Data Problem Nobody Wants to Talk About

If you look at how these models are trained, you start to see why hallucinations aren't just a bug—they're almost inevitable given the current approach.

Large language models are trained on internet text. All of it. Wikipedia. Academic papers. Blog posts. Reddit threads. News articles. Social media. The entire chaotic corpus of human-generated text online, which includes false information, conspiracy theories, outdated medical advice, contradictory claims, and outright lies.

The model learns probability distributions across all of this. It learns that certain words tend to follow other words. It learns statistical patterns. But it never learns what's actually true. It learns what patterns of truthfulness look like, which is different.
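Here's pretraining shrunk down to a dozen lines: a bigram counter, the crudest possible language model. I've planted one true sentence and one false sentence in the training corpus (both invented for this example). Notice that the counting step has no slot where truth could enter.

```python
from collections import Counter, defaultdict

# A miniature version of pretraining: count which word follows which.
# One training sentence is true, one is false; the training objective
# treats them identically.
corpus = [
    "the capital of france is paris",  # true
    "the capital of france is lyon",   # false
]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

# After "training", the false continuation is exactly as probable as
# the true one. Nothing in the statistics marks either as a fact.
print(bigrams["is"])  # Counter({'paris': 1, 'lyon': 1})
```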

When you ask it a question about something specific and detailed, something that appears rarely in training data or combines concepts in novel ways, it hits a gap. The model hasn't seen enough examples to ground its next-token predictions in anything real. So it does what the training objective rewards: it completes the pattern in the most statistically likely way.

For questions about real facts, this sometimes aligns with truth. For novel combinations or specific details, it's a crapshoot. The model doesn't know the difference. It can't. It has no access to a database of facts. It has no way to check. It just has probabilities.
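Which brings us back to those fake citations. Here's the gap-filling failure in miniature: a little Python generator that has absorbed the shape of a legal citation but has no registry of real cases to check against. Every name, volume, and year below is fabricated, which is exactly the point.

```python
import random

# All names, volumes, and years here are fabricated for illustration.
PLAINTIFFS = ["Harmon", "Okafor", "Delacroix"]
DEFENDANTS = ["Transcontinental Airways", "Meridian Freight Co."]
REPORTERS = ["F.3d", "F. Supp. 2d"]

def invent_citation() -> str:
    # Assemble statistically plausible parts into a citation-shaped
    # string. There is no lookup step, so there is nothing to fail.
    return (f"{random.choice(PLAINTIFFS)} v. {random.choice(DEFENDANTS)}, "
            f"{random.randint(100, 999)} {random.choice(REPORTERS)} "
            f"{random.randint(1, 999)} ({random.randint(1996, 2019)})")

print(invent_citation())
# e.g. "Okafor v. Meridian Freight Co., 512 F.3d 884 (2007)"
# Perfectly formatted. Entirely nonexistent.
```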

The Real Question We Should Be Asking

We spend a lot of energy trying to patch this problem with techniques like retrieval-augmented generation (fetching relevant documents and grounding the model's answer in them) or constitutional AI (training models to critique and revise their own outputs against a set of written principles). Those are helpful band-aids. But they don't address the fundamental issue: these systems are designed to generate text, not to know things.
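For the curious, here's the shape of the retrieval band-aid. This is a bare-bones sketch: the llm argument is a stand-in for whatever completion API you'd actually call, and keyword overlap stands in for real embedding search.

```python
# Minimal retrieval-augmented generation sketch. `llm` is a stand-in
# for any completion function; the "retriever" is crude word overlap.
DOCUMENTS = [
    "Canberra is the capital of Australia.",
    "The Australian Parliament sits in Canberra.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by how many words they share with the query.
    q = set(query.lower().split())
    return sorted(DOCUMENTS,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def answer(query: str, llm) -> str:
    context = "\n".join(retrieve(query))
    # The model still only predicts tokens. Pasting retrieved text into
    # the prompt raises the odds those tokens track reality, but the
    # generator can still ignore or distort the context.
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {query}\nAnswer:")
    return llm(prompt)
```

Even with perfect retrieval, the last step is still next-token prediction. That's why it's a band-aid and not a cure.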

The uncomfortable truth is that we might be asking these AI systems to do something they're fundamentally incapable of doing reliably. We're asking them to distinguish between true and false, but they've never learned what truth actually is. They've only learned what truth-like patterns look like.

This is exactly what our investigation into how AI learned to fake expertise uncovered: the more capable these systems become at sounding authoritative, the more dangerous they are when they're wrong.

The lawyer who submitted fake citations will probably face sanctions. His client might lose the case. But he's not an outlier. He's just one of the first to get caught by a judge. Thousands of people are trusting AI-generated information every single day in contexts where they shouldn't. Medical decisions. Legal strategy. Financial advice. Scientific research.

Until we solve the fundamental problem of building AI systems that actually know things, rather than systems that generate truth-like patterns, we're all just one confident hallucination away from disaster.