Last Tuesday, I asked ChatGPT who won the 2019 Pulitzer Prize for Fiction. It told me it was "Margaret Atwood for The Testaments." Confident. Specific. Wrong. Atwood has never won a Pulitzer; The Testaments co-won the 2019 Booker Prize. The actual 2019 Pulitzer for Fiction went to Richard Powers for "The Overstory."

This wasn't a glitch. This was the model doing exactly what it was designed to do: predict the next word in a sequence. The fact that the sequence happened to be complete nonsense is almost beside the point.

The Pattern Completion Problem That Nobody Warned Us About

Here's the uncomfortable truth about large language models: they're not thinking. They're not consulting databases. They're not even trying to be accurate. They're doing something mechanically far simpler and far more dangerous: predicting which token (a chunk of text, usually a few characters) should probably come next based on patterns in their training data.

When you feed GPT-4 a question about the 2019 Pulitzer Prize, it doesn't access memories. It doesn't reason through possibilities. Instead, it's essentially asking itself: "Given everything I've learned, what word would most likely follow these tokens?" And because Pulitzer Prize announcements, book reviews, and literary discussions share certain linguistic patterns, the model can generate something that sounds plausible—even authoritative—while being completely fabricated.
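
You can watch this mechanism directly with a small open model. Here's a minimal sketch using GPT-2 via Hugging Face's transformers library (an assumption for illustration; ChatGPT's internals aren't public, but the next-token mechanics are the same family). It prints the tokens the model considers most likely to come next:

```python
# A minimal sketch of next-token prediction, assuming
# `pip install transformers torch`. GPT-2 stands in for
# larger models here; the prompt is illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The 2019 Pulitzer Prize for Fiction was awarded to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The distribution over the very next token is all the model produces
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r:>12}  p={prob.item():.3f}")
```

Nothing in that loop checks whether any candidate token corresponds to a real prize winner. The model ranks continuations by frequency-shaped plausibility, and that's the whole game.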

The technical term for this is "hallucination." But that word undersells the problem. "Hallucination" suggests spontaneous, random errors. What's actually happening is more systematic. "Why Your AI Model Keeps Hallucinating About Things That Never Happened" explores this phenomenon in depth, but the short version is this: the model is acting according to its training, not malfunctioning.

Confidence Without Competence: The Dangerous Asymmetry

The scariest part? The model sounds utterly convinced. It doesn't hedge. It doesn't say "I'm not sure, but maybe Margaret Atwood?" It delivers its fabrications with the same confident tone it uses for verifiable facts.

This is because language models have learned what confident language looks like. They've seen millions of examples of humans writing with certainty, whether justified or not. The model learned that certain linguistic patterns correlate with statements people treat as factual. So it produces those patterns.

Consider a 2023 study from researchers at Berkeley: when they tested various LLMs on factual questions, the models' confidence scores (calculated from the probability distributions behind their responses) had almost zero correlation with accuracy. A model could be 95% confident about something completely false and only 60% confident about something true.
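
For context, a "confidence score" here usually means something derived from the probabilities the model assigned to its own tokens. The study's exact method is an assumption on my part, but here's a hedged sketch of one common version: the mean per-token probability of a piece of text.

```python
# A sketch of one common "confidence" measure: the average probability
# the model assigned to each token in a sentence. GPT-2 is a stand-in;
# the scoring method is an assumption, not the study's exact recipe.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sequence_confidence(text: str) -> float:
    """Mean probability the model assigned to each token it 'said'."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # The probability of token i comes from the distribution at position i-1
    probs = torch.softmax(logits[0, :-1], dim=-1)
    token_probs = probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return token_probs.mean().item()

# Fluency drives the score, not truth: a false sentence can score high
print(sequence_confidence("The capital of France is Paris."))
print(sequence_confidence("The capital of France is Lyon."))
```

Notice what this score actually measures: how well the text matches patterns the model has seen. A fluent falsehood can score as high as a clumsy truth.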

This creates a peculiar trap. We humans interpret confidence as a signal of accuracy. We evolved that way—generally, if someone says something with conviction and didn't immediately drop dead, they probably knew what they were talking about. But language models haven't evolved anything. They're just pattern-matching machines that learned what confident-sounding language looks like, independent of whether that language describes reality.

The Training Data Problem That Haunts Every Model

Another layer of the problem: the training data itself is messy. Models learn from vast swaths of the internet, books, and other text. This includes Wikipedia articles (written by volunteers of wildly varying expertise), Reddit threads (written by anonymous people with varying confidence in their own knowledge), academic papers (where errors exist too), and legacy texts riddled with outdated or false information.

A model trained on this data learns the statistical patterns of how humans write, not the underlying truth. If one name appears frequently alongside "2019 Pulitzer Prize" in the training text and another appears less often, the model learns those frequencies. It doesn't learn that one is true and one is false. It learns: these tokens cluster together in these ways.
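
You can see the dynamic in miniature with a toy bigram counter. The three-sentence "corpus" below is invented for the example; the point is that a false sentence shapes the learned frequencies exactly as much as its repetition allows, and nothing in the code can tell the difference.

```python
# A toy bigram "model": it only counts which word follows which.
# The corpus is invented for illustration; one sentence is false,
# but the counter treats it identically to the true ones.
from collections import Counter, defaultdict

corpus = (
    "the 2019 pulitzer prize went to richard powers . "
    "the 2019 pulitzer prize went to margaret atwood . "  # false, but present
    "the 2019 pulitzer prize went to richard powers ."
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# "Prediction" is just relative frequency; truth never enters into it
counts = follows["to"]
total = sum(counts.values())
for word, n in counts.most_common():
    print(f"P({word!r} | 'to') = {n}/{total} = {n / total:.2f}")
```

Scale that counter up by a few hundred billion parameters and you have the shape of the problem: frequencies in, frequencies out.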

Even scarier: the training data has a cutoff date. The original GPT-4, for example, was trained on data only through September 2021 (later versions pushed the cutoff into 2023). Ask about anything beyond that? The model will confidently fabricate. It's not trying to. It's just continuing the patterns it learned, and sometimes those patterns lead off a cliff directly into pure fiction.

What This Means For Everyone Using AI Right Now

If you're using AI for brainstorming, creative writing, or exploring ideas? This is fine. The hallucinations are actually kind of a feature—the model's generative flexibility is useful when you don't need facts.

But if you're using AI to research a medical condition, verify information for a business decision, or look up something you'll repeat to other people? You need a completely different approach. You need external verification. You need to treat the AI output as a draft, a starting point, not a reliable source.

The honest assessment: we don't yet have a good solution that's baked into the architecture. Some companies are trying "retrieval-augmented generation," where the model retrieves documents from a curated knowledge base and grounds its answer in them rather than generating purely from learned patterns. But this only works if the knowledge base exists and is accurate. Other approaches involve having humans fact-check everything. Neither is a complete solution. A bare-bones sketch of the retrieval idea follows below.
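
Everything in this sketch is a stand-in: the two-entry knowledge base, the word-overlap scoring, and the stubbed ask_llm function are all invented for illustration. Real systems use embedding-based vector search and an actual model API call.

```python
# A deliberately bare-bones sketch of retrieval-augmented generation (RAG):
# answer from a curated knowledge base instead of from learned patterns
# alone. All names and data here are hypothetical; production systems
# use vector embeddings rather than word overlap.
KNOWLEDGE_BASE = [
    "The 2019 Pulitzer Prize for Fiction was awarded to Richard Powers "
    "for The Overstory.",
    "The Testaments by Margaret Atwood co-won the 2019 Booker Prize.",
]

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
    )

def ask_llm(prompt: str) -> str:
    # Stub: in a real system this would call a model API
    return f"[model answers using only the context in the prompt]\n{prompt}"

question = "Who won the 2019 Pulitzer Prize for Fiction?"
context = retrieve(question)
prompt = (
    f"Answer using ONLY this context:\n{context}\n\n"
    f"Question: {question}"
)
print(ask_llm(prompt))
```

The design trade-off is right there in the code: the answer can only be as good as whatever the retriever finds, which is why a stale or wrong knowledge base just moves the fabrication problem one layer down.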

The Path Forward Isn't Magic

Some researchers are exploring methods to make models more honest. Reinforcement learning from human feedback helps. Constitutional AI (where models are trained against a set of principles) helps a little. But there's no silver bullet, no architectural change that'll make pattern-completion systems suddenly reliable at producing facts.

The uncomfortable reality is that we're using tools that fundamentally work through statistical pattern matching and expecting them to output truth. Those are different problems with different requirements.

For now, the best approach is an honest one: understand what these models are and aren't good for. Use them as thinking partners, not oracles. Verify critical information. And yes, when someone asks you something important, maybe still check with a human or a real database before repeating what your AI assistant told you.

Because your AI assistant? It's a magnificent liar. It just doesn't know it.