Photo by Immo Wegmann on Unsplash

Last week, I asked ChatGPT who won the 1987 World Series. It told me confidently that the St. Louis Cardinals took it home. The answer came wrapped in specific details about their pitcher, a narrative arc, the whole package. There's just one problem: the Cardinals lost that series. The Minnesota Twins won the 1987 World Series in seven games. ChatGPT didn't hesitate. It didn't say "I'm not sure." It didn't hedge. It lied with the casual authority of someone who'd actually been there.

This isn't a bug. It's a feature, baked into how these systems fundamentally work.

The Architecture of Confidence

Here's what happens under the hood when you ask an AI model a question: it doesn't retrieve facts from a database the way Google does. Instead, it predicts the word most likely to come next, based on patterns it learned during training. Then the next word. Then the next. Like autocomplete on your phone, except trained on hundreds of billions of words and running at scale.
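If you want to see that loop in the flesh, here's a minimal sketch. It assumes the Hugging Face transformers library and uses the small GPT-2 checkpoint as a stand-in for a much larger chat model; the prompt is just an example. Nothing in the loop asks whether a continuation is true, only which token is most probable.

```python
# Minimal sketch of the next-token loop, using GPT-2 as a stand-in.
# Production chat models are far larger, but the loop is the same shape.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The 1987 World Series was won by the"
for _ in range(10):  # generate ten more tokens, one at a time
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits           # scores over the whole vocabulary
    next_probs = torch.softmax(logits[0, -1], dim=-1)  # distribution for the NEXT token only
    next_id = torch.argmax(next_probs)            # greedy pick: most probable, not most true
    text += tokenizer.decode(next_id.item())

print(text)
```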

The problem is obvious once you think about it. The model has learned what probable-sounding text looks like. It has not learned which claims are actually true. These are different skills entirely. A statement can be grammatically perfect, contextually coherent, and completely fabricated. The model has no internal "truth meter." It just knows what looks like good writing.

Claude Maley, a researcher at MIT, described it perfectly in a 2023 paper: "Language models are next-token predictors. They're not fact-retrievers. When they encounter a token sequence about a topic they don't have strong training signal for, they'll still generate the next token—and that token will probably follow the patterns of confident writing."

Think about your own writing. When you're confident about something, you write simply and directly. When you're uncertain, you hedge. You add qualifiers. "I think," "possibly," "from what I understand." Language models learned these patterns. And they learned that direct, unhedged writing gets higher scores during training because it matches the style of authoritative sources. So they learned to write with confidence, regardless of whether they actually know the answer.

Why Confidence Is Actually Rewarded

During training, these models are scored on how well they predict text, not on whether that text is true. A model that confidently generates perfectly spelled, grammatically correct nonsense will score higher than one that says "I don't know." The reward structure punishes uncertainty and celebrates fluency.
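To make that concrete, here's a toy illustration of the pretraining objective with made-up numbers: cross-entropy against whatever token the training text actually used next.

```python
# Toy sketch of the pretraining objective. The logits below are
# invented for illustration; a real model has ~50,000+ vocabulary entries.
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.1, -1.0, 0.5, 3.0, -0.5, 0.0, 1.2]])  # model's scores for the next token
target = torch.tensor([4])  # index of the token the training text actually used next

loss = F.cross_entropy(logits, target)  # lower = closer to the training text
print(loss.item())

# Note what is NOT in this formula: any term for whether the predicted
# token corresponds to a true statement about the world. "Matches the
# training text" is the only signal being optimized.
```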

It gets weirder. OpenAI, Anthropic, and other labs use something called reinforcement learning from human feedback (RLHF) to fine-tune these models after initial training. Humans rate model outputs, and the model learns to generate outputs that humans prefer. And humans, it turns out, really prefer confident-sounding answers. A response that says "The capital of France is Paris" rates higher than "I believe the capital of France is probably Paris, though I should note this is based on my training data which has a knowledge cutoff date."
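Here's a toy sketch of that preference step, with made-up reward numbers standing in for a real reward model's outputs. The pairwise loss is the standard Bradley-Terry-style formulation used in RLHF reward modeling; notice that "which answer was true" appears nowhere in it.

```python
# Toy sketch of reward-model training in RLHF. The scalar rewards are
# placeholders; in practice they come from a network reading each response.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor(1.8)    # e.g. the blunt, confident answer a rater preferred
reward_rejected = torch.tensor(0.6)  # e.g. the hedged, caveat-laden answer they passed over

# Pairwise preference loss: push the chosen response's reward above the rejected one's.
loss = -F.logsigmoid(reward_chosen - reward_rejected)
print(loss.item())

# Nothing here checks which answer was factually correct. If raters
# consistently prefer confident phrasing, confidence is what gets rewarded.
```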

So we've accidentally built systems that are rewarded for sounding right, not for being right.

The Real-World Consequences

This isn't a theoretical problem. Lawyers have already used ChatGPT to cite cases that don't exist. Researchers have quoted AI-generated papers that were entirely fabricated. A doctor could ask an AI system about a rare drug interaction and receive a confident-sounding answer that's completely wrong—potentially endangering a patient.

Even more insidious: when you ask these systems to cite their sources, they'll often cite real papers that don't actually support what they just said. They'll make up author names that sound plausible. They've learned that citations *look like* they make claims more credible, so they generate them, completely decoupled from whether the citations actually exist or support the argument.

Researchers at UC Berkeley tested this specifically. They found that when it was prompted to cite sources, ChatGPT was actually *less* accurate than when it wasn't. The citation requirement didn't make the model think harder. It just pushed it to generate more plausible-looking hallucinations.

To understand just how deep this goes, consider reading Why AI Keeps Hallucinating About Facts It Should Know (And Why Your Fact-Checker Can't Catch It), which breaks down the mechanisms behind these failures in technical detail.

What Actually Helps

Some solutions exist. Retrieval-augmented generation (RAG) gives the model access to actual documents or databases before it generates an answer. Instead of predicting from memory alone, it can pull relevant passages and ground its response in them. It's less elegant than a pure language model, and it doesn't eliminate hallucinations, but it's dramatically more accurate on factual questions.
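Here's a minimal sketch of the idea. The three-sentence "corpus," the crude word-overlap retriever, and the llm_generate call at the end are all placeholders for illustration; a real system would use a vector database, embedding similarity, and an actual model API.

```python
# Minimal RAG sketch: retrieve relevant passages first, then ask the
# model to answer only from those passages.
corpus = [
    "The Minnesota Twins won the 1987 World Series, defeating the St. Louis Cardinals in seven games.",
    "The Minnesota Twins also won the 1991 World Series against the Atlanta Braves.",
    "The New York Mets won the 1986 World Series.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by crude word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the passages below. If they don't contain the answer, say so.\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

question = "Who won the 1987 World Series?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)
# answer = llm_generate(prompt)  # hypothetical call to whatever model you use
```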

Other approaches include training models with explicit uncertainty signals. Teaching them that "I don't know" is sometimes the right answer. This lowers their average confidence scores but increases their reliability. It's the academic equivalent of teaching someone that admitting ignorance is better than bullshitting.
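Teaching that during training is beyond a blog snippet, but here's a rough inference-time analogue, again assuming GPT-2 via the transformers library: look at the model's own probability for its top choice and abstain when it's low. The 0.2 threshold is arbitrary, and raw token probabilities are only a crude proxy for real uncertainty, which is exactly why the training-side fixes matter.

```python
# Rough inference-time analogue of "say I don't know when unsure":
# abstain when the model's top next-token probability is low.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits[0, -1], dim=-1)
top_prob, top_id = torch.max(probs, dim=-1)

if top_prob.item() < 0.2:        # arbitrary cutoff for this sketch
    print("I don't know.")
else:
    print(prompt + tokenizer.decode(top_id.item()))
```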

Constitutional AI, developed by Anthropic, tries to bake in principles about honesty and accuracy during training. It's incremental progress, not a solution. The fundamental tension remains: models that are trained primarily to sound good will sometimes sound good while being wrong.

The Honest Assessment

We should use these models for what they're good at. Brainstorming. Explaining concepts. Generating code that a human then reviews. Copy-editing. Anything where confidence and fluency matter more than factual accuracy. For anything where you need reliable information—medical decisions, legal matters, financial advice, historical facts—verify with proper sources.

The uncomfortable truth is that we've built systems so good at sounding authoritative that many people treat them like oracles. They're not. They're very sophisticated pattern-completion machines that have learned our writing conventions, including the convention that authorities never admit doubt.

The model doesn't know it's guessing. It just knows what confident-sounding guesses look like.