Photo by Nahrizul Kadri on Unsplash

Last week, I asked Claude to tell me about the first person to summit Mount Everest. It confidently told me it was Sir Edmund Hillary in 1953. When I asked a follow-up question about his climbing partner, it added details about their expedition route, the equipment they used, and even quoted Hillary's diary entries. Everything sounded authoritative. Everything was accurate. So I asked it the same question again using slightly different wording.

This time, it gave me a completely different answer—one that was entirely fabricated. The model invented a fictional mountaineer, complete with a plausible backstory and climbing achievements. When I pressed for sources, it cited publications that don't exist.

This isn't a bug. This is how these systems actually work.

The Confidence Problem

The core issue with modern AI models isn't that they sometimes fail to know things. It's that they're fundamentally incapable of distinguishing between what they actually know and what they're generating based on statistical patterns. When you feed a large language model a prompt, it's not retrieving information from a database. It's predicting the next token (basically, the next word or phrase fragment) based on probabilities learned during training.
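To make that concrete, here's a minimal sketch of what "predicting the next token" means. The four-word vocabulary and the logit scores are invented purely for illustration; a real model scores tens of thousands of tokens using billions of learned parameters.

```python
import numpy as np

# Toy next-token step (not a real LLM): the model assigns one raw score
# (logit) to every token in its vocabulary, softmax turns those scores
# into probabilities, and generation samples from that distribution.
vocab = ["Hillary", "Norgay", "Whitmore", "Mallory"]
logits = np.array([4.1, 2.8, 1.5, 1.9])  # invented scores for illustration

probs = np.exp(logits - logits.max())  # numerically stable softmax
probs /= probs.sum()

next_token = np.random.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```

Notice that nothing in this step consults a source of truth. "Whitmore" has nonzero probability, so sometimes it simply gets sampled.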

Think of it like this: if you trained a model on millions of sentences about famous climbers, it learns that the pattern "Edmund Hillary climbed Mount Everest" appears frequently in text. So when you ask about Everest's first climber, the model generates "Edmund Hillary" because that's the statistically likely next token. But the same training process also creates associations that aren't true. Maybe the text also contained sentences like "In 1953, legendary explorer James Whitmore attempted the first Everest ascent." The model learned to associate 1953 with Everest first ascents, so it might confidently generate false climbers from that era.
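Here's that failure mode as a toy you can run. The three-sentence corpus (including the fictional Whitmore sentence from above) is invented; the point is only how co-occurrence counting hands out probability.

```python
from collections import Counter

# Count how often each climber's name co-occurs with the cue words
# "Everest" or "1953" in a tiny invented corpus.
corpus = [
    "Edmund Hillary climbed Mount Everest in 1953",
    "Edmund Hillary climbed Mount Everest",
    "in 1953 legendary explorer James Whitmore attempted the first Everest ascent",
]
names = ["Hillary", "Whitmore"]

counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for name in names:
        if name in words and ("Everest" in words or "1953" in words):
            counts[name] += 1

total = sum(counts.values())
for name, c in counts.items():
    print(f"P({name} | Everest/1953 cue) = {c / total:.2f}")
```

A third of the probability mass lands on a climber who, even in this made-up corpus, never reached the summit. Scale that up by a few billion sentences and you get confident false climbers.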

What's particularly sinister is the confidence. A human who doesn't know something typically expresses uncertainty. An AI model? It generates text with the same confident tone regardless of whether the information is real or fabricated. There's no internal voice saying "I'm not sure about this." There are only probability distributions collapsing into tokens.

Why This Is Harder to Fix Than You'd Think

You might assume the solution is simple: just give the AI access to the internet so it can look up facts. Some companies have tried this. It helps, but it doesn't solve the fundamental problem. Even with internet access, models can hallucinate citations to sources that don't exist. They can misread search results. They can fabricate webpage URLs that sound plausible.
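That said, one narrow failure is cheap to catch: a fabricated URL often doesn't resolve at all. This sketch uses the requests library for a single HEAD request per citation. It flags only the most blatant fabrications; a URL that resolves can still be misread or misquoted, and some sites reject HEAD requests, so treat any failure as a prompt to check by hand.

```python
import requests

def url_resolves(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error status."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        return resp.status_code < 400
    except requests.RequestException:
        return False

for cited in [
    "https://en.wikipedia.org/wiki/Edmund_Hillary",
    "https://example.com/totally-made-up-citation-42",
]:
    print(cited, "->", "resolves" if url_resolves(cited) else "dead or fabricated?")
```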

The real challenge is architectural. These models work through pattern matching at scale. You can't just bolt on a "fact-checking module" because the hallucination isn't a separate malfunction—it's woven into how the system generates language. When researchers have tried to train models to be more cautious, to hedge their claims, or to say "I don't know," something peculiar happens. The model becomes less useful overall. It hedges on things it actually does know. It becomes so cautious it refuses to answer anything interesting.

OpenAI and other labs tune their models with reinforcement learning from human feedback (RLHF) to make them more honest. But even this has limitations. You can teach a model to avoid certain types of hallucinations, but you can't teach it to know things it never learned. And you can't make it genuinely aware of its own knowledge boundaries, because it doesn't have knowledge boundaries to begin with; it has probability distributions.
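For the curious, here's the preference-learning step at the heart of RLHF, stripped to a toy. Everything here is a stand-in: the "features" are random numbers where a real system would use a language model's internal representations. Only the shape of the objective, that a human-preferred answer should outscore a rejected one, survives the simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)               # reward model parameters (toy)
chosen = rng.normal(size=(32, 4))    # features of human-preferred answers
rejected = rng.normal(size=(32, 4))  # features of rejected answers

lr = 0.1
for _ in range(200):
    margin = chosen @ w - rejected @ w       # r(chosen) - r(rejected)
    p = 1.0 / (1.0 + np.exp(-margin))        # Bradley-Terry preference prob
    # Gradient of -log(p) w.r.t. w, averaged over the batch of pairs.
    grad = ((p - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= lr * grad

print("mean preference margin:", float((chosen @ w - rejected @ w).mean()))
```

The reward model learns to prefer whatever the raters preferred, which is exactly the limitation above: it can only reshape behavior visible in the ratings, not supply knowledge the model never had.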

The Real-World Consequences

This matters because these systems are increasingly used in high-stakes situations. Legal professionals have cited cases that ChatGPT invented; in one widely reported incident, a lawyer submitted fabricated case citations in court filings. Medical professionals have received confidently stated advice that sounds plausible but contradicts established practice. Students have submitted essays containing false facts generated by these models, facts that sounded authoritative enough to slip past the humans reading them.

What makes this particularly insidious is that it's not random. The hallucinations cluster around topics where training data is limited, ambiguous, or contradictory. Recent events, niche subjects, local information, and emerging research are all high-risk zones. But the confidence level doesn't change. A model's certainty about something made up looks identical to its certainty about something true.

As of now, the best defense is treating AI outputs as drafts that require verification. If you need factual accuracy, you need a human to check the work. This is the opposite of what these systems promise—they're supposed to save time by automating information retrieval. But with hallucinations, they sometimes do the opposite. They create confident-sounding nonsense that takes additional time and expertise to verify or debunk.
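One lightweight way to build that verification in, sketched below with a hypothetical ask_model() function standing in for whatever chat API you use (it is not a real library call): re-ask the question in different words and treat disagreement as a red flag, the same trick that exposed the Everest fabrication at the top of this piece.

```python
from typing import Callable

def consistency_check(ask_model: Callable[[str], str], phrasings: list[str]) -> bool:
    """Return True if every phrasing of the question gets the same answer."""
    # Crude exact-match comparison; real pipelines compare answers more loosely.
    answers = {ask_model(p).strip().lower() for p in phrasings}
    return len(answers) == 1

# Usage, with ask_model as a placeholder for your own API wrapper:
# stable = consistency_check(ask_model, [
#     "Who was the first person to summit Mount Everest?",
#     "Name the first climber to reach Everest's summit.",
# ])
# If stable is False, route the question to a human fact-checker.
```

Agreement doesn't prove correctness, but disagreement is strong evidence the model is pattern-matching rather than reporting something it reliably "knows".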

What's Next?

There's active research into better approaches. Some teams are exploring retrieval-augmented generation (RAG), where the model pulls relevant passages from an external, trusted source at answer time instead of generating everything from patterns in its weights. Others are working on uncertainty quantification: getting models to output calibrated confidence levels rather than just sounding confident. A few researchers are investigating whether models can learn to recognize their own knowledge boundaries.
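To give the retrieval-augmented idea some shape, here's a deliberately tiny sketch. The two-document "knowledge store" and the keyword-overlap ranking are stand-ins for a real corpus and vector index, but the flow is the same: retrieve trusted text first, then ask the model to answer only from it.

```python
import re

documents = [
    "Edmund Hillary and Tenzing Norgay reached the summit of Everest in 1953.",
    "K2 was first climbed in 1954 by an Italian expedition.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank by word overlap with the query; a real system would use
    # embeddings and a vector index instead.
    return sorted(docs, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)[:k]

question = "Who reached the summit of Everest first?"
context = "\n".join(retrieve(question, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt, not the bare question, goes to the model
```

Grounding the answer in retrieved text doesn't eliminate hallucination, but it gives the model something checkable to be wrong about.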

But these are fundamental challenges. They're not going away soon.

If you're using AI tools regularly, keep this in mind: fluency is not accuracy. A model that generates smooth, well-structured sentences about a topic might be completely wrong. The reverse also holds: careful, hedged language doesn't necessarily indicate caution; it might just be the statistically likely phrasing given your input. This connects to a broader issue with how these models handle truth. If you want to understand more about how AI systems approach factual claims, check out our article on why AI chatbots confidently lie and how to spot when they're making things up.

Until we solve the hallucination problem at an architectural level, the safest approach is simple: verify important information. Don't assume confidence equals correctness. And if you're relying on AI for factual content, build verification into your workflow. Because right now, that's the only real safeguard we have.