Last year, a lawyer in New York made headlines for submitting court documents citing cases that don't exist. The citations looked perfect. The formatting was pristine. The case names sounded entirely plausible. But they were fabricated—generated by ChatGPT, which had helpfully invented them with complete confidence. The judge was not amused.
This wasn't a glitch or a rare mishap. It was a feature.
Language models don't "know" things the way humans do. They predict what word should come next based on probability. When a model predicts tokens—the tiny chunks of text it breaks language into—it's playing an extraordinarily sophisticated guessing game. And sometimes, the most probable next token is a complete fiction that sounds exactly like what should be there.
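To make that concrete, here's a toy sketch of what "choosing the next word" amounts to. The candidate tokens, the citation, and the scores are all invented for illustration; the point is that the computation is a ranking of continuations, not a check against reality.

```python
import numpy as np

# Toy illustration: the model's "knowledge" is a probability distribution
# over possible next tokens. The candidate strings, the citation, and the
# scores below are invented for illustration only.
context = "Smith et al. (2021) found that coffee improves productivity by"
candidates = [" 14%", " 23%", " a wide margin", " 40%"]
logits = np.array([2.1, 2.4, 1.3, 0.8])  # raw scores from the network

probs = np.exp(logits) / np.exp(logits).sum()  # softmax
next_token = candidates[int(np.argmax(probs))]

# The model emits whichever continuation scores highest. Nothing in this
# computation asks whether "Smith et al. (2021)" or the number is real.
print(context + next_token)
```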
We call this phenomenon "hallucination," though that term undersells how bizarre and systematic it actually is.
Why Your AI Doesn't Realize It's Making Things Up
Here's what most people don't understand: hallucinations aren't mistakes from the model's perspective. They're the correct output given how these systems work.
Consider a simpler version of the problem. If I ask you to complete the sentence "The capital of France is," your brain retrieves the word "Paris" from memory. But if I ask you to complete "In the year 2847, the capital of France will probably be," your brain generates something plausible based on patterns it knows. You're not retrieving; you're creating. Language models operate almost exclusively in that generation mode.
When you ask GPT-4 to cite a study about coffee and productivity, it draws on training data that contained millions of real citations. It understands the structure of a citation. It knows what plausible study titles and author names look like. When it generates a citation, it's doing what it does best: predicting statistically likely text sequences. Sometimes those sequences correspond to real papers. Often they don't, but they're still the statistically probable next tokens given the context.
The model has no mechanism for saying "I don't know." It was trained on text that continues confidently, so it continues confidently. A model that constantly said "I'm uncertain about this" or "I don't have information on that" wouldn't match the patterns in its training data very well.
The Scale Problem Nobody Wants to Admit
Here's where it gets uncomfortable for AI companies: the bigger the model gets, the more coherent the hallucinations become.
A smaller model might invent a fact and contradict itself three sentences later. You'd catch it immediately. But GPT-4 or Claude 3 can invent an entire fictional framework—a made-up researcher, their supposed methodology, their fabricated findings—and maintain internal consistency across paragraphs. The hallucination is so well-woven that it becomes more persuasive than the truth.
This isn't a scaling law that gets better with more compute. It's a scaling law that gets worse. Researchers at Stanford and other institutions have documented that as models grew from billions to hundreds of billions of parameters, hallucination rates didn't decrease proportionally. In some cases, they actually increased because the models became better at sounding right while being confidently, completely wrong.
Confident wrongness isn't an isolated quirk, either; it points to deeper brittleness in these systems that developers are still scrambling to address.
What Actually Happens in the Model's Math
Let's talk about attention, the mechanism at the core of the transformer architecture that powers modern language models.
When the model processes text, each token attends to previous tokens, creating a kind of probabilistic map of relevance. These attention weights help determine what the model "focuses on" when predicting the next word. But here's the crucial bit: this system optimizes for prediction accuracy on training data, not for factual accuracy in the real world.
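Stripped down to a toy numpy version, causal scaled dot-product attention looks roughly like this. Random vectors stand in for learned embeddings; real models stack many such layers with learned projections.

```python
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask: each token can only
    attend to itself and earlier tokens, producing a weighted blend of their
    representations."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of every token to every other
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)            # hide future tokens
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V

# Three tokens with 4-dimensional embeddings; random numbers stand in for
# learned representations. Nothing here encodes whether a fact is true,
# only how strongly tokens should influence each other.
np.random.seed(0)
x = np.random.randn(3, 4)
print(causal_attention(x, x, x).shape)  # (3, 4)
```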
A model trained on 10 trillion tokens has seen patterns where certain sentences follow certain statements. It's learned that professional language, proper formatting, and confident tone all correlate with "true" content. When it generates text, it uses those same patterns. The result: fabricated content that passes every stylistic test for legitimacy while failing the actual factuality test.
The model isn't lying. It has no concept of lying because it has no world model—no internal representation that says "X is true in reality." It has probability distributions. One of those distributions just happens to produce highly plausible fiction.
What Can Actually Be Done About This
The honest answer is: we don't have a silver bullet yet.
Some approaches show promise. Retrieval-augmented generation (RAG) systems that pull from verified databases before generating text can reduce hallucinations significantly. Ensemble methods that compare multiple model outputs and flag inconsistencies catch some fabrications. Fine-tuning on high-quality, fact-checked data helps—but it's expensive and labor-intensive.
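Here's a rough sketch of the RAG pattern. The tiny keyword retriever and the model stub are stand-ins for a real vector store and a real LLM client; the shape of the pipeline is what matters.

```python
# Minimal RAG sketch: ground the prompt in passages from a verified corpus
# before generating, and instruct the model to refuse rather than invent.
VERIFIED_DOCS = [
    "Caffeine blocks adenosine receptors, which reduces perceived fatigue.",
    "RAG systems retrieve passages from a trusted corpus before generating.",
]

def retrieve(question, docs, top_k=1):
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def call_model(prompt):
    """Stand-in for a real LLM call (e.g., an API client)."""
    return f"[model output grounded in a prompt of {len(prompt)} characters]"

def answer_with_rag(question):
    passages = retrieve(question, VERIFIED_DOCS)
    prompt = (
        "Answer using ONLY the sources below. If they don't contain the answer, "
        "say you don't know.\n\nSources:\n" + "\n".join(passages) +
        f"\n\nQuestion: {question}"
    )
    return call_model(prompt)

print(answer_with_rag("How does caffeine affect fatigue?"))
```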
The most practical solution for most organizations right now? Don't use language models as oracles. Use them as brainstorming partners. Chain them with verification steps. Have humans fact-check critical outputs. Treat them like very eloquent students who sometimes confidently say complete nonsense—useful for ideation, dangerous for critical decisions without oversight.
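One cheap verification step might look like the sketch below: pull citation-like strings out of a draft and flag anything that can't be matched against a trusted index. The index, the draft, and the regex are invented for illustration; a production check would be far more robust.

```python
import re

# Flag citation-looking strings that aren't in a trusted index so a human
# can review them. The index and the draft are invented for illustration.
TRUSTED_INDEX = {"Smith et al. (2019)", "Garcia & Lee (2021)"}

def flag_unverified_citations(draft: str) -> list[str]:
    """Return citation-like strings that don't appear in the trusted index."""
    cites = re.findall(r"[A-Z][A-Za-z&.\s]+\(\d{4}\)", draft)
    return [c.strip() for c in cites if c.strip() not in TRUSTED_INDEX]

draft = "Productivity rose 23% according to Smith et al. (2019) and Brown et al. (2022)."
suspects = flag_unverified_citations(draft)
if suspects:
    print("Needs human review:", suspects)  # ['Brown et al. (2022)']
```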
Some companies are experimenting with teaching models to express uncertainty through special tokens or confidence scores. Others are developing specialized architectures that maintain explicit knowledge graphs. OpenAI and Anthropic have both invested heavily in alignment techniques, from reinforcement learning with human feedback to Anthropic's constitutional AI, that try to steer model outputs toward human preferences.
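One simple version of a confidence signal, sketched here with invented numbers: average the per-token log-probabilities the model assigns to its own answer and escalate anything that comes out low. In practice those log-probabilities would come from an API or model that exposes them.

```python
import math

# Average per-token log-probability as a rough confidence proxy. The values
# below are invented; real ones would come from a model that exposes logprobs.
token_logprobs = [-0.05, -0.10, -2.90, -3.40, -0.20]  # one value per generated token

avg_logprob = sum(token_logprobs) / len(token_logprobs)
confidence = math.exp(avg_logprob)  # geometric-mean token probability, in (0, 1]

# A low score doesn't prove the answer is wrong, but it's a cheap flag for
# routing the output to retrieval, re-asking, or human review.
if confidence < 0.5:
    print(f"Low-confidence answer (score={confidence:.2f}); escalate for review.")
```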
But fundamentally, the problem runs deep. You can't solve hallucination without addressing the fact that these models were never designed to have a relationship with truth. They were designed to predict text. That they do it so convincingly is both their greatest strength and their most dangerous weakness.
The Future Is Uncomfortable Questions
We're entering an era where the most capable language models are also the most persuasive liars. They'll improve. They'll become harder to detect. At some point, we might build systems that genuinely maintain factual accuracy across complex queries.
But that day isn't here yet. And until it is, we need to stop treating hallucinations like a bug waiting for a patch. They're a feature of how these systems fundamentally work. The better we understand that, the safer we can be in deploying them.
The lawyer with the fake cases learned this the hard way. The rest of us would be wise to learn it differently.
