Last Tuesday, a lawyer in Manhattan filed a brief citing six cases that don't exist. The citations looked real. They had case numbers, court names, and specific holdings. The AI that generated them sounded absolutely certain. This wasn't a fluke—it was a preview of a much bigger problem that nobody's talking about enough.
When you ask a language model a question it doesn't know the answer to, something fascinating happens. It doesn't say "I don't know." Instead, it generates fluent, plausible-sounding text that reads like an answer. It confidently invents sources, statistics, and details. Researchers have started calling this "hallucinating," which is a weirdly gentle term for making stuff up and presenting it as fact.
The Confidence Problem Nobody Warned Us About
Here's what makes this genuinely dangerous: the hallucinations don't *sound* like hallucinations. They sound like knowledge. If you ask ChatGPT who won the 1987 World Series, it will tell you the Minnesota Twins—which is correct. But if you ask it something more obscure, something at the edge of its training data, it will generate an answer with the exact same tone of certainty. You can't hear the difference.
This is worse than human confusion because humans tend to hedge. A person who isn't sure about something will usually signal that uncertainty through language. They'll say "I think," or "I'm pretty sure," or "I could be wrong." AI models don't do this naturally. They learned to predict the next word in a sentence, and in the text they learned from, confident assertions are far more common than careful hedging.
The lawyer's brief wasn't an isolated incident. In March 2023, lawyers at the firm Levidow, Levidow & Oberman submitted a filing citing entirely fabricated cases to a federal judge. They'd used ChatGPT and trusted what it told them. The judge wasn't amused. There have been dozens of similar incidents since then: people relying on AI for legal research, medical information, academic citations, and job applications. Each time, the confidence was the villain.
Why Your Brain Is Vulnerable to This
We're not wired to distrust confident-sounding statements. Evolutionarily speaking, confidence was usually a signal that someone knew what they were talking about. If your ancestors were uncertain about where the water was, they died of thirst. Certainty was survival.
Now we're interacting with systems that have weaponized that ancient instinct. They're trained on billions of words written by humans, most of which assume the writer knows something about their topic. When you combine that training data with the statistical fact that confident language is more common than hedged language, you get a system that *never shuts up about things it doesn't actually know*.
And we fall for it every time. Studies show that people are more likely to trust information presented confidently, even when they've been explicitly warned that the source might be unreliable. Add in the halo effect of "it's an AI, it must be smart," and you've got a recipe for disaster.
The Training Data Trap
The root of this problem goes back to how these models are built. Large language models are trained by showing them trillions of words and having them learn to predict the next word in a sequence. They're getting absurdly good at this task. But "good at predicting text" and "reliably truthful" are not the same thing.
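To make that concrete, here is a toy sketch of what "predict the next word" means. It is nothing like a production model, which runs a neural network over vast amounts of text, but the task has the same shape: given the words so far, produce the statistically most likely continuation.

```python
from collections import Counter, defaultdict

# A tiny corpus standing in for the trillions of words of internet text.
corpus = (
    "the twins won the world series . "
    "the twins won the game . "
    "the court cited the case ."
).split()

# Count how often each word follows each other word (a simple bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the most statistically likely next word, whether or not it's right."""
    counts = next_word_counts[word]
    if not counts:
        return None  # Notice: nothing in the objective itself says "I don't know."
    return counts.most_common(1)[0][0]

print(predict_next("the"))    # whichever continuation was most frequent
print(predict_next("twins"))  # "won"
```

Nothing in that objective rewards the phrase "I don't know." The only thing being scored is how plausible the continuation looks.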
When you train something on the internet, you're training it on everything the internet has to offer. That includes gossip, speculation, false claims, conspiracy theories, and just regular human mistakes. The model doesn't have a way to distinguish between these and actual facts. It learns that confident-sounding claims are common, so it generates confident-sounding claims.
Some companies have tried to fix this. OpenAI added a retrieval system to ChatGPT so it can actually look up current information rather than relying on training data from 2021. Google's Bard does something similar. But these fixes are incomplete. They work better for some domains (sports scores, current events) and worse for others (specialized knowledge, nuance, context).
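As a rough sketch of that retrieval pattern, imagine wiring a lookup step in front of the model. The functions below, search_documents and generate_answer, are hypothetical stand-ins rather than any vendor's actual API; the point is only the shape of the fix: fetch real text first, then ask the model to stick to it.

```python
def search_documents(question):
    """Placeholder for a real search engine or database lookup."""
    return ["The Minnesota Twins won the 1987 World Series."]

def generate_answer(prompt):
    """Placeholder for a call to a language model."""
    return "The Minnesota Twins won the 1987 World Series."

def answer_with_retrieval(question):
    # Look the information up first, then hand it to the model,
    # so the model paraphrases retrieved text instead of inventing facts.
    documents = search_documents(question)
    prompt = (
        "Answer using ONLY the sources below. "
        "If they don't contain the answer, say you don't know.\n\n"
        "Sources:\n" + "\n".join(documents) +
        "\n\nQuestion: " + question
    )
    return generate_answer(prompt)

print(answer_with_retrieval("Who won the 1987 World Series?"))
```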
The fundamental issue remains: these systems are performing a very specific task—predicting the next word—and we're asking them to do something completely different—tell us the truth about reality. It's not surprising that they're not great at it.
What Happens When Everyone Uses This Anyway
Here's the dark part. People are using these tools for high-stakes decisions right now. Students are submitting essays written by AI. Doctors are considering treatment options suggested by language models. Journalists are using them for research. We haven't solved the hallucination problem, but we're rolling these systems out at scale anyway.
There's a perverse incentive structure here. If you're a company building an AI chatbot, admitting that your system frequently makes things up isn't great for marketing. So companies minimize the severity of the problem, release disclaimers in fine print, and hope users figure it out on their own. Some do. Most don't.
The really insidious part is that as these systems get better at mimicking human language, they get *better* at seeming credible even when they're completely wrong. A 2024 study found that people's trust in AI-generated text actually increased when the AI sounded more natural—even when the text contained more errors. We're training ourselves to be gullible.
What Actually Needs to Happen
Fixing this isn't simple, but it's not impossible either. First, we need AI systems that can actually signal uncertainty. If a model doesn't know something, it should say so in a way that users understand and respect. This is harder than it sounds because confident language is baked into how these systems work.
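One way to picture what "signaling uncertainty" could mean under the hood: if you can see the probabilities a model assigns to candidate next words, a flat, spread-out distribution is a hint that it is guessing. The sketch below uses made-up numbers and a hypothetical abstain rule; it illustrates the idea, not how any deployed system actually decides.

```python
import math

def entropy(probabilities):
    """Shannon entropy: higher means the distribution is more spread out."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def answer_or_abstain(candidates, threshold=1.5):
    """Answer only when the next-word distribution is reasonably peaked."""
    words, probs = zip(*candidates)
    if entropy(probs) > threshold:
        return "I'm not sure about this one."
    return words[probs.index(max(probs))]

# Hypothetical next-word distributions: one confident, one essentially a coin toss.
confident = [("Twins", 0.90), ("Cardinals", 0.05), ("Braves", 0.05)]
guessing  = [("Smith", 0.27), ("Jones", 0.26), ("Brown", 0.24), ("Davis", 0.23)]

print(answer_or_abstain(confident))  # "Twins"
print(answer_or_abstain(guessing))   # "I'm not sure about this one."
```

Even this is only part of the answer, since models can be confidently wrong with a sharply peaked distribution, which is part of why the problem is still open.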
Second, we need regulations that hold companies responsible when their systems cause harm. Right now, there's a gray area where everyone assumes the user should have known better. That's changing, slowly. The SEC is starting to care about AI hallucinations in corporate disclosures. Courts are starting to penalize lawyers who use ChatGPT without fact-checking.
Third, and most importantly, we need to change our own behavior. The default assumption should be skepticism. If an AI tells you something important, verify it. Check the sources. Look it up independently. This feels like extra work, and it is. But the alternative is building a world where false information is indistinguishable from true information—and that's not the future anyone actually wants.
For more on how AI models can seem credible even when they're completely wrong, check out *Why AI Keeps Hallucinating and Why We're Still Not Close to Fixing It*.
The technology isn't going away. But our relationship with it—that's still being written. The question is whether we'll be thoughtful about what we're building, or whether we'll just keep trusting confident-sounding machines until they break something important.
