Explainer · Hallucination · AI Models · June 12, 2026 · 10 min read

Why does AI make things up? Hallucination, explained

Seventeen court rulings in one day flagged invented citations. AI isn’t lying to you: it’s guessing, exactly as it was taught.

One question, asked three times (from OpenAI’s own paper)

It isn’t lying. It’s guessing, the way it was taught.

Prompt: What is Adam Kalai’s birthday?

Ask 1: March 7th (wrong)
Ask 2: June 15th (wrong)
Ask 3: January 1st (wrong)
Never: “I don’t know”

A state-of-the-art model, asked for a researcher’s birthday three times, gave three confident dates (all wrong) and never once declined to answer. That choice, it turns out, was trained in. This post explains how, and what actually works against it.

On March 31, 2026 (one ordinary Tuesday), American judges published 17 separate decisions flagging suspected AI-fabricated material in filings before them. Seventeen, in a day. When two New York lawyers were fined $5,000 in 2023 for submitting six ChatGPT-invented cases (complete with made-up quotes attributed to real judges), it was international news. Now a database tracking such cases worldwide has passed 1,227 entries and grows by five or six a day.

Three years of better models didn’t fix this. To understand why, you have to drop the most natural explanation: that the AI is lying, or broken. It is neither. A hallucination is a confident guess from a machine that was trained and graded in a way that made guessing the winning strategy. That one idea explains why fluent nonsense exists, why it concentrates on certain kinds of facts, and why the defenses that actually work change what we ask for, not what the model knows.

A hallucination is a confident guess, not a lie

A language model does one thing: given some text, it predicts what token most plausibly comes next, over and over. That single mechanism produces translations, working code, and decent poetry. It also produces fabricated court cases, with the same machinery, the same confidence, and the same fluent tone. Fluency is the output of the process, not evidence that the process found a fact.

OpenAI’s researchers opened their 2025 paper on hallucination with a homely test: ask a model for the birthday of one of the paper’s own authors, Adam Kalai. Asked three times, a state-of-the-art model returned three different dates, none correct, none hedged. Asked for his dissertation title, three major models produced three different titles, three different years, and three different universities. Each answer had the right shape. The model wasn’t consulting a record and misreading it; there is no record. It was completing a pattern.
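A toy sketch can make "completing a pattern" concrete. It assumes, purely for illustration, that a model which never learned the birthday spreads its probability evenly over every calendar date; the flat distribution and the `ask` helper are hypothetical, not something the paper specifies.

```python
import random
from datetime import date, timedelta

# Every day of a non-leap year, formatted like a plausible birthday answer.
# Assumption: with no learned fact, each date is an equally likely completion.
DATES = []
for offset in range(365):
    day = date(2023, 1, 1) + timedelta(days=offset)
    DATES.append(f"{day:%B} {day.day}")


def ask(rng: random.Random) -> str:
    """Return one fluent-looking date; abstaining is not among the options."""
    return rng.choice(DATES)


rng = random.Random(0)  # fixed seed so the run is repeatable
answers = [ask(rng) for _ in range(3)]
print(answers)
```

Every draw has the right shape because every candidate is a date. "I don't know" never appears because it was never among the likely completions.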

That’s why “lying” is the wrong frame. A liar knows the truth and steers you away from it. The model holds no ledger of true facts, only statistical associations between tokens. Most of the time, the most plausible continuation is the truth, which is why the failures surprise us. A hallucination is simply the case where plausible and true come apart.

We graded AI like an exam where blanks score zero

If the model can’t know everything, why doesn’t it just say “I don’t know”? Because for years, every test we gave it punished that answer. The OpenAI paper’s central argument is about incentives: nearly all AI benchmarks are graded on a binary scale. A right answer scores 1. A wrong answer scores 0. And “I don’t know” also scores 0.

Any student knows what to do under those rules. On a multiple-choice test with no negative marking, you never leave a blank: a guess has some chance of points, a blank has none. Models tuned to climb leaderboards learned exactly this, at scale: always answer, never abstain. The overconfidence we read as a personality flaw is a rational response to the scoring rules we wrote.

A hand filling in bubbles on a multiple-choice exam answer sheet with a pencil. — No negative marking, so you never leave a bubble blank. Language models took the same exam strategy to its logical extreme. Photo by Nguyen Dang Hoang Nhu on Unsplash.

The proof is what happens when you change the rules. On SimpleQA, a quiz of obscure factual questions, OpenAI compared an older model tuned to always answer against a newer one allowed to abstain. Their factual knowledge was nearly identical: 24% versus 22% correct. But the always-answer model got 75% of questions wrong, while the model that declined just over half the questions got only 26% wrong. Two-thirds of the hallucinations vanished, and no new knowledge was added. The model just stopped bluffing.

Same exam, one rule changed · SimpleQA factual quiz

Two OpenAI models on the same factual-recall test. They know almost the same amount. Look at what happens to the wrong answers.

o4-mini (tuned to always answer): 24% correct, 75% wrong (hallucinated)

gpt-5-thinking-mini (allowed to say “I don’t know”): 22% correct, 26% wrong (hallucinated), 52% declined to answer

From OpenAI’s “Why Language Models Hallucinate”. Knowledge barely differs: 24% vs. 22% correct. Permission to abstain cut wrong answers from 75% to 26%. The guessing was a policy, not a limitation.
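The arithmetic behind "two-thirds of the hallucinations vanished" can be checked in a few lines. The percentages are the ones quoted above; the `QuizResult` class and the "accuracy when answering" figure are illustrative additions, not metrics reported in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuizResult:
    name: str
    correct: float  # percent of all questions answered correctly
    wrong: float    # percent of all questions answered incorrectly

    @property
    def accuracy_when_answering(self) -> float:
        """Share of attempted questions that were right (derived metric)."""
        return self.correct / (self.correct + self.wrong)


always_answers = QuizResult("o4-mini", correct=24, wrong=75)
may_abstain = QuizResult("gpt-5-thinking-mini", correct=22, wrong=26)

# Relative drop in wrong answers: (75 - 26) / 75
reduction = (always_answers.wrong - may_abstain.wrong) / always_answers.wrong

print(f"wrong answers cut by {reduction:.1%}")
for result in (always_answers, may_abstain):
    print(f"{result.name}: {result.accuracy_when_answering:.1%} right when it answers")
```

The drop works out to 49/75, a little over 65%. Under the same numbers, an answer from the abstaining model is right about 46% of the time, against about 24% for the always-answer model.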

The paper’s proposed fix is the same one exam designers reached centuries ago: penalize wrong answers more than blanks, and say so in the instructions: “answer only if you are more than 90% confident.” Until mainstream benchmarks adopt that, every lab faces pressure to ship models that guess, because guessing wins leaderboards.
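The incentive argument reduces to one expected-value comparison. The sketch below assumes a right answer scores 1, an abstention scores 0, and a wrong answer costs a penalty k. Answering then beats abstaining only when confidence p satisfies p - (1 - p)k > 0, that is, p > k / (1 + k). The function names are hypothetical, and the penalty of t / (1 - t) is simply the value that puts break-even at a chosen threshold t.

```python
ABSTAIN_SCORE = 0.0  # "I don't know" earns nothing under both grading schemes


def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only when it beats abstaining in expectation."""
    return expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE


def penalty_for_threshold(threshold: float) -> float:
    """Penalty that makes answering break even exactly at `threshold`."""
    return threshold / (1.0 - threshold)


binary_penalty = 0.0                          # wrong answers score 0, like blanks
strict_penalty = penalty_for_threshold(0.9)   # roughly 9 points per wrong answer

for confidence in (0.1, 0.8, 0.95):
    print(
        confidence,
        should_answer(confidence, binary_penalty),
        should_answer(confidence, strict_penalty),
    )
```

Under binary grading, even a 10% hunch is worth submitting. With the penalty set for a 90% threshold, an 80% guess loses points on average and only the 95% answer is worth giving.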

Some facts are mathematically out of reach

Incentives explain why models bluff. A second, colder result explains why they must: some facts are statistically unlearnable from the data. The paper makes it concrete: if 20% of birthday-type facts appear exactly once in the training data, base models should be expected to hallucinate on at least 20% of birthday-type questions. A fact seen once is a needle the compression can’t keep.
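A rough way to see the singleton idea is to count how many distinct facts show up exactly once in a corpus. The sketch below uses an invented list of fact mentions; the `singleton_rate` helper and the choice to count per distinct fact are assumptions made for illustration, not the paper's exact definition.

```python
from collections import Counter
from typing import Iterable


def singleton_rate(fact_mentions: Iterable[str]) -> float:
    """Fraction of distinct facts that are mentioned exactly once."""
    counts = Counter(fact_mentions)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Hypothetical corpus: each string stands for one mention of one fact.
mentions = (
    ["birthday:A"] * 3
    + ["birthday:B"] * 2
    + ["birthday:C"] * 4
    + ["birthday:D"] * 2
    + ["birthday:E"]  # seen only once
)

rate = singleton_rate(mentions)
print(f"singleton rate: {rate:.0%}")
```

In this toy corpus one fact in five is a singleton, which corresponds to the 20% case described above: by the paper's argument, a base model trained on it should be expected to hallucinate on at least that share of such questions.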

A wall of small wooden library card catalog drawers. — A card catalog stores each fact exactly once and retrieves it exactly. A language model does neither: it compresses millions of documents into patterns, and facts seen once may not survive. Photo by Jan Antonin Kolar on Unsplash.

This is the difference between a model and a database. A database stores each fact once and retrieves it exactly or fails loudly. A model compresses its training data into patterns, and patterns need repetition. “Paris is the capital of France” appears millions of times; it is effectively unforgettable. Your company’s 2019 refund policy, one mid-tier court ruling, a researcher’s birthday: each may appear once or never. The pattern machine fills those gaps with the most plausible invention.

Now you can see why fabricated citations are the signature hallucination. A legal citation’s format (names, volume, reporter, page, year) appears in training data millions of times; the model reproduces it flawlessly. The content of any specific case is a rare fact. Perfect form wrapped around invented substance is exactly what a pattern-completion machine produces when asked for a rare fact in a common costume. The same goes for URLs, ISBNs, phone numbers, and exact quotes.

The damage lands wherever nobody checks

Courts didn’t become the public face of hallucination because lawyers use AI more than everyone else. They became the face because legal work is one of the few places where checking is adversarial and mandatory: opposing counsel and the judge look up every citation, and the failures land in published opinions. The legal numbers are a rare measured glimpse of what fabrication looks like everywhere else.

They are sobering. Stanford’s RegLab tested over 200,000 legal queries and found general-purpose chatbots hallucinated 69% to 88% of the time on verifiable questions about federal court cases. A follow-up study of paid, purpose-built legal research tools (with retrieval over real case law) still found hallucinations in 17% to 33% of responses, despite vendor marketing that claimed the problem was eliminated.

Better tools shrink it. Nothing zeroes it.

69–88%: General chatbot, asked legal questions from memory. Hallucination rate on verifiable questions about real federal court cases (Stanford RegLab).

17–33%: Purpose-built legal AI, with retrieval over real case law. Still hallucinating, despite vendors marketing the tools as hallucination-free (Stanford HAI study).

1,227 cases: What slips through, worldwide. Court filings with AI-fabricated content caught by judges, growing 5–6 per day (tracked database).

A classical stone facade with tall columns in strong light and shadow. — Courts catch fabrications because checking is adversarial and mandatory: opposing counsel looks up every citation. Most workplaces have no opposing counsel. Photo by Sebastian Schuster on Unsplash.

The court-case database is the figure worth generalizing from. Those 1,227 caught filings (811 in the US alone) are the ones that surfaced in published decisions; the tracker’s own caveat is that most state-court rulings never reach searchable databases, so the real number is higher. Now remove the adversary. The same models write medical summaries, financial memos, school reports, and product documentation, in workplaces with no opposing counsel and no judge. The fabrication rate doesn’t drop when the checking does. Only the catching does.

Rates are falling, toward a floor, not to zero

The honest good news: when you give a model the source material and ask it to stay inside it, hallucination is rare and getting rarer. Vectara’s hallucination leaderboard, which measures how often models inject unsupported claims while summarizing a document they were handed, puts the best current model at a 1.8% hallucination rate (May 2026). The generational trend is real too: GPT-4o sat at 9.6% on that test; GPT-4.1 at 5.6%; the newest small GPT variant at 3.1%.

Two caveats keep that from being a victory lap. First, the spread between models on the same task is still wide: roughly 2% to 12% across the major vendors’ current models, a 6× difference your tool choice silently makes for you. Second, those single-digit rates are for grounded work, where the truth was in the prompt. Ask the same models for rare facts from memory and you’re back in the SimpleQA regime, where wrong answers outnumber right ones. The floor is real, and it is not zero.

You can’t prompt it away, but you can corner it

Everything above converts directly into practice. Five moves, in descending order of impact:

1. Hand it the document. The biggest single lever. Summarizing, rewriting, or answering from material in the prompt runs at single-digit error rates; from-memory recall of rare facts can fail most of the time. Paste the contract; don’t ask what contracts usually say.
2. Pay it to say “I don’t know.” Borrow the researchers’ fix: “If you’re not confident, say so: a wrong answer is worse than no answer.” You’re overriding a trained-in incentive, and the SimpleQA numbers show the effect size: two-thirds of wrong answers gone.
3. Check one source, externally. The New York lawyers fined in 2023, in the Avianca case, did ask ChatGPT whether its cases were real; it said yes. Asking the model to verify itself is asking the same pattern machine the same question. Open the link, find the case, run the code. Verification has to leave the chat.
4. Keep it away from rare-fact lookups. Citations, phone numbers, niche biographies, exact quotes, prices: the singleton math says these are the worst category. Use a search engine or database for needles; use the model for everything around them.
5. Measure your own use case. Published rates span 2% to 88% depending on task and tool, so the only number that matters is yours. Twenty questions from your own work will tell you more than any leaderboard.
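The first, second, and fifth moves can be sketched in a few lines. This assumes you grade each answer by hand as correct, wrong, or declined after checking it against an outside source. The `build_grounded_prompt` and `tally` helpers are hypothetical, the grades are placeholder data, and the opening line of the instruction ("Answer only from the document below") is an addition to the wording quoted above.

```python
from collections import Counter
from typing import Dict, List

ABSTAIN_INSTRUCTION = (
    "Answer only from the document below. If you're not confident, say so: "
    "a wrong answer is worse than no answer."
)

VALID_GRADES = ("correct", "wrong", "declined")


def build_grounded_prompt(document: str, question: str) -> str:
    """Put the source text and the permission to abstain in one prompt."""
    return f"{ABSTAIN_INSTRUCTION}\n\nDocument:\n{document}\n\nQuestion: {question}"


def tally(grades: List[str]) -> Dict[str, float]:
    """Turn hand-assigned grades into rates for each outcome."""
    if not grades:
        raise ValueError("need at least one graded answer")
    unknown = set(grades) - set(VALID_GRADES)
    if unknown:
        raise ValueError(f"unexpected grades: {sorted(unknown)}")
    counts = Counter(grades)
    return {grade: counts[grade] / len(grades) for grade in VALID_GRADES}


prompt = build_grounded_prompt("Refunds are accepted within 30 days.", "What is the refund window?")

# Placeholder grades for a 20-question check, not real measurements.
grades = ["correct"] * 14 + ["wrong"] * 2 + ["declined"] * 4
rates = tally(grades)
print(rates)
```

Tracking declined answers separately from wrong ones matters here: a tool that passes on four questions and misses two behaves very differently from one that misses six.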

Hallucination: quick answers

Did the AI lie to me?

No. Lying requires knowing the truth and choosing otherwise. The model has no ledger of true facts to consult. It produces the most plausible continuation of your question, and plausible usually overlaps with true. A hallucination is the gap.

Why does it invent citations and sources specifically?

Citations are the worst case: the format is everywhere in training data, so the model reproduces it perfectly, while each individual case, paper, or URL is a rare fact it may have seen once or never. Perfect form, fabricated content.

Doesn't giving the model web search or documents fix it?

It helps a lot: grounded summarization error rates run in the single digits, versus 70%+ for from-memory recall of obscure facts. But purpose-built legal tools with retrieval still hallucinated 17–33% of the time in Stanford’s testing. Reduced is not eliminated.

Will bigger models stop hallucinating?

Not on their own. Some facts are too rare in training data for any model to learn, and benchmarks still reward guessing over abstaining. The lever that works (saying “I don’t know” more) is a training and grading choice, not a size milestone.

If you keep one number, keep 52%: the share of questions a model declined when it was finally allowed to, cutting its wrong answers by two-thirds with no new knowledge. The machines aren’t getting more honest; they’re slowly being given permission to pass. Until the scoreboards finish that shift, the permission slip is yours to write: in the prompt, in the document you paste, and in the one source you check before you hit send.
