Everyone Blamed the AI

Three of the Big Four pulled AI-written reports in eight months. The headlines blamed the AI. The failure underneath is older than that.

Author: Sushee Nzeutem, SVRNOS
A 1990s beige CRT computer beside a modern laptop showing a chat window on the same desk, with a yellow Post-it note in front reading: it's not me, it's the computer.

In eight months, three of the Big Four have had to withdraw, remove, or correct a report after investigators found fabricated or distorted citations in it. In October, Deloitte agreed to partially refund the Australian government for a report that cited academic papers nobody wrote, then reissued a corrected version. In May, EY Canada removed a study on loyalty fraud. This June it was KPMG, whose report on agentic AI described an Emirates chatbot named Sara that could rebook your flight. Sara is a physical check-in robot Emirates introduced in 2023, not the mobile chatbot the report described.

The headlines reached for the obvious villain. AI hallucinations. AI can’t be trusted. A cautionary tale about artificial intelligence in the workplace. It is the comfortable story, because it puts the failure on the tool and lets the firm stand beside it as an early casualty of a powerful new technology.

We have heard this before. In the nineties, the all-purpose alibi was “it’s not me, it’s the computer.” The file got eaten, the spreadsheet was wrong, the system was down. Sometimes the machine really did fail. Often, it was the nearest place to put a mistake. “The AI hallucinated” is that same sentence, thirty years on, with a better vocabulary.

The firms themselves said otherwise. Here is KPMG’s response: “We expect all our people to follow our guidelines on the responsible use of AI, including human oversight to validate content and verify independent sources.” Read it twice. A guideline existed. People were told to validate content and check their sources. What the public record does not show is a control that made the check unavoidable: an owner, a gate, a record that it ran. Only five of the report’s forty-five citations accurately represented the sources they cited, which is not what oversight produces when it runs. The report went out under the firm’s name anyway.

Strip the AI out and the failure is identical. A junior analyst could have faked those citations by hand in 1995, and a firm putting its name on work nobody checked is the oldest failure in the trade. That is what this is. Not an AI governance failure. A failure to do the checking.

If KPMG had an AI-specific control that failed (an undisclosed-use rule, a logging requirement, a gate on model output), that would be a real governance question. Nothing in the public record shows one. The fabricated footnote, the source that says nothing like what you claimed, the number that appears twice for two different things: none of it is new.

What AI changed is the speed and the finish. The output looks done. A made-up reference is formatted exactly like a real one, so the distance between work that was checked and work that only looks checked stopped being visible to the eye. The subtler bad citations were not even fabricated outright. GPTZero, the firm that examined the report, has a name for them: vibe citations, references paraphrased or recombined from real sources so they read as genuine. Nothing about one looks wrong until you follow it back to where it came from. Checking was always what closed that distance. Checking is what got skipped.

The name you give a failure decides what you fix, which is why this matters. “AI is unreliable” leads nowhere. You shrug, you add a disclaimer, you wait for a better model. “We published work nobody verified” leads somewhere you can act: verification becomes a step that runs on every claim, every time, and leaves a record that it ran. One of those sentences has a repair attached. The other leaves the failure exactly where it was.

A two-branch diagram titled "The name you give a failure decides what you fix." One incident, unverified claims published as fact, forks two ways. The left branch, labelled "AI is unreliable," runs through add a disclaimer and wait for a better model to a dead end marked failure unchanged. The right branch, labelled "we published work nobody verified," runs through define the control, assign an owner, make it a release gate, and keep the record, to failure fixed.
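For readers who think in code, the right-hand branch can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any firm's actual process: the names `Report`, `VerificationRecord`, `record_check`, `release_gate`, and `ReleaseBlocked`, the claim ids, and the reviewer name are all hypothetical. The point it shows is the difference between a guideline and a gate: publication fails unless every claim has a named owner's check on record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class VerificationRecord:
    """Evidence that a named person checked one claim against its source."""
    claim_id: str
    verified_by: str              # the owner of this check (hypothetical field)
    source_supports_claim: bool   # what the reviewer found at the source
    checked_at: datetime


@dataclass
class Report:
    title: str
    claim_ids: list[str]
    records: dict[str, VerificationRecord] = field(default_factory=dict)

    def record_check(self, claim_id: str, verified_by: str, supports: bool) -> None:
        """Store the outcome of one check, with who ran it and when."""
        if claim_id not in self.claim_ids:
            raise KeyError(f"unknown claim: {claim_id}")
        self.records[claim_id] = VerificationRecord(
            claim_id, verified_by, supports, datetime.now(timezone.utc)
        )


class ReleaseBlocked(Exception):
    """Raised when a report reaches the gate with unverified claims."""


def release_gate(report: Report) -> list[VerificationRecord]:
    """Return the full audit trail, or refuse to release the report."""
    unchecked = [c for c in report.claim_ids if c not in report.records]
    failed = [c for c, r in report.records.items() if not r.source_supports_claim]
    if unchecked or failed:
        raise ReleaseBlocked(f"unchecked={unchecked} failed={failed}")
    return [report.records[c] for c in report.claim_ids]


# Hypothetical usage: two claims, only one of them checked.
report = Report("Example report", ["c1", "c2"])
report.record_check("c1", "reviewer.a", True)
try:
    release_gate(report)
except ReleaseBlocked as exc:
    print(exc)  # unchecked=['c2'] failed=[]
```

Nothing here knows or cares whether a model or a junior analyst wrote the claims. The gate only asks whether the check ran and who ran it, and the list it returns on success is the record.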

There is a real discipline called AI governance, and it has a real subject: whether an AI system or an AI-assisted operation stayed inside the authority it was given, whether the controls on it actually ran, and whether anyone can prove how it was governed. The public record here answers none of those. It does not name the model, the control, the owner, or whether any gate was ever wired in. What it establishes is plainer: a firm published work it had not checked. A failure that would read the same with no AI in the room is not an AI failure. Calling it one is how the phrase comes to mean nothing, and how the genuinely hard problems get buried under stories about robots that never rebooked anyone’s flight.

The Big Four trade on verification. Their names tell a client that somebody checked. The lesson is about that promise: it only holds if the checking actually runs, and a guideline everyone agrees with and no one performs is just a sentence in a handbook. A stated expectation is not an executed control. The difference is the record.

Blaming the AI is the easy exit. It asks nothing of you. The harder sentence, and the true one, is “we stopped checking our own work.” That one stings, and it is also the only version you can fix.
