When Detection Fires but Nothing Stops
An empirical companion to Partnership on AI's real-time failure detection framework. Three production tests confirm the gap. One novel finding extends the response taxonomy: unbound detection.
Stanford's 2026 AI Index makes the relational harm pattern visible. Companion AI safety cannot stop at the output layer. The harder question: did the chatbot become part of the harm?
One operator, eight production AI systems, three radically different safety failures in one week. The Generation Gap is not one safety problem; it is at least ten. No vendor solved more than four. A plain-English explainer of today's paper.
Phenom acquired Plum on April 28. Hiring is moving toward behavioral truth, but candidate assessments cannot read the layer underneath: how identity holds when pressure stops being theoretical.
U.S. courts sanctioned attorneys for more than $145,000 in AI-generated legal hallucinations in Q1 2026. The Generation Gap, the structural blind spot between model output and institutional verification, has a dollar figure now.
HB 2225 takes effect January 1, 2027. Disclosure is the visible part. Detecting self-harm signals across a conversation and routing users to crisis resources is where infrastructure begins.
Phoenix Ikner asked ChatGPT 200+ tactical questions before killing two people on FSU's campus. Florida's AG opened a criminal probe of OpenAI. Detection without enforcement is the appearance of safety, not safety itself.
CCDH ran 720 tests across 10 chatbots, posing as 13-year-olds asking about school shootings, assassinations, and bombings. Eight in ten chatbots regularly helped them plan. Some did not. The difference is governance, not capability.
Three variables, one lawsuit. The Musk-Altman split was not a betrayal and not a tantrum; it was a structural identity incompatibility, measurable before Musk wrote the first check.
The EU's legal mandate for proactive CSAM scanning expired on April 3, 2026. Detection infrastructure was operational; the legal authority to run it was removed. A 503 at market scale, severed by legislative inaction.
An Aalto University stress test documented Replika encouraging a user who expressed harmful intent toward third parties. The system encountered a signal it was not built to recognize. The lookup returned empty.
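A minimal sketch of the gap that teaser describes, assuming a response playbook keyed only to self-directed risk; the category names and routing labels are hypothetical, not Replika's actual implementation:

```python
# Hypothetical sketch: the harm taxonomy covers self-directed risk, so a
# signal about harm toward a third party has no entry, and the lookup
# comes back empty. All names here are illustrative.

RESPONSE_PLAYBOOK = {
    "self_harm": "route_to_crisis_resources",
    "abuse_disclosure": "route_to_support_resources",
    # no key for intent to harm someone else
}

detected_category = "harm_to_third_party"
action = RESPONSE_PLAYBOOK.get(detected_category)
print(action)  # None: the signal was seen, but nothing is defined to happen
```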
Eight people died because detection fired and escalation didn't exist. The structural failure, a true positive with no downstream escalation handler, is now a named governance failure mode: 501.
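A minimal sketch of that 501 failure mode, assuming a registry that maps detected signals to escalation handlers; every name below is illustrative, not drawn from any vendor's codebase:

```python
# Hypothetical sketch of the "501" governance failure: detection returns a
# true positive, but no downstream escalation handler was ever registered.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Detection:
    signal: str        # e.g. "self_harm_intent"
    confidence: float  # classifier score in [0, 1]

# Registry mapping detected signals to escalation actions (crisis routing,
# human review, account hold). In the 501 scenario the key is missing,
# so a correct detection goes nowhere.
ESCALATION_HANDLERS: dict[str, Callable[[Detection], None]] = {}

def handle(detection: Detection) -> Optional[str]:
    handler = ESCALATION_HANDLERS.get(detection.signal)
    if handler is None:
        # Detection fired; nothing downstream implements a response.
        return "501: detected, not implemented"
    handler(detection)
    return "escalated"

print(handle(Detection(signal="self_harm_intent", confidence=0.97)))
# -> "501: detected, not implemented"
```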
Character.AI didn't add more filters to the existing surface. It retired the surface. Why that distinction is the difference between a content moderation decision and a governance success.
Meta's Llama streamed a response, then retroactively suppressed it. A live instance of a failure mode the taxonomy didn't yet have a name for, discovered during the process of building the taxonomy that named it.
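A minimal sketch of why retroactive suppression of a streamed response fails, assuming moderation runs only after tokens are flushed to the client; the function names and flow are hypothetical, not Meta's pipeline:

```python
# Hypothetical sketch of the streamed-then-suppressed failure mode: the
# moderation check arrives after chunks have already left the server, so
# "suppression" can only hide text the user has already seen.

def stream_response(chunks, client_send, moderate):
    delivered = []
    for chunk in chunks:
        client_send(chunk)          # chunk leaves the server immediately
        delivered.append(chunk)
    verdict = moderate("".join(delivered))  # moderation runs too late
    if verdict == "block":
        client_send("[response removed]")   # retroactive, not preventive
    return delivered

sent = []
stream_response(
    chunks=["step one...", " step two..."],
    client_send=sent.append,
    moderate=lambda text: "block",
)
print(sent)  # the suppressed chunks are still in what the user received
```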
The first state law imposing mandatory incident reporting on chatbot operators. What it actually requires, and why per-turn classifiers don't satisfy it.
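A minimal sketch of the per-turn gap, assuming illustrative risk scores and a made-up threshold; nothing below is taken from the statute or any vendor's classifier:

```python
# Hypothetical sketch of why a per-turn classifier can miss what a
# conversation-level detector is expected to catch: each message scores
# below threshold, while the trajectory across the session does not.

TURNS = [0.21, 0.34, 0.48, 0.55]   # per-turn risk scores across one session
THRESHOLD = 0.60

per_turn_flags = [score >= THRESHOLD for score in TURNS]
print(any(per_turn_flags))          # False: no single turn trips the filter

# A conversation-level view looks at the trend over a sliding window.
window = TURNS[-3:]
escalating = all(b > a for a, b in zip(window, window[1:]))
print(escalating and window[-1] > 0.5)   # True: the trajectory is the signal
```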
Two independent peer-reviewed studies in one week: behavioral addiction in teen users, anxiety and suicidal ideation in long-term users. What the harm record now requires of operators.