In 2025, researchers at the University of Maryland, National University of Singapore, and Ohio State ran a controlled study on what happens when an AI system evaluates candidates for jobs. The finding: every major model they tested preferred candidates generated by the same vendor. GPT-4o picked GPT-4o-written resumes 81.9% of the time. LLaMA-3.3 picked LLaMA-written ones 78.9% of the time. The bias held after controlling for content quality. This is GER-430, Evaluator-Generator Entanglement.
What a 430 Is
GER-430, Evaluator-Generator Entanglement: the same AI vendor's models generate content and evaluate it, producing systematic bias toward their own output with no governance layer to detect or correct the conflict.
HTTP 430 is unassigned in the IANA registry with no defined meaning. SVRNOS assigns it to Evaluator-Generator Entanglement — the first definition this code has ever had. It sits in the 4xx operator/platform error tier because the entanglement is a deployment choice, not a model failure. The full taxonomy is in the SVRNOS Governance Error Register.
The Study
Xu, Li, and Jiang tested four frontier models across 24 occupations and 1,760 evaluation tasks. The setup: present each model with resumes written by different AI systems and ask it to rank them, then control for actual content quality using independent human annotation and conditional logistic regression. The full paper is available on arXiv.
The same-vendor preference didn't disappear when quality was held constant. GPT-4o at 81.9%. LLaMA-3.3-70B at 78.9%. Qwen-2.5-72B at 78.0%. DeepSeek-V3 at 71.6%. In a labor market simulation, same-vendor candidates were 23% to 60% more likely to be shortlisted across the 24 occupations tested.
The preference is stylistic: each model scores resumes that match its own output style higher, and the effect survives the quality control from independent human annotation.
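The shape of that analysis is worth making concrete. Below is a minimal sketch, not the paper's code, assuming one row per candidate resume per evaluation task, a binary same-vendor indicator, and an independent human-annotated quality score; the file name and column names are illustrative.

```python
# Minimal sketch (not the paper's code) of the analysis described above:
# a raw same-vendor pick rate, then a conditional logistic regression
# grouped by evaluation task to hold content quality constant.
# File name and column names are illustrative assumptions.
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

# One row per (evaluation task, candidate resume), with hypothetical columns:
#   task_id      - the evaluation task (the choice set the evaluator saw)
#   selected     - 1 if the evaluator picked this resume, else 0
#   same_vendor  - 1 if the resume came from the evaluator's own vendor
#   quality      - independent human-annotated quality score
df = pd.read_csv("evaluation_rows.csv")

# Raw same-vendor pick rate, before any control for quality.
picked = df[df["selected"] == 1]
print("same-vendor pick rate:", picked["same_vendor"].mean())

# Conditional logit with task fixed effects: if the same_vendor coefficient
# stays large and positive with quality in the model, the preference is not
# explained by content quality.
model = ConditionalLogit(
    df["selected"],
    df[["same_vendor", "quality"]],
    groups=df["task_id"],
)
print(model.fit().summary())
```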
Why This Is a Governance Failure
A model preferring its own output is a predictable property of how language models work: trained to predict and generate text in their own style, they recognize that style when evaluating. Deploying that model as an evaluator in a context where it's also the generator, without any structure to detect or correct the conflict, is an architectural choice made at the deployment layer.
The entanglement applies anywhere an AI system both generates content and evaluates it: grant applications reviewed by AI, student work assessed by AI. Content from the vendor whose model does the evaluating carries a structural advantage over content from every other vendor, and over human-written work, in that evaluation.
This is a different failure shape from GER-404 — Replika Had No Rule for This, where no governance rule existed at all. In a 430, a governance layer may exist — the failure is that it wasn't designed to detect same-vendor bias as a conflict worth flagging.
Sango Guard is governance middleware that sits between your model and your users. It is the architectural layer where conflict detection of this kind would be implemented — runtime classification of what the system is doing against what it should be doing.
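A minimal sketch of where such a check could live, assuming nothing about Sango Guard's actual API: before an evaluation request is forwarded, the middleware resolves the vendor behind the evaluating model and the vendors behind the content being scored, then flags the request when they match. The vendor map, class, and function names are illustrative.

```python
# Minimal sketch of a GER-430 conflict check in governance middleware.
# Not Sango Guard's actual API; names and the vendor map are assumptions.
from dataclasses import dataclass

# Hypothetical model -> vendor mapping. Real middleware would resolve this
# from deployment metadata rather than a hard-coded table.
MODEL_VENDOR = {
    "gpt-4o": "openai",
    "llama-3.3-70b": "meta",
    "qwen-2.5-72b": "alibaba",
    "deepseek-v3": "deepseek",
}

@dataclass
class EvaluationRequest:
    evaluator_model: str               # model asked to score or rank content
    content_source_models: list[str]   # models that generated that content

def detect_ger_430(req: EvaluationRequest) -> list[str]:
    """Return the content sources that share a vendor with the evaluator."""
    evaluator_vendor = MODEL_VENDOR.get(req.evaluator_model)
    return [
        m for m in req.content_source_models
        if evaluator_vendor is not None
        and MODEL_VENDOR.get(m) == evaluator_vendor
    ]

# Example: a GPT-4o evaluator asked to rank resumes written by GPT-4o and LLaMA.
req = EvaluationRequest(
    evaluator_model="gpt-4o",
    content_source_models=["gpt-4o", "llama-3.3-70b"],
)
conflicts = detect_ger_430(req)
if conflicts:
    # What happens next is a deployment policy: log the entanglement, route
    # to a cross-vendor evaluator, or require human review.
    print("GER-430: evaluator shares a vendor with generated content:", conflicts)
```

What the middleware does with a flagged request is itself a deployment choice, which is exactly why this code sits in the 4xx operator/platform tier.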
No Named Incident Yet
This code has empirical evidence but no documented operator incident. The research establishes the failure mode at scale. No specific company whose AI hiring system demonstrably advantaged same-vendor candidates has been publicly confirmed yet.
That gap won't last. The research was published in 2025, and EU AI Act enforcement starts August 2, 2026 with hiring AI explicitly in scope. The first enforcement action that pulls logs on a same-vendor generate-and-evaluate pipeline will produce the incident this code is waiting for.
Submit a real-world instance. If you have witnessed or documented a real-world instance of a 430 (Evaluator-Generator Entanglement) or any other code in the register, email contact@svrnos.com with the subject line: Taxonomy Contribution — 430. See the full register for all codes.