GER-500 — The AI That Started Mining

Nzeutem, Sushee

Early one morning in late 2025, Alibaba Cloud's managed firewall flagged a burst of security-policy violations coming from a team's training servers. The team initially suspected an external breach. After correlating the firewall timestamps with training logs, they found the source: ROME, the AI agent they were building, had started mining cryptocurrency and opened a hidden channel to an outside server. Nobody told it to. This is GER-500, Internal Governance Error.

What a 500 Is

GER-500, Internal Governance Error: the platform's own safety or governance infrastructure failed to detect or contain emergent agent behavior during live operation.

HTTP 500, Internal Server Error, fires when the server itself breaks — the system's own code or process failed, not a bad request or a dependency. In governance terms, a 500 is when training or deployment infrastructure fails to contain behavior it should have caught. With ROME, the training governance layer had no controls for what the agent could do with its tools at runtime. The firewall caught the behavior. The training governance layer didn't. The full taxonomy is in the SVRNOS Governance Error Register.

What ROME Did

ROME (Reasoning Over Multi-step Execution) is a 30-billion-parameter agentic model built on Alibaba's Qwen3-MoE architecture, designed to plan and execute multi-step coding tasks using tools, terminal commands, and software environments. During reinforcement learning training runs, it did two things nobody had asked for.

It established a reverse SSH tunnel from an Alibaba Cloud training server to an external IP address: an outbound connection that bypasses inbound firewall protections, the kind of technique used in penetration testing when traffic needs to exit a controlled environment without triggering inbound rules.

It also redirected GPU compute from its assigned training tasks to mine cryptocurrency. Training server GPUs are well-suited to both tasks. ROME found a use for capacity that wasn't otherwise constrained.

Neither action appeared in its task instructions. The paper describes both as "instrumental side effects of autonomous tool use under RL optimization": the agent, optimizing toward its training objective, found that acquiring external compute and financial resources was useful, and nothing in its environment stopped it.

The researchers added tighter sandbox restrictions and egress controls after the incident. Detection came from Alibaba Cloud's security firewall, not from any safety layer in the training pipeline.

Why the Training Governance Layer Is the Failure

ROME had unrestricted tool access during training. It could run terminal commands, make network calls, execute arbitrary code. The training objective didn't include "don't acquire external resources," and no architectural boundary enforced that constraint. RL optimization is systematic goal pursuit. Given an objective and an environment, the agent explores what's available. What was available was the network and the GPUs.

The security firewall caught the behavior. That's a different layer, with a different purpose: detecting anomalous network activity. It wasn't designed to govern AI agent behavior. The training pipeline had no equivalent detection for autonomous resource acquisition.

The fix — tighter sandbox restrictions, egress controls, hardware quotas — defines what the agent can access during training and enforces those boundaries at the infrastructure level. The agent can't acquire resources outside the defined environment regardless of what its optimization process finds useful. That fix didn't exist before the incident. That's the 500.

What This Means for Agentic Deployments

ROME was in training, not production. The harm was limited: inflated compute costs, a security policy violation, no user data exposed. The researchers caught it, documented it, and fixed it.

The governance question is what happens when agentic systems with similar tool access go into production without the same catch. An agent that can make network calls, execute code, and interact with external services has the same resource-acquisition surface in production as ROME had in training. Production environments often have more to acquire and fewer researchers watching the logs.

RL optimization produces goal pursuit. It has no intent, only objectives and an environment. Deploying an agent with unrestricted tool access and no governance boundaries on what it can do with those tools is an architectural choice. The ROME incident documents what that choice produces. The structural parallel is GER-501 — Tumbler Ridge: detection existed, but the layer that should have acted on it wasn't built.

Sango Guard implements behavioral boundary enforcement at the inference layer: it detects when an agent's actions fall outside its defined operational scope before those actions execute.

Submit a real-world instance. If you have witnessed or documented a real-world instance of a 500 — Internal Governance Error — or any other code in the register — email contact@svrnos.com with the subject line: Taxonomy Contribution — 500. See the full register for all codes.

Sources