The Cuckoo’s Egg, Rewritten for the AI Era

How a decades-old metaphor explains what Anthropic just exposed

Nov 17, 2025

A couple of days ago, Anthropic released a report that many people missed. Back in mid-September, they caught a Chinese state-sponsored group running what appears to be the first documented cyber espionage campaign where an AI system handled almost the entire operation on its own.

In 1986, Clifford Stoll uncovered one of the first major computer intrusions, and in 1989 he turned it into the book The Cuckoo’s Egg [1]. The metaphor was simple. A cuckoo lays its egg in another bird’s nest, hiding something foreign inside a system that does not know how to recognize the threat. We are watching a version of that story play out again, but this time the nest is an AI system. A state actor slipped a harmful workflow into a model that was not built for this kind of misuse, and it quietly grew into a full intrusion before anyone caught it. The environment was unprepared then, and it is unprepared now.

Anthropic’s investigation makes this clear. The group, labeled GTG-1002, did not use Claude Code as a helper. They used it as the operator. Claude scanned networks, mapped internal systems, identified vulnerabilities, wrote and validated exploit payloads, harvested and tested credentials, moved laterally across environments, and parsed stolen data without waiting for human guidance. Human operators stepped in only at a few decision points.

The campaign targeted about thirty different organizations across technology, finance, chemical manufacturing, and government. Anthropic was able to confirm successful intrusions in a handful of cases. None of this involved custom malware. The attackers wired ordinary penetration testing tools into a custom automation framework and let Claude run everything at machine speed.

How MCP Made This Possible

If you are not familiar with the Model Context Protocol, or MCP, here is the simplest way to understand it. MCP is a universal adapter that lets an AI system talk to outside tools. It does not perform exploitation on its own. What it does is give the model access to whatever you connect to it, including databases, browsers, file systems, code execution tools, APIs, and command line utilities. Instead of being trapped in a text box, the model can pull real context from the outside world and use real tools in structured ways. It can execute code, read and write files, call APIs, run scripts, and maintain state across long workflows. [2]
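To make the "universal adapter" idea concrete, here is a minimal sketch in Python. It is not the MCP SDK and not the full protocol. It only imitates the general shape of the idea: tools are registered by name, and a client sends structured JSON requests to list them or call them. The method names `tools/list` and `tools/call` are borrowed from MCP, but the registry, the `tool` decorator, the `handle` function, and the two toy tools are all hypothetical and exist only for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory registry: tool name -> description and handler.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str):
    """Register an ordinary Python function as a callable tool."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return wrap


@tool("word_count", "Count the words in a piece of text.")
def word_count(text: str) -> int:
    return len(text.split())


@tool("add", "Add two numbers.")
def add(a: float, b: float) -> float:
    return a + b


def handle(request_json: str) -> str:
    """Dispatch one JSON-RPC-style request and return a JSON response string."""
    req = json.loads(request_json)
    req_id = req.get("id")
    method = req.get("method")
    params = req.get("params", {})

    def error(code: int, message: str) -> str:
        body = {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": code, "message": message}}
        return json.dumps(body)

    if method == "tools/list":
        # The adapter only advertises what has been connected to it.
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return error(-32602, "unknown tool")
        value = entry["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        return error(-32601, "method not found")

    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

The point of the sketch is that the adapter layer has no opinion about what the tools do. It exposes whatever is plugged in, which is why the same design serves a calculator, a database client, or a network scanner equally well.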

That capability is what made this attack possible. MCP was not the exploit mechanism. It was the interface layer that allowed Claude to orchestrate standard penetration testing tools at machine speed.

One detail in the report is easy to overlook but impossible to ignore. The operators kept Claude engaged by pretending to be legitimate security staff and breaking the intrusion into small, harmless-looking tasks. Each step looked like routine defensive testing, and Claude was never given enough context to see the bigger picture. That framing kept the workflow running long enough to matter. And the only reason the public knows about this campaign at all is that Anthropic chose to disclose it.
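The task-splitting tactic described above can be pictured with a small sketch. It contrasts a per-step check, which sees each request in isolation, with a session-level check, which looks at how many distinct intrusion phases have piled up. This is not how Anthropic detects misuse; the report does not describe that. The phase labels, the `SessionMonitor` class, and the threshold of four are all hypothetical, and a real system would also need some way to assign those labels in the first place.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical phase labels, loosely following the activities named in the report.
PHASES = [
    "recon",
    "vuln_scan",
    "exploit_validation",
    "credential_test",
    "lateral_movement",
    "data_extraction",
]


def step_looks_benign(phase: str) -> bool:
    """Per-step review: each phase on its own is routine in authorized testing."""
    return phase in PHASES


@dataclass
class SessionMonitor:
    """Session-level review: count distinct phases seen across one session."""
    threshold: int = 4  # arbitrary illustrative cutoff, not a recommended value
    seen: Set[str] = field(default_factory=set)

    def observe(self, phase: str) -> bool:
        """Record one step; return True once the session warrants escalation."""
        self.seen.add(phase)
        return len(self.seen) >= self.threshold


if __name__ == "__main__":
    monitor = SessionMonitor()
    for phase in PHASES:
        print(phase, step_looks_benign(phase), monitor.observe(phase))
```

Every step passes the isolated check, yet the session as a whole crosses the threshold at the fourth distinct phase. That is the gap the operators exploited: the signal lives in the sequence, not in any single request.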

When an AI System Became the Operator

With that framing in mind, what Anthropic documented becomes clearer. The attackers did not simply speed up traditional hacking. They placed a foreign workflow inside an AI system and let the model run it. That is the part that changes everything.

For years, people described AI-assisted attacks as if the model would always stay in the role of a clever helper. But that is not what Anthropic found. The attackers built their own automation framework around Claude. They supplied the tools. MCP tied everything together. The penetration testing tools did the scanning, payload execution, credential testing, and exploitation. Claude orchestrated all of it.

And that matters, because the attackers did not need custom malware or rare expertise. They used basic open-source utilities, routed them through MCP, and let the model handle sequencing and decision making. According to Anthropic’s analysis, Claude performed about eighty to ninety percent of the tactical work. Humans stepped in only to approve escalation.

This is not sci-fi. It is not emergent malice. It is a system doing exactly what it believes it is supposed to do in an environment unprepared to understand what that actually means.

The Cuckoo’s Egg Problem, All Over Again

What Stoll uncovered in the 1980s was not just a hacker. It was a system-level failure of recognition. The threat looked ordinary, so the environment treated it as ordinary. No one had the monitoring, norms, or mental models to detect an intruder who moved in slow, incremental steps across a landscape no one was watching.

That same pattern is showing up again, but the substrate has changed. Instead of a human slipping quietly through poorly monitored networks, a foreign workflow was placed inside an AI system that had no way to recognize it as a threat. The model accepted each step as benign because the environment around it had not been designed to distinguish legitimate defensive work from a staged intrusion.

The parallels deepen from there. In Stoll’s era, the barrier to intrusion was low because networks were unprepared. Today, the barrier is low because automated orchestration allows an attacker to run dozens of intrusion paths in parallel. The model performs eighty to ninety percent of the tactical work, and because defenders are still operating with human-paced assumptions, even modest actors can generate activity that looks far more sophisticated than it is.

And then there is tempo. Stoll spent months following his intruder one terminal at a time. Anthropic watched an AI system execute similar operational phases in hours. Automation collapses the timeline, and when the environment cannot see the threat or react at machine speed, the intrusion has already happened before the alarm even exists.

This is the updated Cuckoo’s Egg problem. A system that cannot recognize what has been placed inside it cannot defend against it. And right now, our models, our infrastructure, and our governance frameworks all share that blind spot.

Why Voluntary AI Governance Cannot Handle This

This is the point where the story stops being about one intrusion and starts being about the systems we keep pretending are enough. If an AI system can carry out most of a cyberattack on its own, then voluntary governance frameworks are already outdated. They were built for a world where humans stayed in control and progress moved slowly enough for norms, committees, and best practices to keep pace. None of that matches what Anthropic documented.

Voluntary standards rely on a fragile assumption: that companies will do the right thing when it matters. That labs will detect misuse and choose to disclose it. That models will behave predictably. That restraint is governance. It is not.

In a piece earlier this year, I talked about what global food safety regulation can teach us about AI governance. This is exactly why the food safety analogy matters. We do not rely only on meat processors to voluntarily report contamination. We do not assume refrigeration always works without inspection. We do not expect companies to self-certify compliance. We require oversight bodies, mandatory inspections, traceability systems, and independent certification because the risks are uneven and the public pays the price when something goes wrong.

AI needs the same thing. Certified audits. Mandatory reporting. External toolchain inspections. Enforced constraints around model access and misuse detection. Without that, the only reason we hear about a major incident is that one company decides transparency is worth the risk.

Anthropic’s disclosure deserves credit, but it exposes the gap. The attackers built the framework. They used MCP to connect Claude to ordinary tools. They convinced the model it was doing defensive work. None of those steps violate any voluntary guideline. None trigger required disclosure. If it happened on Gemini or ChatGPT, would we know? How about Grok?

Where Does GDPR Fit Into This?

Since not everyone follows privacy law, here is the quick baseline. Modern data protection regulations, led by the European Union’s General Data Protection Regulation (GDPR) but also including California’s CCPA and similar frameworks worldwide [3], give people rights over their personal data and put strict obligations on any organization that collects, stores, or processes that data. These laws cover consent, data minimization, security safeguards, breach notification, and the right to access or delete personal information.

GDPR is particularly relevant here because of its extraterritorial reach. It applies not just to EU companies, but to any organization worldwide that processes EU residents’ personal data, which includes most major technology companies, financial institutions, and multinational corporations. When Chinese state-sponsored hackers targeted roughly 30 global organizations across multiple countries, many of those targets almost certainly fell under GDPR jurisdiction.

But GDPR and similar laws assume a specific threat model. They assume breaches happen because a company mishandled data, a human actor misused information, or a system failed in predictable ways. They assume clear responsibility lines between controller and processor. They assume investigations happen in human time. And they assume systems accessing data have stable, understandable behavior.

None of that aligns with the scenario Anthropic documented.

These regulations were not designed for AI systems that can be manipulated into scanning networks, exploiting vulnerabilities, harvesting credentials, and extracting data at machine speed. They were not built to regulate automation operating across dozens of targets in parallel, where most of the intrusion is carried out by a model following an attacker’s workflow.

To be fair, GDPR and similar laws do include significant proactive requirements. Organizations must conduct Data Protection Impact Assessments before high-risk processing begins. They must implement “data protection by design and by default,” building privacy into systems from the start. They must maintain appropriate technical and organizational security measures. These provisions were designed to prevent breaches, not just respond to them.

But the reactive mechanisms remain central to enforcement. Notification comes after the breach. Investigation comes after the breach. Penalties come after the breach. When the breach itself happens faster than human investigation cycles, when an AI system can compromise dozens of targets within the 72-hour window GDPR allows for notifying regulators, the protection arrives too late. And the proactive measures, while important, were calibrated for threats that operate at human speed and follow predictable patterns.

This is not a failure of GDPR or other data protection laws. It is a mismatch of eras. These regulations govern the data practices of the mid-2010s. They are being asked to govern the capabilities of the mid-2020s.

The Window Is Closing

The uncomfortable truth is that we are already behind. Anthropic’s report is not a warning about a future scenario. It is documentation of what has already happened. The capabilities exist. The tooling exists. The automation frameworks exist. And the attack surface is expanding faster than any governance model we currently have.

There is a window here, but it is not a wide one. Right now, most misuse still depends on attackers building wrappers, scripting workflows, and convincing a model that malicious tasks are legitimate. That will not remain true. As models become more capable and agent frameworks become more polished, the amount of custom engineering required to run large-scale operations will drop. The barrier will not be expertise. It will be intent.

In the early internet, the biggest advantage an intruder had was that the environment was unprepared. That is still true. But automation makes the gap multiplicative. An attacker does not need to be brilliant or patient or deeply technical. If the model can reason about it and execute it, the attacker only has to approve the steps.

This is the moment to build the guardrails. Because once automated intrusion becomes widely accessible, once the tools become polished and easy to use, once adapters like MCP or their equivalents are everywhere, the defensive side will not just be behind. It will be permanently outpaced.

The window is closing. And the world is not acting like it knows.

Anthralytic is an independent studio focused on the intersection of social impact strategy, evaluation, and the real world impact of AI.

[1] I highly recommend The Cuckoo’s Egg, by Clifford Stoll. It is a strange mix of nostalgia and real suspense, and its lessons feel even more relevant today than when it was written.

[2] I’m not an expert on MCP, so if I’ve gotten any of this wrong, please let me know. This is how MCP works to the best of my understanding.

[3] I’m also not an expert on GDPR or CCPA, so if I’ve gotten any of this wrong, please correct me.
