Over the past weekend, the US AI lab Anthropic published a report about its discovery of the “first reported AI-orchestrated cyber espionage campaign”.
The company says a Chinese government–sponsored hacking group used Anthropic’s own Claude AI tool to automate a significant part of an effort to steal sensitive information from around 30 organisations.
The report has drawn a lot of attention. Some, including respected experts, have warned that AI-automated cyber attacks are the future, urging cyber defenders to invest now before the coming onslaught.
At the same time, many in the cyber security industry have been underwhelmed by Anthropic’s claims, saying the actual role AI played in the attacks is unclear.
What Anthropic says happened
Critics have pointed out what they say is a lack of detail in the report, which means we have to do a certain amount of guesswork to try to piece together what might have happened. With that in mind, it appears the hackers built a framework for carrying out cyber intrusion campaigns mostly automatically.
The grunt work was carried out by Anthropic’s Claude Code AI coding agent. Claude Code is designed to automate computer programming tasks, but it can also be used to automate other computer activities.
Claude Code has built-in safety guardrails to prevent it from causing harm. For example, I asked it just now to write me a program that I could use to carry out hacking activities. It bluntly refused.
However, as we have known from the very first days of ChatGPT, one way to bypass guardrails in AI systems is to trick them into engaging in role-play.
Anthropic reports that this is what these hackers did. They tricked Claude Code into believing it was assisting authorised hackers to test the quality of a system’s defences.
Missing details
The information Anthropic has published lacks the fine details that the best cyber incident investigation reports tend to include.
Chief among these are so-called indicators of compromise (or IoCs). When investigators publish a report into a cyber intrusion, they usually include hard evidence that other cyber defenders can use to look for signs of the same attack.
Each attack campaign might use specific attack tools, or might be carried out from specific computers under the attacker’s control. Each of these indicators would form part of the cyber intrusion’s signature.
Somebody else who gets attacked using the same tools, coming from the same attacking computers, can infer that they have also been a victim of this same campaign.
For example, the US government Cybersecurity and Infrastructure Security Agency recently partnered with government cyber agencies worldwide to publish information about ongoing Chinese state-sponsored cyber espionage, including detailed indicators of compromise.
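To see why such indicators matter in practice, here is a minimal sketch (in Python, with made-up values) of what a defender might do with them: take the attacker IP addresses listed in a published advisory and search a local web server log for connections from those addresses. The IP addresses, log path and log format below are hypothetical and purely illustrative, not taken from any real advisory.

```python
# Minimal sketch: checking a web server access log against published
# indicators of compromise (IoCs). The IoC values and log path here are
# made up for illustration; a real advisory would list the actual values.

# Hypothetical attacker IP addresses taken from a published advisory.
KNOWN_BAD_IPS = {"203.0.113.17", "198.51.100.42"}  # documentation-range IPs

def find_matches(log_path: str) -> list[str]:
    """Return log lines whose source IP matches a known indicator."""
    matches = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # Assume a common log format where the source IP is the
            # first whitespace-separated field on each line.
            source_ip = line.split(" ", 1)[0]
            if source_ip in KNOWN_BAD_IPS:
                matches.append(line.rstrip())
    return matches

if __name__ == "__main__":
    for hit in find_matches("/var/log/nginx/access.log"):
        print(hit)
```

Without published indicators, there is nothing concrete to feed into a check like this, which is precisely the complaint critics have raised.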
Unfortunately, Anthropic’s report includes no such indicators. As a result, defenders are unable to determine whether they might also have been victims of this AI-powered hacking campaign.
Unsurprising – and with limited success
Another reason many have been underwhelmed by Anthropic’s claims is that, on their face and absent hard details, they are not especially surprising.
Claude Code is widely used because it helps programmers be more productive.
Many common tasks performed during a cyber intrusion are not programming tasks exactly, but they are similar enough that Claude Code should be able to carry them out, too.
A final reason to be wary of Anthropic’s claims is that they suggest the attackers were able to get Claude Code to perform these tasks more reliably than it typically does.
Generative AI can perform marvellous feats. But getting systems such as ChatGPT or Claude Code to do so reliably remains a major challenge.
In the memorable words of one commentator, too often these tools respond to difficult requests with “ass-kissing, stonewalling, and acid trips”. In plainer language, AI tools are prone to sycophancy, repeated refusal to carry out difficult tasks, and hallucinations.
Indeed, Anthropic’s report notes that Claude Code frequently lied to the attackers, pretending it had carried out a task successfully even when it hadn’t. This is a classic case of AI hallucination.
Perhaps this explains the attack’s low success rate: Anthropic’s own reporting says that while about 30 organisations were targeted, the hackers succeeded against only a few.
What does this mean for the future of cyber security and AI?
Whatever the details of this particular campaign, AI-enabled cyber attacks are here to stay.
Even if one contends that current AI-enabled hacking is lame, it would be foolish for cyber defenders to assume it will stay that way.
If nothing else, Anthropic’s report is a timely reminder for organisations to invest in cyber security. Those who do not may face a future in which their secrets are stolen or operations disrupted by autonomous AI agents.
The article “An AI lab says Chinese-backed bots are running cyber espionage attacks. Experts have questions” by Toby Murray, Professor of Cybersecurity, School of Computing and Information Systems, The University of Melbourne, was published on 17 November 2025 by The Conversation (theconversation.com).




















