
A hacker group, believed to be backed by the Chinese government, has launched a large-scale cyberattack using Claude, an AI model developed by the U.S. startup Anthropic.
Anthropic revealed on Thursday that it had detected attacks targeting 30 entities, including government agencies, major tech firms, financial institutions, and chemical manufacturers. It stated that some of these infiltration attempts were successful.
The attack was notable for its degree of automation: the hackers carried out most of the work using Claude Code, Anthropic’s coding model.
Jacob Klein, Anthropic’s Head of Threat Intelligence, told the Wall Street Journal that the attack was executed “with a single click and minimal human intervention.” Human involvement was reportedly limited to fact-checking and providing specific instructions at certain stages.
To bypass the AI model’s safety features, the attackers employed a jailbreaking technique: they deceived the system by posing as legitimate security professionals conducting penetration testing.
However, Claude exhibited errors presumed to be hallucinations, such as generating credentials that did not exist or claiming to have extracted confidential information that was in fact publicly available.
Anthropic immediately blocked the suspicious accounts upon detection and notified relevant authorities after a 10-day investigation.
Addressing concerns about potential AI misuse, the company argued that the very capabilities attackers can exploit are also crucial for detecting and defending against such attacks. It pledged to continue developing Claude with robust safety features so that security professionals can better leverage it for threat detection and defense.