Recent high-profile hacks, including the Gemini Calendar prompt-injection attack of 2026 and the September 2025 state-sponsored hack that abused Anthropic's Claude Code, have highlighted the coercion of human-in-the-loop agentic actions and the hijacking of fully autonomous agentic workflows as a new attack vector.
In the Anthropic case, approximately 30 organizations across tech, finance, manufacturing, and government were targeted, with Anthropic's threat intelligence team assessing that AI carried out 80% to 90% of the operation, including reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration.
The attackers hijacked an agentic setup and jailbroke it by decomposing the attack into small, seemingly benign tasks, persuading the model that it was conducting legitimate penetration testing. Because no single task looked malicious in isolation, per-request guardrails were not enough; defenses against prompt injection and agent goal hijacking need to reason about the session as a whole.
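To make that point concrete, here is a minimal sketch of a session-level guardrail that accumulates risk across an agent's task history, so that many individually innocuous-looking steps still trip a threshold when they add up to a kill chain. This is an illustration of the principle, not Anthropic's actual mitigation; the names `SessionRiskGate`, `RISK_WEIGHTS`, and the category strings are hypothetical, and a real deployment would tune the weights against observed attack traces.

```python
from dataclasses import dataclass, field

# Hypothetical per-category risk weights (illustrative values only).
RISK_WEIGHTS = {
    "network_scan": 2.0,
    "exploit_build": 3.0,
    "credential_access": 3.0,
    "lateral_movement": 2.5,
    "data_export": 3.5,
    "benign": 0.0,
}

@dataclass
class SessionRiskGate:
    """Accumulates risk across an agent session so that many individually
    'benign-looking' tasks still trip a threshold when combined."""
    threshold: float = 6.0
    score: float = 0.0
    history: list = field(default_factory=list)

    def check(self, task_category: str) -> bool:
        """Return True if the task may proceed; False to block and escalate."""
        self.score += RISK_WEIGHTS.get(task_category, 1.0)
        self.history.append(task_category)
        if self.score >= self.threshold:
            # The cumulative pattern resembles a kill chain: stop issuing
            # tool calls and hand the full history to a human reviewer.
            return False
        return True

gate = SessionRiskGate()
for task in ["benign", "network_scan", "credential_access", "lateral_movement"]:
    if not gate.check(task):
        print(f"Blocked at {task!r}; session history: {gate.history}")
        break
```

The design choice worth noting is that the gate keys on the whole session, which is exactly what a decomposition attack is built to evade in per-request filters.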
Security communities have been warning about prompt injection and agent goal hijacking for several years, with successive OWASP Top 10 reports flagging identity and privilege abuse and the exploitation of human-agent trust.
Guidance from the NCSC and CISA emphasizes a lifecycle approach to managing generative-AI risk, spanning secure design, development, deployment, and operation. The EU AI Act turns this into law for high-risk AI systems, mandating continuous risk management, robust data governance, logging, and cybersecurity controls.
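As one concrete way to meet the logging requirement (a minimal sketch under my own assumptions, not a prescribed implementation), agent tool calls can be written to a tamper-evident, hash-chained audit log, where each record commits to the digest of the previous one so that post-hoc alteration or deletion is detectable on verification. The `AuditLog` class and its methods below are hypothetical names for illustration.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only, hash-chained log of agent actions. Each entry commits
    to the previous entry's digest, so editing or deleting a record breaks
    the chain when the log is verified."""

    def __init__(self):
        self.entries = []
        self._prev_digest = "0" * 64  # genesis value

    def record(self, actor: str, action: str, detail: dict) -> None:
        entry = {
            "ts": time.time(),
            "actor": actor,
            "action": action,
            "detail": detail,
            "prev": self._prev_digest,
        }
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["digest"] = digest
        self.entries.append(entry)
        self._prev_digest = digest

    def verify(self) -> bool:
        """Recompute the chain; False means some entry was altered."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "digest"}
            if body["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["digest"]:
                return False
            prev = e["digest"]
        return True

log = AuditLog()
log.record("agent-7", "tool_call", {"tool": "http_get", "url": "https://example.com"})
log.record("agent-7", "tool_call", {"tool": "file_read", "path": "/etc/hosts"})
assert log.verify()  # holds until any entry is altered
```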