Anthropic Shares Key Lessons on Securing Claude Agents After Security Incidents

Anthropic has released a detailed report on its strategies for containing Claude agents, highlighting the importance of robust environmental containment after experiencing two significant security incidents.

A key finding from these incidents is that model-layer defenses, while crucial, are not foolproof and will always carry a small chance of failure. Anthropic therefore pairs safer models with hard environmental containment to secure its systems.
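To see why even a small per-attempt failure rate matters, consider a rough calculation. The sketch below assumes each attack attempt is independent and has the same chance of getting past the model; both the function name and the numbers are hypothetical illustrations, not figures from the report.

```python
def prob_at_least_one_bypass(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds.

    p_single is an assumed per-attempt bypass probability (hypothetical).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical numbers: a 1% per-attempt bypass rate over 100 attempts.
print(round(prob_at_least_one_bypass(0.01, 100), 3))
```

Under those assumptions, an attacker who can retry cheaply ends up with better-than-even odds, which is the intuition behind adding containment that does not depend on the model behaving correctly.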

The company uses three main containment patterns across its platforms: ephemeral gVisor containers for Claude.ai, OS-level sandboxing with human oversight for Claude Code, and full local virtual machines for Cowork.

The report describes two notable security incidents. In the first, a red-team phishing attack succeeded in exfiltrating AWS credentials in 24 of 25 attempts. In the second, a third party discovered a vulnerability in Cowork’s egress allowlist that allowed an attacker to upload files to the attacker's own Anthropic account.

These incidents underscore the critical role of egress controls. Allowing a domain is not enough: every function reachable through an allowed domain has to be treated as a potential exfiltration path.
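One way to picture the lesson about allowed domains is an egress check that matches on host, HTTP method, and path together instead of on host alone. The following is a minimal sketch under stated assumptions: the `EgressRule` type, the `is_allowed` function, and the example hosts and paths are all hypothetical and do not describe Anthropic's actual implementation.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressRule:
    """One narrowly scoped permission (hypothetical structure)."""
    host: str
    methods: frozenset
    path_prefix: str


# Hypothetical rules: read-only access to two specific endpoints.
RULES = (
    EgressRule("api.example.com", frozenset({"GET"}), "/v1/models"),
    EgressRule("pkg.example.org", frozenset({"GET"}), "/simple/"),
)


def is_allowed(method: str, url: str, rules=RULES) -> bool:
    """Return True only if host, method, and path all match one rule."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    # Reject literal parent-directory segments; encoded variants
    # (such as %2e%2e) are not handled in this sketch.
    if ".." in path.split("/"):
        return False
    return any(
        host == rule.host
        and method.upper() in rule.methods
        and path.startswith(rule.path_prefix)
        for rule in rules
    )


# An upload endpoint on an otherwise allowed host is still denied.
print(is_allowed("GET", "https://api.example.com/v1/models"))
print(is_allowed("POST", "https://api.example.com/v1/files"))
```

With a host-only allowlist, the second request would pass because the domain is trusted. Scoping each rule to a method and path keeps a file-upload function on a trusted domain from becoming an outbound channel.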

The report also explores issues like persistent memory poisoning and multi-agent trust escalation, offering valuable insights for developers working on agentic systems.
