    Claude containment shows why agent security starts outside the model

    Claude containment is Anthropic’s answer to a problem every serious AI agent product now faces: a safer model still needs a smaller place to fail. In a May 25, 2026 engineering post, Anthropic described how it limits Claude across claude.ai, Claude Code, and Claude Cowork with containers, OS sandboxes, virtual machines, and egress controls.

    The short version

    • Anthropic says Claude Code users approved roughly 93% of permission prompts, which made human approval a weak default for high-frequency agent work.
    • Claude Code’s OS-level sandbox uses Seatbelt on macOS and bubblewrap on Linux, with network access denied by default and writes limited to the workspace.
    • Claude Cowork uses a local VM and mounted workspaces because its target users are less likely to evaluate shell commands or notice when an agent drifts off task.
    • The most useful security lesson is about egress: an allowlisted domain can still become a data exfiltration path if it exposes the wrong capability.

    What happened

    Anthropic published a detailed engineering write-up on how it contains Claude across three product surfaces: claude.ai, Claude Code, and Claude Cowork. The post frames agent risk in terms of two variables: the chance of failure and the damage a failure can cause. Model training and classifiers can reduce the first. Runtime isolation, file-system boundaries, scoped credentials, and network controls reduce the second.
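    The post describes two variables rather than a formula. One conventional way to combine them, offered here only as an illustrative assumption, is expected damage summed over the resources R an agent can touch, where p_r is the chance a failure reaches resource r and D_r is the harm if it does:

```latex
\mathbb{E}[\text{damage}] \;=\; \sum_{r \in R} p_r \, D_r
```

    Training and classifiers push each p_r down but never to zero. Containment shrinks the set R itself: a resource the agent cannot reach, such as a production token it never receives, drops out of the sum whatever its p_r would have been.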

    The company gives concrete numbers. Claude Code previously asked users to approve write, shell, and network actions, but Anthropic’s telemetry showed users approved about 93% of prompts. After adding an OS sandbox, the company says permission prompts fell by 84%. Anthropic also says Claude Opus 4.7 held prompt-injection attack success to about 0.1% on single attempts in Gray Swan’s Agent Red Teaming benchmark, rising to roughly 5-6% after 100 adaptive attempts. Those numbers are useful, but the post’s strongest point is simpler: probabilistic model defenses do not replace deterministic boundaries.
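    Why a 0.1% single-attempt rate is not the end of the story can be sketched with the textbook "at least one success in n tries" formula. This assumes independent attempts with a fixed per-try rate, which is an idealization and not how Gray Swan's benchmark computes its figures.

```python
def success_within(p_single: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` tries, assuming each
    try succeeds independently with probability `p_single` (an idealization;
    real adaptive attackers are not independent draws)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} attempts: {success_within(0.001, n):.2%}")
```

    Under the independence assumption, 0.1% per try grows to about 9.5% over 100 tries. The reported 5-6% comes from the benchmark's own methodology, so the two figures are not directly comparable; the point is only that a small per-attempt rate stops looking small when an attacker can retry, which is the case for a deterministic boundary.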


    Why Claude containment is worth watching

    Claude containment is worth watching because Anthropic is treating agent security as product architecture, not a final policy layer. claude.ai runs code in gVisor containers on Anthropic-managed infrastructure, with ephemeral per-session file systems. Claude Code runs on a developer’s machine, so it uses a lower-friction OS sandbox. Claude Cowork, aimed at general knowledge work, uses a local VM with selected folders mounted into it.
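    The post says Claude Code uses bubblewrap on Linux, with network access denied by default and writes limited to the workspace. A minimal sketch of that shape follows; it is a generic bwrap invocation, not Anthropic's actual profile, and `bwrap_argv` is a hypothetical helper. The host is mounted read-only, only the workspace is bound writable, and a fresh network namespace leaves the process with no outbound route.

```python
import shlex
from pathlib import Path


def bwrap_argv(workspace: str, command: list[str]) -> list[str]:
    """Build a bubblewrap command line that runs `command` with a read-only
    host, one writable workspace, and no network. Illustrative only."""
    ws = str(Path(workspace).resolve())
    return [
        "bwrap",
        "--ro-bind", "/", "/",   # host file system, read-only
        "--dev", "/dev",         # minimal device nodes
        "--proc", "/proc",       # procfs for the new PID namespace
        "--tmpfs", "/tmp",       # private, throwaway /tmp
        "--bind", ws, ws,        # the only writable host path
        "--unshare-net",         # empty network namespace: no egress
        "--unshare-pid",         # agent cannot see or signal host processes
        "--die-with-parent",     # tear down if the launching process exits
        "--chdir", ws,
        *command,                # must not start with "-"
    ]


if __name__ == "__main__":
    # Print the command; on Linux with bubblewrap installed, the list can be
    # passed to subprocess.run(...) directly.
    print(shlex.join(bwrap_argv(".", ["sh", "-c", "echo sandboxed > out.txt"])))
```

    A stricter profile would bind only the directories the tool needs instead of all of `/`, because a read-only view still lets the process read secrets such as `~/.ssh`.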

    That product-by-product split matters. A coding agent, a web chat tool, and an enterprise coworker should not share the same security default. Developers may want more control and tolerate more visible risk. A business user asking an agent to work with documents and calendars needs stronger defaults, fewer scary prompts, and less access to the host machine. The model can be the same; the containment cannot.

    What does Claude containment change for builders?

    Claude containment gives builders a practical checklist: decide what the agent can read, where it can write, which credentials it can see, and which domains it can reach before arguing about model behavior. If the agent never receives a production token, it cannot leak that token. If network access is off by default, prompt injection has fewer places to send stolen data. If workspaces are mounted narrowly, a compromised task sees less of the user’s machine.
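    The four checklist questions map naturally onto a small policy object that the environment answers before the model is ever consulted. A sketch with hypothetical names (`AgentPolicy`, `can_write`, and so on), not an API from Anthropic or any SDK; the defaults are empty, so an agent gets no credentials and no network unless they are granted.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical environment policy: what an agent may read, write,
    see as credentials, and reach on the network."""
    read_roots: tuple[Path, ...]
    write_roots: tuple[Path, ...]
    credentials: frozenset[str] = frozenset()   # empty: no secrets exposed
    egress_hosts: frozenset[str] = frozenset()  # empty: network off

    @staticmethod
    def _under(path: Path, roots: tuple[Path, ...]) -> bool:
        # resolve() collapses ".." and existing symlinks, so a path like
        # "workspace/../etc" is judged by where it actually points.
        p = path.resolve()
        return any(p.is_relative_to(r.resolve()) for r in roots)

    def can_read(self, path: str) -> bool:
        # Writable roots are implicitly readable.
        return self._under(Path(path), self.read_roots + self.write_roots)

    def can_write(self, path: str) -> bool:
        return self._under(Path(path), self.write_roots)

    def can_use_credential(self, name: str) -> bool:
        return name in self.credentials

    def can_reach(self, host: str) -> bool:
        return host.lower() in self.egress_hosts
```

    A token that is never listed in `credentials` cannot be leaked through this interface, which is the same argument the paragraph makes in prose.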

    The harder part is product fit. A sandbox that blocks every useful action will push users into unsafe bypasses. A permission dialog that appears constantly teaches people to click yes. Anthropic’s 93% approval rate is a warning for any team relying on human review as the main guardrail. Good agent UX should remove routine approvals by narrowing the environment, then reserve human attention for decisions that actually need judgment.
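    That routing rule can be sketched in a few lines. Routine actions the sandbox already bounds pass silently, a short list of consequential actions always reaches a human, and a few are refused outright. The action names and categories are illustrative assumptions, not Claude Code's actual permission model.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # bounded by the sandbox: no prompt
    ASK = "ask"      # a human decides
    DENY = "deny"    # never allowed, no prompt offered


# Hypothetical action kinds for illustration.
ALWAYS_DENY = {"read_secret_store", "disable_sandbox"}
NEEDS_JUDGMENT = {"git_push", "send_email", "merge_pull_request"}


def route(action: str, inside_sandbox: bool) -> Decision:
    """Spend human attention only where the environment cannot bound the action."""
    if action in ALWAYS_DENY:
        return Decision.DENY
    if action in NEEDS_JUDGMENT:
        return Decision.ASK
    return Decision.ALLOW if inside_sandbox else Decision.ASK
```

    The design choice is that the prompt count is driven by how narrow the environment is, not by how many actions the agent takes.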

    What Hacker News readers are arguing about

    Hacker News commenters spent less time on the broad principle than on the messy edge cases. Several argued that environment-layer containment is obvious but still underused: give agents less scope, fewer credentials, and a smaller working directory. Others shared their own setups using QEMU VMs, bubblewrap, read-only mounts, repo-scoped GitHub tokens, or separate profiles for internet access and local file access.

    The strongest technical concern was data exfiltration. One thread focused on egress controls and whether allowlisted domains can still leak data through domain fronting, API capabilities, or artifact handoff between isolated and privileged contexts. A few commenters also questioned Anthropic’s own framing, arguing that the company has an incentive to make its systems sound powerful and dangerous. The useful middle position is that containment is the right design direction, but nobody should treat a vendor blog post as proof that the hard parts are solved.
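    The allowlist concern can be made concrete: a rule that names only a host permits every capability that host exposes, including endpoints that accept uploads. A minimal sketch, assuming a proxy that can see each request's method and path, scopes rules to (method, path prefix) pairs instead. The rules and the `example-org` path are illustrative, not a recommended configuration.

```python
from urllib.parse import urlsplit

# Hypothetical egress rules: host -> allowed (method, path prefix) pairs.
# Allowlisting "api.github.com" alone would also permit POST /gists,
# which is a ready-made upload channel.
EGRESS_RULES: dict[str, set[tuple[str, str]]] = {
    "pypi.org": {("GET", "/simple/")},
    "api.github.com": {("GET", "/repos/example-org/")},
}


def egress_allowed(method: str, url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    rules = EGRESS_RULES.get(parts.hostname, set())  # hostname is lowercased
    path = parts.path or "/"
    return any(method.upper() == m and path.startswith(prefix)
               for m, prefix in rules)
```

    This only works if the proxy can see inside requests, through TLS termination or an API-aware gateway. A plain CONNECT proxy sees little more than the hostname, which is the gap the thread was worried about; a production version would also normalize paths and check that the TLS server name matches the HTTP Host header.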

    The practical read

    Teams building agent products should start with a threat model for the runtime, not a slogan about safer AI. List the files, credentials, tools, outbound domains, and write paths the agent can touch. Then remove access until the product breaks, and add back only the pieces needed for the task.
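    The remove-then-add-back step amounts to comparing what the runtime grants with what the task demonstrably needs. A small sketch with a made-up inventory (every path, token name, and domain below is hypothetical) reports the difference per category, which is the list of access to take away.

```python
# Hypothetical inventory of what an agent runtime can currently touch.
GRANTED = {
    "files": {"~/project", "~/.aws", "~/Documents"},
    "credentials": {"GITHUB_TOKEN", "AWS_ACCESS_KEY_ID"},
    "domains": {"github.com", "pypi.org", "*"},  # "*" stands for any domain
    "write_paths": {"~/project", "~"},
}

# What the task still needed after access was stripped and the
# failures were added back one at a time.
REQUIRED = {
    "files": {"~/project"},
    "credentials": {"GITHUB_TOKEN"},
    "domains": {"github.com", "pypi.org"},
    "write_paths": {"~/project"},
}


def excess(granted: dict[str, set[str]],
           required: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return access that is granted but not needed, per category."""
    report: dict[str, set[str]] = {}
    for category, items in granted.items():
        extra = items - required.get(category, set())
        if extra:
            report[category] = extra
    return report


if __name__ == "__main__":
    for category, items in sorted(excess(GRANTED, REQUIRED).items()):
        print(category, sorted(items))
```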

    For coding tools, that may mean workspace-only writes, network denial by default, repo-scoped tokens, and a review step before pushing code. For enterprise assistants, it may mean VMs, mounted folders, per-session tokens, and proxy logic that checks both destination and credential provenance. For MCP connectors and plugins, it means treating returned content as untrusted input even when the connector itself has passed review.
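    The credential-provenance check for enterprise assistants can be sketched as a proxy-side broker. Assumptions: the proxy, not the agent, mints one token per session and destination, and every outbound request carries that token plus a session id the proxy can verify. `TokenBroker` and its methods are hypothetical names.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    session_id: str
    host: str


class TokenBroker:
    """Hypothetical proxy-side registry that mints per-session tokens bound
    to one destination and checks provenance on every outbound request."""

    def __init__(self) -> None:
        self._grants: dict[str, Grant] = {}

    def mint(self, session_id: str, host: str) -> str:
        token = secrets.token_urlsafe(32)
        self._grants[token] = Grant(session_id, host.lower())
        return token

    def revoke_session(self, session_id: str) -> None:
        # Ending a session invalidates every token minted for it.
        self._grants = {t: g for t, g in self._grants.items()
                        if g.session_id != session_id}

    def allow(self, token: str, session_id: str, host: str) -> bool:
        grant = self._grants.get(token)
        # Refuse unknown tokens, tokens from another session, and tokens
        # sent to a host they were not minted for.
        return (grant is not None
                and grant.session_id == session_id
                and grant.host == host.lower())
```

    In a stronger variant of this design the agent never sees the real upstream credential at all: the proxy swaps the session token for it only after `allow` passes, so a leaked session token is useless outside that session and destination.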

    The lesson from Anthropic’s post is not that Claude is uniquely risky. It is that useful agents turn ordinary product surfaces into execution environments. Once an assistant can run commands, read documents, and call APIs, security belongs in the container, the VM, the proxy, and the credential design before it belongs in the prompt.

    Sources