Prompt-Injection

Five interlocking shield segments arranged around a central glowing node, abstract geometric style on dark background

Anthropic Publishes 'Trustworthy Agents in Practice' — Five-Principle Safety Framework for Autonomous Claude Agents

Anthropic dropped something significant yesterday: a full research framework titled Trustworthy Agents in Practice, published alongside the launch of Claude Managed Agents. It’s the clearest public articulation yet of how Anthropic thinks about safe, autonomous AI agent deployment — and it directly addresses the two biggest failure modes the industry is grappling with right now. Why This Matters Now AI agents are no longer prototype toys. Claude Code, Claude Cowork, and a growing ecosystem of third-party deployments are completing multi-step tasks — writing and running code, managing files, browsing the web, interacting with APIs — with minimal human supervision. That autonomy is the whole point. It’s also precisely where things can go wrong. ...

How to Apply Anthropic's 5 Trustworthy Agent Principles to Your OpenClaw Setup

Anthropic published its Trustworthy Agents in Practice framework yesterday — a five-principle safety baseline for autonomous Claude agents. The principles are solid, but they’re abstract. This guide translates each one into concrete configuration and design choices you can make in OpenClaw today. The Five Principles (Quick Summary) Before the how-to: Anthropic’s framework names five principles for trustworthy agent operation: Human control — Maintain meaningful oversight; prefer reversible actions Alignment with user expectations — Act on intent, not just literal instruction Security — Resist prompt injection and adversarial inputs Transparency — Be honest about capabilities, limitations, and actions taken Privacy — Operate with minimum necessary access to data Each maps to specific choices in how you configure and constrain your agents. ...

A geometric spider web with glowing trap nodes at intersections, dark vectors converging on a central luminous AI core, abstract and ominous

Google DeepMind Maps 6 'AI Agent Trap' Categories — Content Injection Hijacks Succeed in 86% of Tests

If you’re building autonomous AI agents — and especially if you’re deploying them to browse the web, process emails, or interact with external data — a new Google DeepMind paper deserves your immediate attention. The research maps the first systematic framework for what the authors call “AI Agent Traps”: adversarial techniques embedded in the environment that exploit the gap between human perception and machine parsing. The headline number is alarming: content injection hijacks succeeded in up to 86% of tested scenarios. And in tests targeting Microsoft M365 Copilot specifically, behavioral control traps achieved a perfect 10/10 data exfiltration rate. ...

How to Harden Your AI Agent Against the 6 Google DeepMind Agent Trap Categories

Google DeepMind’s new research framework maps six categories of “AI Agent Traps” — adversarial techniques embedded in the environment that can hijack autonomous agents without the user or the agent knowing. With content injection attacks succeeding in up to 86% of tested scenarios, this isn’t theoretical risk. This guide walks through each of the six trap categories and gives you concrete, actionable mitigations you can implement today — whether you’re running OpenClaw, a custom LangGraph pipeline, or any other agent framework. ...

Claude Code Silently Ignores Your Deny Rules After 50 Subcommands

There’s a rule in computer security called Kerckhoffs’s Principle: a system must remain secure even if everything about it is public knowledge. Anthropic, a company that has staked its entire identity on being “safety first,” just shipped a product that violates that principle in a way that’s almost poetic in its mundaneness. Not through a zero-day exploit or a sophisticated attack chain. Through a performance shortcut. What Actually Happens Claude Code lets operators and users configure deny rules — a list of commands the agent is never allowed to run. You can say “never execute rm,” “never run curl,” “never touch /etc/.” It’s the primary mechanism for keeping an AI agent that has shell access to your machine from doing something catastrophic. ...

Cracked containment barrier with code fragments escaping through fractures, red warning tones on dark background

CrewAI Critical Vulnerabilities Enable Sandbox Escape and Host Compromise via Prompt Injection

Security researcher Yarden Porat at Cyata published findings this week that should be required reading for anyone running CrewAI in production: four critical CVEs, chainable via prompt injection, that allow attackers to escape Docker sandboxes and execute arbitrary code on the host machine. CERT/CC issued advisory VU#221883. Patches are available. What Was Found Porat’s research identified four vulnerabilities in CrewAI that can be chained together: CVE-2026-2275 — The initial vector: a prompt injection flaw that allows malicious content in agent inputs to manipulate how CrewAI processes tool calls. Normally, tool calls are structured, validated operations. This CVE allows crafted input to make the framework treat attacker-controlled content as legitimate tool invocations. ...

Invisible streams of data packets flowing out through a DNS lookup tunnel while a chat interface shows no visible activity

ChatGPT DNS Data Exfiltration Flaw Fixed: Check Point's Full Disclosure of Silent Prompt Injection Attack

A carefully crafted malicious prompt could turn an ordinary ChatGPT conversation into a covert data exfiltration channel — silently leaking your messages, uploaded files, and AI-generated summaries without any warning. Check Point Research published full technical details on March 31, 2026 of a vulnerability that OpenAI patched on February 20, 2026. The Architecture of a Silent Exfiltration ChatGPT runs code in a sandboxed Linux environment with outbound web controls designed to prevent unauthorized data sharing. The controls block direct HTTP/HTTPS requests — but the researchers discovered a critical gap: DNS lookups were not subject to the same outbound restrictions. ...

A glowing shield with circuit patterns deflecting abstract attack vectors in deep blue and gold

OpenAI Launches Safety Bug Bounty for Agentic Risks — Up to $100K for Prompt Injection, Platform Integrity Flaws

OpenAI has launched its first public Safety Bug Bounty program — and it’s squarely focused on the attack surfaces that matter most for agentic AI: prompt injection, MCP-based hijacks, data exfiltration from ChatGPT Agent, and platform integrity flaws. Top reward: $100,000 for critical safety vulnerabilities. This isn’t a standard security bounty. It’s specifically designed to capture the class of AI-native risks that traditional vulnerability disclosure programs aren’t built for — the kind of things that don’t show up in CVE databases but can cause real harm at scale when AI agents are acting in the world. ...

Abstract dark pipeline with glowing orange fracture points along its length, representing attack vectors introduced into a software supply chain by autonomous coding agents

Coding Agents Are Widening Your Software Supply Chain Attack Surface

The software supply chain attack models your security team has been defending against for the past decade assumed one thing: the entities making decisions inside your build pipeline were humans. Slow, reviewable, occasionally careless humans — but humans. Coding agents like Claude Code, Cursor, and GitHub Copilot Workspace have changed that assumption. They are autonomous participants in the software development lifecycle: generating code, selecting dependencies, executing build steps, and pushing changes at machine speed. The attack surface they introduce is the natural consequence of giving a privileged, autonomous system access to an environment where a single bad decision can propagate into production before any human review process catches it. ...

A glowing claw icon rising over a stylized skyline silhouette, with warning triangles scattered around its base representing cybersecurity alerts

OpenClaw Goes Viral in China: Tencent Scale-Ups, Alibaba Launch, and Back-to-Back Government Cybersecurity Warnings

OpenClaw’s rise in China has taken a new turn: what started as a viral cultural phenomenon has crossed into serious enterprise territory — and that’s prompted China’s government to respond with back-to-back cybersecurity warnings unlike anything it has issued about a single open-source project before. The dual nature of this story — explosive adoption and urgent official concern — captures exactly the tension that agentic AI creates at scale. The Adoption Wave The numbers from China’s tech sector are striking. Tencent Cloud is running on-site installation sessions for enterprise clients, helping businesses deploy OpenClaw at scale. Alibaba has launched a dedicated OpenClaw application — not just compatibility, but a purpose-built product built on the framework. ...