Safety

Anthropic Publishes 'Trustworthy Agents in Practice' — Five-Principle Safety Framework for Autonomous Claude Agents

Anthropic dropped something significant yesterday: a full research framework titled Trustworthy Agents in Practice, published alongside the launch of Claude Managed Agents. It’s the clearest public articulation yet of how Anthropic thinks about safe, autonomous AI agent deployment, and it directly addresses the two biggest failure modes the industry is grappling with right now.

Why This Matters Now

AI agents are no longer prototype toys. Claude Code, Claude Cowork, and a growing ecosystem of third-party deployments are completing multi-step tasks (writing and running code, managing files, browsing the web, interacting with APIs) with minimal human supervision. That autonomy is the whole point. It’s also precisely where things can go wrong. ...

How to Apply Anthropic's 5 Trustworthy Agent Principles to Your OpenClaw Setup

Anthropic published its Trustworthy Agents in Practice framework yesterday: a five-principle safety baseline for autonomous Claude agents. The principles are solid, but they’re abstract. This guide translates each one into concrete configuration and design choices you can make in OpenClaw today.

The Five Principles (Quick Summary)

Before the how-to, here are the five principles Anthropic’s framework names for trustworthy agent operation:

1. Human control: Maintain meaningful oversight; prefer reversible actions
2. Alignment with user expectations: Act on intent, not just literal instruction
3. Security: Resist prompt injection and adversarial inputs
4. Transparency: Be honest about capabilities, limitations, and actions taken
5. Privacy: Operate with minimum necessary access to data

Each maps to specific choices in how you configure and constrain your agents. ...

How to Use Gemini CLI Plan Mode for Safer Agentic Coding

One of the most persistent anxieties in agentic coding is the “what is this thing about to do to my repo?” problem. You describe a task. The agent starts executing. And somewhere between your request and the outcome, files get modified, commands get run, and irreversible things happen — sometimes incorrectly. Google just shipped a thoughtful solution to this problem in Gemini CLI: plan mode. Plan mode restricts the AI agent to read-only tools until you explicitly approve its proposed plan. No file writes. No command execution. Just analysis and a detailed proposal — which you review, approve (or reject), and then execute with confidence. ...
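
To make the plan-mode idea concrete, here is a minimal Python sketch of an approval gate: read-only tools run freely, and anything else is blocked until a plan has been approved. It is not Gemini CLI's implementation. The tool names, the `PlanGate` class, and the `PlanNotApproved` exception are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field

# Hypothetical tool names; a real agent would define its own read-only set.
READ_ONLY_TOOLS = {"read_file", "list_dir", "search"}


class PlanNotApproved(Exception):
    """Raised when a mutating tool is called before the plan is approved."""


@dataclass
class PlanGate:
    approved: bool = False
    log: list = field(default_factory=list)

    def approve(self) -> None:
        # Called once the human has reviewed and accepted the proposed plan.
        self.approved = True

    def call(self, tool: str, *args) -> str:
        # Read-only tools always pass; everything else needs approval first.
        if tool not in READ_ONLY_TOOLS and not self.approved:
            raise PlanNotApproved(f"{tool} blocked until the plan is approved")
        self.log.append((tool, args))
        return f"ran {tool}"
```

The gate is deny-by-default: a tool that is not explicitly listed as read-only is treated as mutating, so a newly added tool cannot slip past review by accident.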

AI Agents of Chaos: New Research Reveals How Bots Talking to Bots Creates Catastrophic Failure Modes

There’s a problem with multi-agent AI systems that doesn’t show up until you run them in the wild, and a new research paper from Northeastern University has done the work of naming it precisely. The paper, “Agents of Chaos,” led by researcher Natalie Shapira, makes a claim that anyone who’s run multi-agent pipelines in production will recognize: the failure modes of two agents interacting are not the sum of their individual failures. They’re something qualitatively different and qualitatively worse. ...

How to Configure Multilingual Stop Phrases in OpenClaw v2026.2.24

OpenClaw v2026.2.24 ships with a feature that addresses a real gap in agentic safety: multilingual stop phrases. Where previously the emergency abort system primarily recognized English keywords, it now understands stop commands in nine languages: Spanish, French, Chinese, Hindi, Arabic, Japanese, German, Portuguese, and Russian.

This how-to walks you through:

- What changed in the stop phrase system
- How the defaults work (and what you get for free)
- How to customize stop phrases if the defaults don’t fit your setup
- First look at the new Android 5-tab shell

Why This Matters

If you’ve followed the Summer Yue inbox incident, you already understand the stakes. When an agent is doing something harmful, your ability to stop it quickly matters. Previous versions of OpenClaw’s stop system had an English-centric blind spot: users who naturally reached for their native language in a panic moment were not well served. ...
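
As a rough illustration of how multilingual stop matching can work in general, the Python sketch below normalizes an incoming message and checks it against a set of stop words. This is not OpenClaw's code or its default phrase list. The phrases, `STOP_PHRASES`, and `is_stop_command` are assumptions made up for this example.

```python
import unicodedata

# Illustrative phrases only (one per language); NOT OpenClaw's actual defaults.
STOP_PHRASES = {
    "stop", "para", "arrête", "停止", "रुको",
    "توقف", "止まれ", "stopp", "pare", "стоп",
}

# Punctuation and whitespace trimmed from both ends of a message.
_TRIM = "!.?¡¿。 "


def normalize(text: str) -> str:
    # NFKC folds width and composition variants; casefold ignores letter case.
    return unicodedata.normalize("NFKC", text).casefold().strip()


def is_stop_command(message: str, phrases=STOP_PHRASES) -> bool:
    # Match the whole message so "don't stop the build" does not abort.
    candidate = normalize(message).strip(_TRIM)
    return candidate in {normalize(p) for p in phrases}
```

Matching the whole message rather than a substring is a deliberate trade-off in this sketch: it avoids false aborts on ordinary sentences that happen to contain a stop word, at the cost of missing longer phrasings such as "please stop now".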