Evaluation

The gap between human and machine intelligence just got a new measuring stick, and the results are humbling for AI. On March 25, 2026, ARC Prize officially launched ARC-AGI-3, the third generation of the Abstraction and Reasoning Corpus benchmark series. Where previous editions measured pattern recognition and abstract reasoning on static puzzles, ARC-AGI-3 introduces something fundamentally different: interactive, turn-based environments designed to measure agentic intelligence. The headline numbers? Humans score 100%. Frontier AI, including the best available large language models, scores just 0.26%. ...

Evaluation

ARC-AGI-3 Launches: Interactive Benchmark Tests Agentic Intelligence Through Turn-Based Environments

Claude Opus 4.6 Can Detect When It's Being Evaluated — OpenClaw Creator Calls It 'Scary'