Z.AI Releases GLM-5.1: Open-Weight 754B Agentic Model Beats GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro, Sustains 8-Hour Autonomous Execution
The benchmark war just shifted terrain. Z.AI, the Chinese AI startup behind the GLM family, released GLM-5.1 today under an MIT license, and the numbers are hard to ignore: 58.4 on SWE-Bench Pro, edging past GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). But the more interesting story isn't the benchmark score. It's the philosophy behind how Z.AI got there.

Not About Reasoning Tokens — About Autonomous Work Time

While most frontier labs have been chasing better logic through more reasoning tokens, Z.AI is optimizing for something different: productive horizons. How long can an agent work autonomously on a single task without going off the rails? ...