Computer-Use

Abstract flat illustration of a compact robot arm precisely clicking a glowing browser window, with terminal output scrolling in the background

How to Run MolmoWeb Locally: Deploy an Open-Source Browser Agent in Under 10 Minutes

MolmoWeb is Ai2’s open-source browser agent — 8B parameters, Apache 2.0, no API key required. It scores 78.2% on WebVoyager and beats GPT-4o-based agents on multiple benchmarks. Here’s how to get it running locally. System requirements: GPU with at least 16GB VRAM (for 8B model) or 8GB VRAM (for 4B model) Ubuntu 20.04+ or macOS 12+ (Linux recommended for GPU support) Python 3.10+ Chrome or Chromium browser installed Step 1: Clone the Repository git clone https://github.com/allenai/molmoweb.git cd molmoweb Step 2: Create a Virtual Environment and Install Dependencies python3 -m venv molmoweb-env source molmoweb-env/bin/activate pip install -r requirements.txt The requirements include PyTorch, the Transformers library, Playwright for browser control, and Pillow for screenshot processing. The full install typically takes 3–5 minutes on a good connection. ...

A small robot navigating a giant floating web of interconnected browser windows, minimal 3D

MolmoWeb: Ai2's Open-Source Web Browser Agent Beats GPT-4o at Just 8 Billion Parameters

The Allen Institute for AI (Ai2) just dropped something the open-source AI community has been waiting for: a fully open, genuinely capable web browser agent that can go head-to-head with GPT-4o-based systems — at 8 billion parameters. It’s called MolmoWeb, and it’s available right now on Hugging Face under Apache 2.0. What MolmoWeb Actually Does MolmoWeb is a multimodal web agent. You give it a natural-language instruction, and it autonomously controls a real web browser: clicking, typing, scrolling, navigating, filling forms. It understands the web visually — through screenshots — rather than through structured DOM parsing. ...

A glowing robotic arm hovering above a Windows-style desktop grid, reaching toward abstract application icons

Claude Cowork and Claude Code Now Control Your Desktop on Windows — Computer Use Expands

One week after debuting on Mac, Claude’s computer use research preview has arrived on Windows — and it’s already baked directly into two of Anthropic’s flagship tools. On April 2-3, 2026, Anthropic expanded Claude Cowork and Claude Code to support direct desktop control on Windows machines. The official @claudeai Twitter account confirmed the rollout on April 2, and Anthropic’s help center now documents the feature for both platforms. What Computer Use Actually Does Claude’s computer use capability allows the AI to operate your desktop directly — moving the cursor, clicking buttons, typing text, launching applications — as a fallback mechanism when standard integrations aren’t available. Think of it as a universal adapter: if there’s no API or plugin for a task, Claude can just do what a human user would do. ...

Two abstract upward-trending bars side by side, one glowing orange and one glowing blue, rising through a clean dark gradient field

Anthropic's Claude Subscriptions Are Quietly Doubling — Gaining Ground on OpenAI

Anthropic’s Claude has been quietly staging one of the more impressive subscription growth stories in AI. According to TechCrunch reporting, Claude’s paying consumer subscriber base has doubled in recent months — with estimates putting total users somewhere between 18 million and 30 million. The growth isn’t random. It’s driven by two specific capabilities that users are actually paying for: computer use and persistent memory. What’s Driving the Surge Computer use — Claude’s ability to control a desktop environment, browse the web, operate applications, and complete multi-step tasks autonomously — is the headline agentic feature. It’s genuinely different from what competitors offer at a consumer subscription tier. ChatGPT can help you write and search; Claude can actually click around your computer and do the work. ...

An abstract mechanical claw arm reaching toward a glowing laptop screen, rendered in flat vector style with blue and white tones

Anthropic Launches Claude Cowork: Computer-Use Agent for Mac and Windows Now in Research Preview

Anthropic just made it official: Claude can now use your computer. The company announced today that Claude Cowork — its research preview for desktop computer-use — is now available to Claude Pro and Claude Max subscribers on macOS, with Windows support coming. This isn’t a software integration or a plugin. Claude can now point, click, scroll, open files, navigate your browser, and run developer tools on your actual machine — acting like a remote operator who happens to live inside your subscription plan. ...

A robotic arm reaching toward an illuminated laptop screen in a minimal, dark workspace

Anthropic Launches Claude Cowork: Computer-Use Agent for Mac and Windows Now in Research Preview

Anthropic just crossed a threshold that a lot of AI observers have been waiting for: its Claude AI can now directly control your computer. The company announced today that Claude Cowork — along with Claude Code — is being updated to perform tasks using your Mac or PC, opening files, running browser sessions, and executing multi-step workflows without you having to hold its hand at every step. This is no small shift. Computer-use AI has been a proof-of-concept for a while, but Anthropic is now putting it into the hands of paying subscribers. If you’re on Claude Pro or Claude Max, you can activate this on macOS today. ...

An abstract floating phone screen with glowing AI connection lines radiating outward to app icons

Gemini Screen Automation Rolls Out for Galaxy S26 — AI Agents Now Control Android Apps

The agentic AI revolution has officially reached your pocket. Google’s Gemini “screen automation” — an agentic task feature that lets your AI assistant actually operate Android apps on your behalf — has begun rolling out to Samsung Galaxy S26 users, with a Pixel 10 expansion planned. This isn’t a gimmick. It’s a meaningful step toward AI agents becoming the primary way we interact with our phones. What Gemini Screen Automation Does The feature is exactly what it sounds like: Gemini takes control of Android apps and navigates them to complete tasks you describe in plain language. ...

A robotic eye watching a glowing monitor screen with circuit-board tendrils reaching outward

Elon Musk Unveils 'Macrohard' (Digital Optimus): Tesla-xAI AI Agent That Watches Your Screen and Can Clone Entire Software Companies

Elon Musk has unveiled Macrohard — also being called “Digital Optimus” — a joint project between Tesla and xAI that could be the most ambitious computer-use AI agent announced to date. The combination is exactly what you’d expect from Musk: audacious framing (“emulate the function of entire software companies”), paired with a technically interesting architecture that actually warrants the headline. What Macrohard Actually Is The core system pairs Grok LLM reasoning with a Tesla-built computer-use agent that watches a continuous stream of screen video. Specifically, the agent processes the last five seconds of screen activity and responds with real-time keyboard and mouse actions — essentially a “see what you see, do what you’d do” loop operating at video-frame speeds. ...

A glowing neural network web stretching across a vast dark digital landscape, with a single central node radiating outward connections

OpenAI Launches GPT-5.4 With Native Computer-Use Capabilities and 1M Token Context

The agentic AI landscape just shifted. OpenAI’s GPT-5.4 — launched March 5, 2026 — isn’t just a model update. It’s a direct bid to own the autonomous agent stack, arriving with native computer-use, a one-million-token context window, and a reworked tool-calling system that slashes token consumption by 47% on MCP benchmark tasks. If you’re building with agent pipelines, this is the model release worth paying attention to. What’s Actually New in GPT-5.4 Native Computer-Use This is the headline feature, and it’s genuinely significant. Rather than bolting computer-use on as a post-hoc capability, OpenAI has built it into GPT-5.4 at the architecture level. The model can observe screen states, click UI elements, type into fields, scroll, and navigate applications — autonomously, without requiring a separate vision model or operator middleware. ...

OpenAI Launches GPT-5.4 with Native Computer Use and 1M Token Context Window

OpenAI dropped a significant update on March 5, 2026: GPT-5.4, a model built from the ground up for autonomous agent work. It ships with two things practitioners have been waiting for — native computer-use capabilities and a 1M-token context window in API preview. If you build agents, this changes your architecture options in real ways. What Actually Shipped GPT-5.4 comes in two variants: Standard GPT-5.4 — The default API model with native computer-use support and 1M-token context GPT-5.4 Pro — A higher-performance tier aimed at complex, long-horizon tasks The model is available in ChatGPT, the Codex environment, and the API. Microsoft Foundry integration is also confirmed, meaning enterprise teams using Azure AI Foundry can access it without a separate onboarding. ...