MolmoWeb is Ai2’s open-source browser agent. It has 8B parameters (a smaller 4B variant is also available), is released under Apache 2.0, and needs no API key to run. It scores 78.2% on WebVoyager and beats GPT-4o-based agents on multiple benchmarks. Here’s how to get it running locally.

System requirements:

  • GPU with at least 16GB VRAM (for 8B model) or 8GB VRAM (for 4B model)
  • Ubuntu 20.04+ or macOS 12+ (Linux recommended for GPU support)
  • Python 3.10+
  • Chrome or Chromium browser installed
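
The Python and browser requirements above can be checked before installing anything. The following is a minimal preflight sketch using only the standard library; the function names and the list of browser executable names are illustrative assumptions, not part of MolmoWeb.

```python
import shutil
import sys

# Executable names commonly used for Chrome/Chromium (an assumed, non-exhaustive list).
BROWSER_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def python_ok(version=None, minimum=(3, 10)):
    """Return True if the interpreter meets the Python 3.10+ requirement."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def find_browser(candidates=BROWSER_CANDIDATES):
    """Return the path of the first browser executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def preflight():
    """Collect the check results in a dict so they can be printed or logged."""
    return {"python_ok": python_ok(), "browser_path": find_browser()}
```

Calling print(preflight()) shows both results at once. On macOS, Chrome is usually installed as an app bundle rather than on PATH, so a browser_path of None there does not necessarily mean the browser is missing. The GPU requirement is not covered here because checking it needs PyTorch, which is only installed in Step 2.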

Step 1: Clone the Repository

git clone https://github.com/allenai/molmoweb.git
cd molmoweb

Step 2: Create a Virtual Environment and Install Dependencies

python3 -m venv molmoweb-env
source molmoweb-env/bin/activate

pip install -r requirements.txt

The requirements include PyTorch, the Transformers library, Playwright for browser control, and Pillow for screenshot processing. The full install typically takes 3–5 minutes on a good connection.

Step 3: Install Playwright Browser Drivers

MolmoWeb uses Playwright to control the browser. After installing the Python dependencies, install the browser driver:

playwright install chromium
playwright install-deps chromium

This downloads the Playwright-managed Chromium binary and installs system dependencies. On Ubuntu, install-deps may require sudo.
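
To confirm that the Playwright install works independently of MolmoWeb, a short smoke test can help. This sketch uses Playwright's public sync API to launch the managed Chromium, load a page, and save a screenshot; the function name, the test URL, and the output filename are assumptions chosen for illustration.

```python
from pathlib import Path


def playwright_smoke_test(url="https://example.com", out_path="smoke.png", headless=True):
    """Launch Playwright's Chromium, load a page, save a screenshot, return the page title."""
    # Imported lazily so this file can be loaded even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url, timeout=30_000)  # timeout is in milliseconds
            page.screenshot(path=out_path)
            title = page.title()
        finally:
            browser.close()

    if not Path(out_path).is_file():
        raise RuntimeError(f"screenshot was not written to {out_path}")
    return title
```

Running print(playwright_smoke_test()) inside the virtual environment should print a page title and leave smoke.png in the current directory. If the launch call raises an error about missing libraries, that usually points to the install-deps step rather than to MolmoWeb itself.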

Step 4: Download the Model Weights

Ai2 hosts MolmoWeb weights on Hugging Face. Download via the Hugging Face CLI:

pip install huggingface-hub

# For the 8B model (recommended — better benchmark performance):
huggingface-cli download allenai/MolmoWeb-8B --local-dir ./models/molmoweb-8b

# For the 4B model (lower VRAM requirement):
huggingface-cli download allenai/MolmoWeb-4B --local-dir ./models/molmoweb-4b

The 8B model is approximately 16GB. Download time depends on your connection.
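
The 16GB figure is consistent with a back-of-envelope estimate of 8 billion parameters at 2 bytes each. The sketch below makes that arithmetic explicit, checks free disk space before a download, and shows a Python equivalent of the CLI command using huggingface_hub.snapshot_download. The 2-bytes-per-parameter (16-bit weights) assumption and the 2GB safety margin are mine, not stated by Ai2, and the helper names are hypothetical.

```python
import shutil
from pathlib import Path


def estimated_weights_gb(num_params, bytes_per_param=2):
    """Rough checkpoint size in GB (10**9 bytes), assuming fixed-width weights."""
    return num_params * bytes_per_param / 1e9


def has_free_space(path, required_gb, margin_gb=2.0):
    """Return True if the filesystem holding `path` has required_gb plus a margin free."""
    target = Path(path).resolve()
    # Walk up to the nearest existing directory, since the target may not exist yet.
    while not target.exists():
        target = target.parent
    free_gb = shutil.disk_usage(target).free / 1e9
    return free_gb >= required_gb + margin_gb


def download_weights(repo_id="allenai/MolmoWeb-8B", local_dir="./models/molmoweb-8b"):
    """Python equivalent of the huggingface-cli download command above."""
    from huggingface_hub import snapshot_download  # lazy import; needs huggingface-hub

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A typical use is to call has_free_space("./models", estimated_weights_gb(8e9)) first and only then call download_weights(). The estimate ignores tokenizer and config files, which are small by comparison.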

Step 5: Run MolmoWeb on a Task

With the model downloaded and the browser driver installed, you’re ready to run a task:

python run_agent.py \
  --model ./models/molmoweb-8b \
  --task "Go to weather.com and find the current temperature in San Francisco" \
  --headless false

Setting --headless false lets you watch the agent control the browser in real time. For automated runs, set --headless true.

MolmoWeb will launch a Chromium browser, take an initial screenshot, and begin executing actions to complete the task. You’ll see the action log in your terminal as it runs.
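
For more than one task, the same command can be driven from a small Python script. This is a sketch that uses only the run_agent.py flags shown above; it assumes it is run from the repository root inside the activated virtual environment, and the helper names are hypothetical.

```python
import shlex
import subprocess
import sys


def build_command(task, model="./models/molmoweb-8b", headless=True):
    """Build the argument list for one run_agent.py invocation."""
    return [
        sys.executable, "run_agent.py",
        "--model", model,
        "--task", task,
        "--headless", "true" if headless else "false",
    ]


def run_tasks(tasks, **kwargs):
    """Run tasks one after another and return a {task: exit code} dict."""
    results = {}
    for task in tasks:
        cmd = build_command(task, **kwargs)
        print("Running:", shlex.join(cmd))
        results[task] = subprocess.run(cmd).returncode
    return results
```

Passing the command as a list rather than a single shell string means task text with quotes or spaces needs no extra escaping. A non-zero exit code in the returned dict marks a run that is worth inspecting.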

Common Configuration Options

Timeout: --timeout 120 — maximum seconds per task (default: 60)

Screenshot interval: --screenshot-interval 2 — how often the agent takes a new screenshot to assess state (default: 2 seconds)

Save traces: --save-traces ./traces/ — saves screenshots and action logs for each run, useful for debugging

Custom start URL: --start-url https://example.com — start the browser at a specific URL instead of the default new tab
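
If you set these options from a script, it can help to keep them in one place. The sketch below collects the four options above in a dataclass and turns them into command-line arguments; the class itself is a hypothetical convenience wrapper, and only the flag names and defaults come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentOptions:
    timeout: int = 60                  # maximum seconds per task
    screenshot_interval: int = 2       # seconds between screenshots
    save_traces: Optional[str] = None  # directory for screenshots and action logs
    start_url: Optional[str] = None    # page to open instead of a new tab

    def to_args(self) -> List[str]:
        """Return the options as a flat list of command-line arguments."""
        if self.timeout <= 0 or self.screenshot_interval <= 0:
            raise ValueError("timeout and screenshot_interval must be positive")
        args = [
            "--timeout", str(self.timeout),
            "--screenshot-interval", str(self.screenshot_interval),
        ]
        if self.save_traces:
            args += ["--save-traces", self.save_traces]
        if self.start_url:
            args += ["--start-url", self.start_url]
        return args
```

For example, AgentOptions(timeout=120, save_traces="./traces/").to_args() yields a list that can be appended to a run_agent.py command. Optional flags are omitted when unset, so the agent's own defaults apply.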

Troubleshooting

“CUDA out of memory” on the 8B model: Try the 4B model, or reduce the screenshot resolution with --screenshot-width 1024 (default is 1280).

Playwright browser fails to launch: Run playwright install-deps chromium again, and ensure you’re in the virtual environment where Playwright is installed.

Agent loops on the same page: Increase the timeout with --timeout 180 and check whether the target site has anti-bot measures that are blocking screenshot capture. Running with --headless false or --save-traces ./traces/ shows what the agent is actually seeing.

Model loads but actions are slow: MolmoWeb is GPU-intensive. If inference is slow, verify that PyTorch can see your GPU: python -c "import torch; print(torch.cuda.is_available())". If this prints False, reinstall PyTorch with a build that matches your CUDA version.
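
For memory and speed problems, a slightly fuller report than the one-liner above is often useful. This sketch uses standard PyTorch calls to report the detected device and its total VRAM, then compares that with the figures from the system requirements. The function names are hypothetical and the thresholds are the nominal 16GB and 8GB values, not measured minimums. On macOS, torch.cuda.is_available() is always False because CUDA is not available there; the sketch also reports Apple's MPS backend, although whether MolmoWeb can use it is not something this guide establishes.

```python
def describe_accelerator():
    """Return a dict describing what PyTorch can see; works even if torch is missing."""
    info = {
        "torch_installed": False,
        "cuda": False,
        "mps": False,
        "device_name": None,
        "total_vram_gb": None,
    }
    try:
        import torch
    except ImportError:
        return info

    info["torch_installed"] = True
    info["cuda"] = torch.cuda.is_available()
    if info["cuda"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["total_vram_gb"] = round(props.total_memory / 1e9, 1)
    # Apple-silicon backend; guarded because older PyTorch builds lack the attribute.
    mps = getattr(torch.backends, "mps", None)
    info["mps"] = bool(mps and mps.is_available())
    return info


def meets_vram_requirement(total_vram_gb, model="8b"):
    """Compare against the stated requirements: 16GB for the 8B model, 8GB for the 4B."""
    required = {"8b": 16, "4b": 8}[model]
    return total_vram_gb is not None and total_vram_gb >= required
```

Calling meets_vram_requirement(describe_accelerator()["total_vram_gb"]) returns False both when the GPU is too small and when no CUDA device was found, so print the full dict to tell the two cases apart.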

Running the Benchmarks

To reproduce the published benchmark results:

# WebVoyager benchmark
python benchmark.py --benchmark webvoyager --model ./models/molmoweb-8b --output ./results/

# DeepShop benchmark
python benchmark.py --benchmark deepshop --model ./models/molmoweb-8b --output ./results/

Some WebVoyager tasks require API credentials because the benchmark runs against live websites. See benchmarks/README.md in the repository for credential setup.

