MolmoWeb is Ai2’s open-source browser agent — 8B parameters, Apache 2.0, no API key required. It scores 78.2% on WebVoyager and beats GPT-4o-based agents on multiple benchmarks. Here’s how to get it running locally.
System requirements:
- GPU with at least 16GB VRAM (for the 8B model) or 8GB VRAM (for the 4B model)
- Ubuntu 20.04+ or macOS 12+ (Linux recommended for GPU support)
- Python 3.10+
- Chrome or Chromium browser installed
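Before cloning anything, it can help to confirm the basics from the list above. The following is a minimal sketch using only the Python standard library. The executable names in BROWSER_NAMES are common defaults assumed for illustration, not names taken from the MolmoWeb repository, and the script does not check VRAM.

```python
import platform
import shutil
import sys

# Assumed executable names for Chrome/Chromium on Linux; adjust for your system.
BROWSER_NAMES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def python_ok(version=sys.version_info, minimum=(3, 10)):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def find_browser(names=BROWSER_NAMES):
    """Return the path of the first browser executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(f"OS: {platform.system()} {platform.release()}")
    print(f"Python {platform.python_version()}: {'ok' if python_ok() else 'need 3.10+'}")
    print(f"Browser on PATH: {find_browser() or 'not found'}")
```

On macOS, Chrome usually lives under /Applications rather than on PATH, so a "not found" result there is not conclusive.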
Step 1: Clone the Repository
git clone https://github.com/allenai/molmoweb.git
cd molmoweb
Step 2: Create a Virtual Environment and Install Dependencies
python3 -m venv molmoweb-env
source molmoweb-env/bin/activate
pip install -r requirements.txt
The requirements include PyTorch, the Transformers library, Playwright for browser control, and Pillow for screenshot processing. The full install typically takes 3–5 minutes on a good connection.
Step 3: Install Playwright Browser Drivers
MolmoWeb uses Playwright to control the browser. After installing Python dependencies:
playwright install chromium
playwright install-deps chromium
This downloads the Playwright-managed Chromium binary and installs system dependencies. On Ubuntu, install-deps may require sudo.
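To confirm the driver install worked before moving on, you can try a headless launch from Python. This is a minimal sketch using Playwright's synchronous API; check_chromium is a hypothetical helper written for this guide, not something shipped with MolmoWeb.

```python
def check_chromium():
    """Try to launch Playwright's Chromium headlessly; return (ok, message)."""
    try:
        from playwright.sync_api import sync_playwright
    except ImportError:
        return False, "playwright is not installed in this environment"
    try:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            version = browser.version
            browser.close()
        return True, f"Chromium {version} launched"
    except Exception as exc:  # a missing binary or system library ends up here
        return False, f"launch failed: {exc}"


if __name__ == "__main__":
    ok, message = check_chromium()
    print("OK" if ok else "FAILED", "-", message)
```

A "launch failed" message that mentions missing shared libraries usually points back to the install-deps step.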
Step 4: Download the Model Weights
Ai2 hosts MolmoWeb weights on Hugging Face. Download via the Hugging Face CLI:
pip install huggingface-hub
# For the 8B model (recommended — better benchmark performance):
huggingface-cli download allenai/MolmoWeb-8B --local-dir ./models/molmoweb-8b
# For the 4B model (lower VRAM requirement):
huggingface-cli download allenai/MolmoWeb-4B --local-dir ./models/molmoweb-4b
The 8B model is approximately 16GB. Download time depends on your connection.
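Since the 8B weights are roughly 16GB, a quick free-space check before downloading can save you a failed transfer. Here is a minimal standard-library sketch; has_room is a hypothetical helper, and the 2 GiB of headroom is an arbitrary assumption.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3


def has_room(target_dir, needed_gib, headroom_gib=2.0):
    """Return True if the filesystem holding target_dir has enough free space."""
    path = Path(target_dir).resolve()
    # The target may not exist yet, so walk up to the nearest existing directory.
    while not path.exists():
        path = path.parent
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= needed_gib + headroom_gib


if __name__ == "__main__":
    target = "./models/molmoweb-8b"
    print(f"Enough space for the 8B model in {target}: {has_room(target, 16)}")
```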
Step 5: Run MolmoWeb on a Task
With the model downloaded and the browser driver installed, you’re ready to run a task:
python run_agent.py \
--model ./models/molmoweb-8b \
--task "Go to weather.com and find the current temperature in San Francisco" \
--headless false
Setting --headless false lets you watch the agent control the browser in real time. For automated runs, set --headless true.
MolmoWeb will launch a Chromium browser, take an initial screenshot, and begin executing actions to complete the task. You’ll see the action log in your terminal as it runs.
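The screenshot-then-act cycle described above can be pictured as a simple loop. The sketch below is a toy illustration, not MolmoWeb's actual implementation: FakeBrowser, scripted_policy, and run_episode are hypothetical names, and the "screenshot" is just a step counter standing in for an image.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    detail: str = ""


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self):
        self.log = []

    def screenshot(self):
        # A real agent would capture image bytes; here it is the number of actions so far.
        return len(self.log)

    def apply(self, action):
        self.log.append(action)


def scripted_policy(task, screenshot):
    """Hypothetical stand-in for the model: declares the task done after two actions."""
    if screenshot >= 2:
        return Action("done")
    return Action("click", f"step {screenshot} toward: {task}")


def run_episode(task, browser, policy, max_steps=10):
    """Observe, decide, act; return True if the policy finished within max_steps."""
    for _ in range(max_steps):
        shot = browser.screenshot()
        action = policy(task, shot)
        if action.kind == "done":
            return True
        browser.apply(action)
    return False


if __name__ == "__main__":
    browser = FakeBrowser()
    finished = run_episode("find the temperature", browser, scripted_policy)
    print(finished, [a.detail for a in browser.log])
```

The max_steps cap plays the same role as a timeout: it keeps an agent that never decides it is done from running forever.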
Common Configuration Options
- Timeout: --timeout 120 sets the maximum seconds per task (default: 60)
- Screenshot interval: --screenshot-interval 2 sets how often the agent takes a new screenshot to assess state (default: 2 seconds)
- Save traces: --save-traces ./traces/ saves screenshots and action logs for each run, useful for debugging
- Custom start URL: --start-url https://example.com starts the browser at a specific URL instead of the default new tab
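If you launch the agent from your own scripts, assembling the flags programmatically avoids shell-quoting mistakes in the task string. This sketch assumes the flags behave as listed above; build_command is a hypothetical helper, not part of the repository.

```python
import shlex
import sys


def build_command(model, task, headless=True, **options):
    """Build the argv list for run_agent.py.

    Extra keyword options use underscores and map to dashed flags,
    e.g. save_traces="./traces/" becomes --save-traces ./traces/.
    """
    argv = [
        sys.executable, "run_agent.py",
        "--model", model,
        "--task", task,
        "--headless", "true" if headless else "false",
    ]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_command(
        "./models/molmoweb-8b",
        "Go to weather.com and find the current temperature in San Francisco",
        headless=True,
        timeout=120,
        save_traces="./traces/",
    )
    # Print only; pass cmd to subprocess.run(cmd, check=True) to actually launch.
    print(shlex.join(cmd))
```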
Troubleshooting
“CUDA out of memory” on the 8B model: Try the 4B model, or reduce the screenshot resolution with --screenshot-width 1024 (default is 1280).
Playwright browser fails to launch: Run playwright install-deps chromium again, and ensure you’re in the virtual environment where Playwright is installed.
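Agent loops on the same page: Increase the timeout with --timeout 180, and check whether the target site has anti-bot measures (such as a CAPTCHA or a block page) that are keeping the page from loading normally. Running with --save-traces lets you inspect the saved screenshots to see what the agent was actually looking at.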
Model loads but actions are slow: MolmoWeb is GPU-intensive. If inference is slow, verify that PyTorch is using your GPU: python -c "import torch; print(torch.cuda.is_available())". If this prints False, reinstall PyTorch with a build that matches your CUDA version.
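For a slightly fuller picture than the one-liner, the sketch below also reports each GPU's total memory, which helps when deciding between the 8B and 4B models. gpu_report is a hypothetical helper; it only uses standard torch.cuda calls and degrades gracefully when PyTorch is missing.

```python
def gpu_report():
    """Return a dict describing PyTorch/CUDA availability and per-device memory."""
    report = {"torch_installed": False, "cuda_available": False, "devices": []}
    try:
        import torch
    except (ImportError, OSError):
        return report  # PyTorch missing or its native libraries failed to load
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(index)
            total_gib = round(props.total_memory / 1024 ** 3, 1)
            report["devices"].append((props.name, total_gib))
    return report


if __name__ == "__main__":
    info = gpu_report()
    print(f"PyTorch installed: {info['torch_installed']}")
    print(f"CUDA available:    {info['cuda_available']}")
    for name, total_gib in info["devices"]:
        print(f"  {name}: {total_gib} GiB")
```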
Running the Benchmarks
To reproduce the published benchmark results:
# WebVoyager benchmark
python benchmark.py --benchmark webvoyager --model ./models/molmoweb-8b --output ./results/
# DeepShop benchmark
python benchmark.py --benchmark deepshop --model ./models/molmoweb-8b --output ./results/
Some WebVoyager tasks require API credentials because the benchmark runs against live websites. See benchmarks/README.md in the repository for credential setup.
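To queue both benchmarks back to back, a small driver script can build the two commands shown above and stop at the first failure. This is a sketch under the assumption that benchmark.py accepts exactly the flags shown; benchmark_commands and run_all are hypothetical helpers.

```python
import subprocess
import sys

BENCHMARKS = ("webvoyager", "deepshop")


def benchmark_commands(model, output, benchmarks=BENCHMARKS):
    """Return one argv list per benchmark, mirroring the commands above."""
    return [
        [sys.executable, "benchmark.py", "--benchmark", name,
         "--model", model, "--output", output]
        for name in benchmarks
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on the first failing run."""
    for cmd in commands:
        runner(cmd, check=True)
    return len(commands)


if __name__ == "__main__":
    # Dry run: print the commands. Call run_all(...) to execute them.
    for cmd in benchmark_commands("./models/molmoweb-8b", "./results/"):
        print(" ".join(cmd))
```

Passing a different runner makes it easy to dry-run or log the commands without starting a long benchmark.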