Prerequisites
- Python 3.11+
- Node.js 18+ (only for Agent-TARS)
- uv (recommended Python package manager)
Installation
Install Python dependencies
Once bubench is on PATH, bubench run creates the agent venv defined in config.yaml (built-in defaults:
.venvs/browser_use, .venvs/skyvern, .venvs/agent_tars) and installs the matching
dependencies on first use. Each agent venv must be configured explicitly (there is no fallback to .venv).
If uv is not available, venv creation and dependency installation fall back to python -m venv and pip.
Configure environment (.env)
Create or edit .env and set the evaluation settings and any optional cloud settings.
Tip: If you are in China, set HF_ENDPOINT=https://hf-mirror.com to speed up HuggingFace downloads.
Configure agent credentials
- configs/agents/browser-use/config.yaml needs MODEL_TYPE, MODEL_ID, and the matching API key (BROWSER_USE_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY).
- If BROWSER_ID=agentbay, set AGENTBAY_API_KEY in .env (do not put it in config.yaml).
- configs/agents/Agent-TARS/config.yaml needs MODEL_PROVIDER, MODEL_ID, and MODEL_APIKEY (plus MODEL_BASEURL if required).
- Agent config files are read as plain YAML; environment variables are not auto-substituted.
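The credential rules above can be turned into a quick pre-flight check. This is a sketch under stated assumptions, not bubench's own validation: it assumes the agent config is a flat mapping of the key names listed above, that BROWSER_ID may appear in either the config or the environment, and the function name `check_agent_config` is hypothetical. It takes already-loaded dictionaries so it does not depend on any particular YAML or .env loader.

```python
import re

# Required keys per agent, as listed in this guide.
REQUIRED = {
    "browser-use": ["MODEL_TYPE", "MODEL_ID"],
    "Agent-TARS": ["MODEL_PROVIDER", "MODEL_ID", "MODEL_APIKEY"],
}
# browser-use needs at least one of these API keys.
BROWSER_USE_KEYS = ["BROWSER_USE_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY"]
# Matches shell-style placeholders such as ${OPENAI_API_KEY} or $OPENAI_API_KEY.
PLACEHOLDER = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?")


def check_agent_config(agent: str, config: dict, env: dict) -> list[str]:
    """Return human-readable problems found in an agent config (empty if none)."""
    problems = []
    for key in REQUIRED[agent]:
        if not config.get(key):
            problems.append(f"missing {key}")
    if agent == "browser-use":
        if not any(config.get(k) for k in BROWSER_USE_KEYS):
            problems.append("missing API key (one of " + ", ".join(BROWSER_USE_KEYS) + ")")
        # Assumption: BROWSER_ID may be set in the config or in the environment.
        if (config.get("BROWSER_ID") or env.get("BROWSER_ID")) == "agentbay":
            if "AGENTBAY_API_KEY" in config:
                problems.append("AGENTBAY_API_KEY belongs in .env, not config.yaml")
            if not env.get("AGENTBAY_API_KEY"):
                problems.append("AGENTBAY_API_KEY not set in .env")
    # Config files are plain YAML, so a ${VAR} placeholder would be passed through literally.
    for key, value in config.items():
        if isinstance(value, str) and PLACEHOLDER.search(value):
            problems.append(f"{key} contains an unexpanded placeholder: {value}")
    return problems


# Illustrative values only; they are not real model types or IDs.
example = {"MODEL_TYPE": "example-type", "MODEL_ID": "example-model",
           "OPENAI_API_KEY": "${OPENAI_API_KEY}"}
for problem in check_agent_config("browser-use", example, env={}):
    print(problem)
```

The placeholder check is deliberately loose: it flags any value that looks like a shell variable reference, which is the most common symptom of expecting substitution that does not happen.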
Quick Run
Run your first benchmark
Smoke test (recommended)
Add --dry-run to verify the configuration without executing tasks.
Evaluate results
Logs: Script execution logs are saved in output/logs/.
- run.py: output/logs/run/
- eval.py: output/logs/eval/
- leaderboard: output/logs/leaderboard/
Generate leaderboard
Run Modes
| Mode | Description | Example |
|---|---|---|
| single | Run the first task (sanity check) | --mode single |
| first_n | Run the first N tasks | --mode first_n --count 5 |
| sample_n | Randomly sample N tasks | --mode sample_n --count 10 |
| specific | Run specified task IDs | --mode specific --task-ids id1 id2 |
| by_id | Run one task by numeric ID field | --mode by_id --id 123 |
| all | Run all tasks | --mode all |
Note: --task-ids expects a space-separated list.
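If you script your runs, the table above maps naturally onto a small argument builder. This is a sketch that uses only the flags shown in the table; `mode_args` is a hypothetical helper, it does not invoke bubench, and the remaining command-line arguments are left to the caller.

```python
def mode_args(mode, count=None, task_ids=None, task_id=None):
    """Build the --mode portion of a command line, validating required options."""
    if mode in ("single", "all"):
        return ["--mode", mode]
    if mode in ("first_n", "sample_n"):
        if count is None or count < 1:
            raise ValueError(f"--mode {mode} requires a positive --count")
        return ["--mode", mode, "--count", str(count)]
    if mode == "specific":
        if not task_ids:
            raise ValueError("--mode specific requires at least one task ID")
        # --task-ids takes a space-separated list, so each ID is its own argument.
        return ["--mode", "specific", "--task-ids", *task_ids]
    if mode == "by_id":
        if task_id is None:
            raise ValueError("--mode by_id requires --id")
        return ["--mode", "by_id", "--id", str(task_id)]
    raise ValueError(f"unknown mode: {mode}")


print(" ".join(mode_args("sample_n", count=10)))  # --mode sample_n --count 10
print(" ".join(mode_args("specific", task_ids=["id1", "id2"])))  # --mode specific --task-ids id1 id2
```

Returning a list instead of a single string keeps each task ID a separate argument, which avoids quoting problems when the list is passed to `subprocess.run`.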
Common Parameters
- --data-source: local or huggingface.
- --force-download: Force re-download in HuggingFace mode.
- --agent-config: Custom agent config path (defaults to configs/agents/<agent>/config.yaml).
- --timestamp: Resume or run in a specific directory (YYYYMMDD_HHmmss).
- --timeout: Overrides TIMEOUT in the agent config.
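The YYYYMMDD_HHmmss pattern used by --timestamp corresponds to the `strftime` format `%Y%m%d_%H%M%S`. The sketch below shows one way to generate and validate such values; the helper names are hypothetical, and how bubench itself parses the value is not documented here.

```python
import re
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d_%H%M%S"  # strftime equivalent of YYYYMMDD_HHmmss
TIMESTAMP_SHAPE = re.compile(r"\d{8}_\d{6}")


def new_timestamp(now=None):
    """Format `now` (default: the current local time) as YYYYMMDD_HHmmss."""
    return (now or datetime.now()).strftime(TIMESTAMP_FORMAT)


def parse_timestamp(value):
    """Parse a YYYYMMDD_HHmmss string, raising ValueError if it is malformed."""
    # strptime alone tolerates unpadded fields, so check the exact shape first.
    if not TIMESTAMP_SHAPE.fullmatch(value):
        raise ValueError(f"expected YYYYMMDD_HHmmss, got {value!r}")
    return datetime.strptime(value, TIMESTAMP_FORMAT)


print(new_timestamp(datetime(2024, 1, 31, 23, 59, 59)))  # 20240131_235959
print(parse_timestamp("20240131_235959"))  # 2024-01-31 23:59:59
```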
Running Multiple Agents in Parallel
bubench run uses the venv specified by the agent entry in config.yaml and will auto-create/install dependencies
on first use. By default each built-in agent has a dedicated venv:
- browser-use -> .venvs/browser_use
- skyvern -> .venvs/skyvern
- Agent-TARS -> .venvs/agent_tars
If an agent entry does not specify a venv, bubench run exits with an error instead of falling back to .venv.
If you need to run conflicting agents at the same time, open two terminals and run each agent with its own venv.
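The per-agent venv rule can be illustrated with a small lookup that resolves the interpreter inside each agent's venv and refuses to guess when none is configured. This is a sketch, not bubench's implementation: `venv_python` is a hypothetical helper, the mapping simply mirrors the built-in defaults listed above, and the interpreter location follows the standard venv layout (`bin/python` on POSIX, `Scripts/python.exe` on Windows).

```python
import os
from pathlib import Path

# Built-in default venvs, as listed in this guide.
DEFAULT_VENVS = {
    "browser-use": ".venvs/browser_use",
    "skyvern": ".venvs/skyvern",
    "Agent-TARS": ".venvs/agent_tars",
}


def venv_python(agent, venvs=DEFAULT_VENVS):
    """Return the path to the Python interpreter inside the agent's venv."""
    venv = venvs.get(agent)
    if not venv:
        # Mirrors the documented behavior: error out, never fall back to .venv.
        raise KeyError(f"no venv configured for agent {agent!r}; refusing to fall back to .venv")
    relative = Path("Scripts/python.exe") if os.name == "nt" else Path("bin/python")
    return Path(venv) / relative


for agent in DEFAULT_VENVS:
    print(f"{agent}: {venv_python(agent)}")
```

Because each agent resolves to a different interpreter, two terminals running different agents never share installed packages.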
Node.js Agents (No Conflicts)
Agent-TARS runs via a Node.js CLI and does not share Python dependencies with other agents. You can run it in any terminal after installing the CLI.
Next Steps
- Supported Agents: Explore available browser agents
- Benchmarks: Learn about each benchmark
- Cloud Browser Setup: Configure Lexmount cloud browser
- View Leaderboard: Compare agent performance