Getting Started

This guide will help you set up and run browseruse-bench.

Prerequisites

  • Python 3.8+
  • Node.js (for Agent-TARS)
  • PostgreSQL (optional, for database integration)
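
Before installing, it can help to confirm the required tools are on your PATH; this is just a sanity check, not part of the setup itself:

python --version   # should report 3.8 or newer
node --version     # only needed if you plan to run Agent-TARS
psql --version     # optional, only for the PostgreSQL integration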

Installation

1. Clone the Repository

git clone https://github.com/lexmount/browseruse-bench.git
cd browseruse-bench

2. Install Dependencies

# Install core package
pip install -e .

# Install with browser-use agent support
pip install -e ".[browser-use]"

# Install all optional dependencies
pip install -e ".[all]"

3. Install Agent-TARS (Optional)

npm install -g @agent-tars/cli@latest
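
You can check that the package installed globally with npm (this only verifies the package is present, not that it is configured):

npm list -g @agent-tars/cli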

4. Configure Environment

cp .env.example .env
vim .env  # Edit with your API keys

Required environment variables:
  • OPENAI_API_KEY: OpenAI API key for evaluation
  • LEXMOUNT_API_KEY: Lexmount cloud browser API key
  • LEXMOUNT_PROJECT_ID: Lexmount project ID
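
A minimal .env sketch with placeholder values (the variable names are the ones listed above; replace the placeholders with your real keys):

OPENAI_API_KEY=your-openai-api-key
LEXMOUNT_API_KEY=your-lexmount-api-key
LEXMOUNT_PROJECT_ID=your-lexmount-project-id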

5. Configure Agents

cp agents/Agent-TARS/config.yaml.example agents/Agent-TARS/config.yaml
cp agents/browser-use/config.yaml.example agents/browser-use/config.yaml
vim agents/Agent-TARS/config.yaml
vim agents/browser-use/config.yaml
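
Before moving on, it is worth confirming that both copied config files exist where the runner expects them:

ls agents/Agent-TARS/config.yaml agents/browser-use/config.yaml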

Quick Start

Run a Benchmark

# Run first 3 tasks of Online-Mind2Web with Agent-TARS
uv run scripts/run.py --agent Agent-TARS --benchmark Online-Mind2Web --mode first_n --count 3

# Run the no-login subset of LexBench-Browser
uv run scripts/run.py --agent browser-use --benchmark LexBench-Browser --split no_login --mode first_n --count 5
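
The flags shown above are the ones used in this guide. Assuming the scripts use a standard argument parser, you can list every supported option with --help:

uv run scripts/run.py --help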

Evaluate Results

# Evaluate LexBench-Browser results
uv run scripts/eval.py --agent browser-use --benchmark LexBench-Browser

# Evaluate with a specific score threshold
uv run scripts/eval.py --agent browser-use --benchmark LexBench-Browser --score-threshold 70

Logs: script execution logs are saved under output/logs/:
  • run.py: output/logs/run/
  • eval.py: output/logs/eval/
  • generate_leaderboard.py: output/logs/leaderboard/
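
To find the most recent log for a script, list its log directory sorted by modification time (file names will vary between runs):

ls -lt output/logs/run/ | head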

Generate Leaderboard

uv run scripts/generate_leaderboard.py

Next Steps