BrowseComp

BrowseComp is a benchmark for browser operation competition tasks.

Overview

Attribute | Value
Task Type | Browser operations
Evaluation | Grader-based scoring

Quick Start

# Run tasks
bubench run --agent browser-use --benchmark BrowseComp --mode first_n --count 3

# Evaluate results
bubench eval --agent browser-use --benchmark BrowseComp

Data Loading

BrowseComp supports local JSONL files or HuggingFace downloads. To use HuggingFace:

bubench run --agent browser-use --benchmark BrowseComp \
  --data-source huggingface

The HuggingFace parquet file is converted to JSONL in the HF cache before use.
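
For readers unfamiliar with JSONL, the following is a minimal sketch of the target format: one JSON object per line. It is not bubench's actual conversion code. It assumes the parquet rows have already been loaded into Python dicts (a real conversion would typically read them with a parquet library first), and the field names `task_id` and `question` are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def write_jsonl(records: Iterable[dict], path: Path) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical rows standing in for data already read from parquet.
    rows = [
        {"task_id": 1, "question": "first example task"},
        {"task_id": 2, "question": "second example task"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "tasks.jsonl"
        written = write_jsonl(rows, out)
        loaded = list(read_jsonl(out))
        print(f"wrote {written} rows, read back {len(loaded)}")
```

Because each line is an independent JSON object, a file in this format can be read incrementally, which is convenient when only the first few tasks are needed.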