browseruse-bench integrates three widely used browser-agent evaluation benchmarks that cover different types of web interaction tasks.

Supported Benchmarks

LexBench-Browser

Recommended. An evaluation benchmark for Chinese websites with 387 tasks (v2.0). The L1 split is a no-login subset for quick runs.

Online-Mind2Web

Online evaluation based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.

BrowseComp

Browsing competition tasks that evaluate agents’ overall browser operation capabilities.

Feature Comparison

| Benchmark | Tasks | Language | Evaluation | Login Required |
| --- | --- | --- | --- | --- |
| LexBench-Browser | 387 | zh/en | LLM (visual) | Partial |
| Online-Mind2Web | 300 | English | WebJudge | No |
| BrowseComp | 1266 | English | Grader | No |

Quick Comparison Run

# LexBench-Browser (Recommended, L1 no-login subset)
bubench run --agent browser-use --benchmark LexBench-Browser --split L1 --mode first_n --count 5

# Online-Mind2Web
bubench run --agent browser-use --benchmark Online-Mind2Web --mode first_n --count 5

# BrowseComp
bubench run --agent browser-use --benchmark BrowseComp --mode first_n --count 5
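
The three commands above differ only in the benchmark name and the extra --split L1 for LexBench-Browser, so they can be generated in a loop. The following is a minimal sketch, not part of browseruse-bench: build_cmd and DRY_RUN are hypothetical names introduced here, and only flags already shown above are used. By default it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: run the same first-5 smoke test for each benchmark in sequence.
# DRY_RUN and build_cmd are hypothetical helpers, not bubench options.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

build_cmd() {
  # $1 = benchmark name; prints the full command line for that benchmark
  local benchmark="$1"
  local split_args=""
  # LexBench-Browser is the only benchmark shown above with a --split flag
  if [ "$benchmark" = "LexBench-Browser" ]; then
    split_args=" --split L1"
  fi
  echo "bubench run --agent browser-use --benchmark ${benchmark}${split_args} --mode first_n --count 5"
}

for benchmark in LexBench-Browser Online-Mind2Web BrowseComp; do
  cmd="$(build_cmd "$benchmark")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    # Continue with the next benchmark even if one run fails
    $cmd || echo "run failed for $benchmark" >&2
  fi
done
```

Setting DRY_RUN=0 executes each command in turn. Keeping the dry run as the default makes it easy to review the exact command lines first.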

Data Location

All benchmark data is stored in the benchmarks/ directory:

| Benchmark | Data File Path |
| --- | --- |
| LexBench-Browser | benchmarks/LexBench-Browser/data/ |
| Online-Mind2Web | benchmarks/Online-Mind2Web/data/ |
| BrowseComp | benchmarks/BrowseComp/data/ |

For more details on data loading configuration (including HuggingFace support and private datasets), please refer to the Data Loading guide.
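
To confirm that the local data directories listed above are present before a run, a quick check like the following may help. It is a sketch only: check_data_dir is a hypothetical helper, and it assumes the script is run from the repository root. A missing directory is not necessarily an error, since data may be loaded from another source such as HuggingFace.

```shell
#!/usr/bin/env bash
# Sketch: report which benchmark data directories exist locally.
# check_data_dir is a hypothetical helper, not part of browseruse-bench.

check_data_dir() {
  # $1 = repository root, $2 = benchmark name
  local dir="$1/benchmarks/$2/data"
  if [ -d "$dir" ]; then
    echo "ok $2"
  else
    echo "missing $2"
  fi
}

# Assumes the current directory is the repository root
for benchmark in LexBench-Browser Online-Mind2Web BrowseComp; do
  check_data_dir "." "$benchmark"
done
```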

Planned Support

  • More benchmarks
If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.