browseruse-bench integrates three mainstream browser-agent evaluation benchmarks, covering different types of web interaction tasks.

Supported Benchmarks

LexBench-Browser

Recommended - Real-world browser-agent benchmark with 210 tasks across 107 distinct Chinese and English websites. No login required.

Online-Mind2Web

Online evaluation based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.

BrowseComp

Browsing competition tasks that evaluate agents’ overall browser operation capabilities.

Feature Comparison

| Benchmark | Tasks | Language | Evaluation | Login Required |
|---|---|---|---|---|
| LexBench-Browser | 210 | zh/en | LLM (visual) | No |
| Online-Mind2Web | 300 | English | WebJudge | No |
| BrowseComp | 1266 | English | Grader | No |

Quick Comparison Run

# LexBench-Browser (recommended; no login required)
bubench run --agent browser-use --data LexBench-Browser --mode first_n --count 5

# Online-Mind2Web
bubench run --agent browser-use --data Online-Mind2Web --mode first_n --count 5

# BrowseComp
bubench run --agent browser-use --data BrowseComp --mode first_n --count 5
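
The three commands above differ only in the `--data` value, so they can be wrapped in a loop. The following is a minimal sketch rather than part of browseruse-bench: the `run_all` function and the `BUBENCH` override variable are hypothetical names introduced here, and the only `bubench` flags used are the ones shown above.

```shell
# Hypothetical override so the loop can be previewed without running anything
# (for example BUBENCH=echo). Defaults to the real bubench command.
BUBENCH="${BUBENCH:-bubench}"

# run_all: run the same 5-task comparison against each supported benchmark.
# Returns non-zero if any individual run fails, but keeps going so that one
# failing benchmark does not hide the results of the others.
run_all() {
  local failed=0
  for data in LexBench-Browser Online-Mind2Web BrowseComp; do
    echo "== ${data} =="
    # Same flags as the individual commands above.
    if ! "$BUBENCH" run --agent browser-use --data "$data" --mode first_n --count 5; then
      echo "run failed for ${data}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `run_all` executes the three runs in sequence. Calling `BUBENCH=echo run_all` prints the commands that would be executed instead of running them.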

Data Location

All benchmark data is stored in the benchmarks/ directory:

| Benchmark | Data File Path |
|---|---|
| LexBench-Browser | benchmarks/LexBench-Browser/data/ |
| Online-Mind2Web | benchmarks/Online-Mind2Web/data/ |
| BrowseComp | benchmarks/BrowseComp/data/ |

For more details on data loading configuration (including HuggingFace support and private datasets), please refer to the Data Loading guide.
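
Before a run, it can be useful to confirm that the local data directories listed above are present. The sketch below assumes the data is stored locally under a repository checkout. `check_data_dirs` and its `root` argument are hypothetical names, not part of browseruse-bench. A directory reported as missing is not necessarily an error if the data is loaded from another source such as HuggingFace.

```shell
# check_data_dirs: report which benchmark data directories exist.
# $1 is a hypothetical argument: the path to the repository checkout
# (defaults to the current directory).
check_data_dirs() {
  local root="${1:-.}"
  local missing=0
  for name in LexBench-Browser Online-Mind2Web BrowseComp; do
    # Paths follow the benchmarks/<name>/data/ layout from the table above.
    local dir="${root}/benchmarks/${name}/data"
    if [ -d "$dir" ]; then
      echo "found    ${dir}"
    else
      echo "missing  ${dir}"
      missing=1
    fi
  done
  # Non-zero exit status when at least one directory is absent.
  return "$missing"
}
```

Running `check_data_dirs .` from the repository root prints one line per benchmark and returns a non-zero status if any directory is absent.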

Planned Support

  • More benchmarks
If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.