browseruse-bench integrates multiple mainstream browser evaluation benchmarks, covering various types of web interaction tasks.
Supported Benchmarks
LexBench-Browser
Recommended - Real-world browser-agent benchmark with 210 tasks across 107 distinct Chinese and English websites. No login required.
Online-Mind2Web
Online evaluation based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.
BrowseComp
Browser operation competition tasks, evaluating agents’ comprehensive browser operation capabilities.
Feature Comparison
| Benchmark | Tasks | Language | Evaluation | Login Required |
|---|---|---|---|---|
| LexBench-Browser | 210 | zh/en | LLM (visual) | No |
| Online-Mind2Web | 300 | English | WebJudge | No |
| BrowseComp | 1266 | English | Grader | No |
Quick Comparison Run
Data Location
All benchmark data is stored in the benchmarks/ directory:
| Benchmark | Data File Path |
|---|---|
| LexBench-Browser | benchmarks/LexBench-Browser/data/ |
| Online-Mind2Web | benchmarks/Online-Mind2Web/data/ |
| BrowseComp | benchmarks/BrowseComp/data/ |
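Before launching a run, it can help to confirm that each data directory from the table is actually present in your checkout. The following is a minimal sketch, not part of browseruse-bench itself: the helper name `find_missing_data_dirs` and the `BENCHMARK_DATA_DIRS` mapping are hypothetical, and the sketch assumes the paths are relative to the repository root.

```python
from pathlib import Path

# Paths copied from the table above. Treating them as relative to the
# repository root is an assumption of this sketch.
BENCHMARK_DATA_DIRS = {
    "LexBench-Browser": "benchmarks/LexBench-Browser/data",
    "Online-Mind2Web": "benchmarks/Online-Mind2Web/data",
    "BrowseComp": "benchmarks/BrowseComp/data",
}


def find_missing_data_dirs(repo_root="."):
    """Return the names of benchmarks whose data directory does not exist.

    Hypothetical helper for illustration; it only checks the filesystem
    and does not validate the contents of each directory.
    """
    root = Path(repo_root)
    return [
        name
        for name, relative_path in BENCHMARK_DATA_DIRS.items()
        if not (root / relative_path).is_dir()
    ]


if __name__ == "__main__":
    missing = find_missing_data_dirs()
    if missing:
        print("Missing data directories for:", ", ".join(missing))
    else:
        print("All benchmark data directories found.")
```

Run it from the repository root, or pass the root explicitly, for example `find_missing_data_dirs("/path/to/checkout")`.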
Planned Support
- More benchmarks
If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.