This project supports flexible benchmark data loading from both local files and HuggingFace.

Local Default

Local data is used by default. HuggingFace mode downloads files into the HF cache (~/.cache/huggingface by default).

JSONL Format

Task data is stored in JSONL format, which can be processed one line at a time (streaming) without loading the whole file.

Data Source Configuration

1. Standard Benchmarks

For LexBench-Browser and Online-Mind2Web, configure HuggingFace details in benchmarks/{benchmark}/data/data_info.json:
{
  "split": {
    "All": "LexBench-Browser/tasks.jsonl",
    "L1": "LexBench-Browser/l1.jsonl"
  },
  "default_split": "All",
  "huggingface": {
    "repo_id": "Lexmount/LexBench-Browser-Public",
    "private": false,
    "path_prefix": "LexBench-Browser"
  }
}
Split paths are relative to benchmarks/{benchmark}/data/ and can include subdirectories (e.g., LexBench-Browser/, LexBench-Online_Mind2Web/, or date folders).
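As a minimal sketch, assuming data_info.json has exactly the shape shown above, resolving a split name to its local file could look like the following. The function name resolve_split_path is hypothetical and not part of bubench; the project's actual loader may resolve paths differently.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_split_path(benchmark_dir: Path, split: Optional[str] = None) -> Path:
    """Map a split name to its local file (hypothetical helper, not bubench's API)."""
    data_dir = benchmark_dir / "data"
    info = json.loads((data_dir / "data_info.json").read_text(encoding="utf-8"))
    # Fall back to the configured default split when none is requested.
    name = split or info["default_split"]
    splits = info["split"]
    if name not in splits:
        available = ", ".join(sorted(splits))
        raise ValueError(f"unknown split {name!r}; available: {available}")
    # Split paths are relative to benchmarks/{benchmark}/data/ and may
    # include subdirectories such as LexBench-Browser/.
    return data_dir / splits[name]
```

With the example config, resolve_split_path(Path("benchmarks/LexBench-Browser"), "L1") would point at benchmarks/LexBench-Browser/data/LexBench-Browser/l1.jsonl, and omitting the split would select "All".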

2. BrowseComp (Local or HuggingFace)

BrowseComp supports local JSONL files or HuggingFace downloads. When using HuggingFace, the parquet file is downloaded into the HF cache and converted to JSONL for use.
{
  "browsecomp": {
    "csv_url": "https://openaipublic.blob.core.windows.net/simple-evals/browse_comp_test_set.csv",
    "hf_repo_id": "MultiturnRL/BrowseComp",
    "hf_path_prefix": "data",
    "hf_filename": "browsecomp-00000-of-00001.parquet"
  }
}
BrowseComp HuggingFace fields:
  • hf_repo_id: Dataset repo ID.
  • hf_path_prefix: Subdirectory inside the repo (e.g., data).
  • hf_filename: Parquet file name.
  • hf_revision (optional): Repo revision (branch name, tag, or commit hash).
  • hf_private (optional): Set to true if the repo requires a token.
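A minimal sketch of how these fields could map onto huggingface_hub's hf_hub_download follows. The helper name browsecomp_download_kwargs is hypothetical, and joining hf_path_prefix and hf_filename with "/" is an assumption about how bubench builds the in-repo path, not a description of its code.

```python
import os
from typing import Any, Dict, Mapping, Optional


def browsecomp_download_kwargs(
    cfg: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
    force_download: bool = False,
) -> Dict[str, Any]:
    """Build kwargs for huggingface_hub.hf_hub_download (hypothetical helper)."""
    env = os.environ if env is None else env
    prefix = str(cfg.get("hf_path_prefix", "")).strip("/")
    filename = cfg["hf_filename"]
    kwargs: Dict[str, Any] = {
        "repo_id": cfg["hf_repo_id"],
        # Paths inside a Hub repo always use forward slashes.
        "filename": f"{prefix}/{filename}" if prefix else filename,
        "repo_type": "dataset",
        "force_download": force_download,
    }
    # Optional fields are only passed when present in the config.
    if cfg.get("hf_revision"):
        kwargs["revision"] = cfg["hf_revision"]
    if cfg.get("hf_private"):
        token = env.get("HF_TOKEN")
        if not token:
            raise RuntimeError(
                "Private HuggingFace dataset requires authentication: set HF_TOKEN"
            )
        kwargs["token"] = token
    return kwargs
```

The resulting dict would be passed as hf_hub_download(**kwargs), which returns the local path of the cached parquet file. huggingface_hub also reads HF_TOKEN on its own when no token is given; checking it explicitly here only fails earlier with a clearer message. For the parquet-to-JSONL step, one possible approach (again an assumption, not necessarily bubench's) is pandas.read_parquet(path).to_json(out_path, orient="records", lines=True).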

CLI Usage

bubench run and bubench eval support the --data-source argument to control data loading behavior:
bubench run ... --data-source [local|huggingface]
Mode / Flag         Description
local (default)     Uses local files. Errors if files are missing. Suitable for offline usage.
huggingface         Downloads from HuggingFace and uses the HF cache (default ~/.cache/huggingface).
--force-download    With huggingface, forces a re-download into the HF cache.
Notes:
  • Local and HuggingFace storage are separate. HF downloads stay in the cache and are not copied into benchmarks/....
  • --force-download only applies to huggingface mode.
  • BrowseComp HuggingFace data is parquet and is converted to JSONL in the HF cache.
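The mode rules above can be summarized as a small dispatch function. This is a hypothetical sketch, not bubench's implementation: load_task_file and its download callback are invented names, and the callback stands in for the real HuggingFace download (for BrowseComp, that step would also include the parquet-to-JSONL conversion).

```python
from pathlib import Path
from typing import Callable, Optional


def load_task_file(
    data_source: str,
    local_path: Path,
    download: Optional[Callable[[bool], Path]] = None,
    force_download: bool = False,
) -> Path:
    """Choose the task file for a --data-source value (hypothetical sketch)."""
    if data_source == "local":
        # --force-download only applies to huggingface mode, so it is ignored here.
        if not local_path.is_file():
            raise FileNotFoundError(f"local data file not found: {local_path}")
        return local_path
    if data_source == "huggingface":
        if download is None:
            raise ValueError("huggingface mode needs a download function")
        # The returned path points into the HF cache; nothing is copied
        # into benchmarks/.
        return download(force_download)
    raise ValueError(f"unknown data source: {data_source!r}")
```

Passing the downloader in as a callable keeps the local/HuggingFace decision testable without network access.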

Run Examples

# Use local data (default)
bubench run --benchmark LexBench-Browser --agent Agent-TARS

Evaluation Examples (LexBench-Browser)

bubench eval passes --data-source through only for LexBench-Browser; other benchmarks read from result files or local paths.
bubench eval --agent browser-use --benchmark LexBench-Browser --split L1 \
  --data-source huggingface

Environment Variables

When using private datasets, you must configure the HF_TOKEN environment variable.
# Temporary
export HF_TOKEN=hf_your_token_here

# Permanent
echo 'export HF_TOKEN=hf_your_token_here' >> ~/.bashrc
source ~/.bashrc
You can create an access token in your HuggingFace account settings (Access Tokens). By default, HuggingFace caches files under ~/.cache/huggingface. Set HF_HOME to move the whole HuggingFace directory, or HF_HUB_CACHE to move only the Hub download cache.
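huggingface_hub stores downloaded repo files under a hub/ subdirectory of its home directory. The sketch below approximates that precedence so you can predict where files will land; hf_hub_cache_dir is a hypothetical helper, and the authoritative logic lives in huggingface_hub itself (recent versions expose the resolved value as huggingface_hub.constants.HF_HUB_CACHE).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hf_hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate the huggingface_hub download cache location.

    Simplified precedence: HF_HUB_CACHE, then $HF_HOME/hub, then
    $XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    Legacy variables such as HUGGINGFACE_HUB_CACHE are ignored here.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"]).expanduser()
    else:
        cache_root = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
        hf_home = Path(cache_root).expanduser() / "huggingface"
    return hf_home / "hub"
```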

Data Format

JSONL Format

To improve efficiency with large files, we use JSONL (JSON Lines) format, where each line is an independent JSON object.
tasks.jsonl
{"task_id": "1", "query": "Search for iPhone on JD", "target_website": "www.jd.com"}
{"task_id": "2", "query": "View shopping cart", "target_website": "www.taobao.com"}
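Because each line is a complete JSON object, a file can be read one task at a time instead of being parsed as a whole. A minimal stdlib sketch, assuming the field names shown above; iter_tasks is a hypothetical name, not a bubench function.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def iter_tasks(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one task dict per line without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so a bad record is easy to find.
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
```

Usage: for task in iter_tasks("benchmarks/LexBench-Browser/data/LexBench-Browser/tasks.jsonl"): print(task["task_id"], task["query"]).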

Directory Structure

benchmarks/
├── LexBench-Browser/
│   └── data/
│       ├── data_info.json          # Contains HF config
│       ├── LexBench-Browser/
│       │   ├── tasks.jsonl         # Split files
│       │   └── l1.jsonl
│       └── LexBench-Online_Mind2Web/
│           └── Online_Mind2Web.json
├── Online-Mind2Web/
│   └── data/
│       ├── data_info.json          # Contains HF config
│       └── 20251214/
│           └── Online_Mind2Web.json
├── BrowseComp/
│   └── data/
│       ├── data_info.json          # Contains metadata (csv_url reference)
│       └── 20250410/
│           └── tasks.jsonl         # Local data file

HF cache is stored outside the repo (default `~/.cache/huggingface`).

Troubleshooting

Error: Private HuggingFace dataset requires authentication
Solution: Ensure the HF_TOKEN environment variable is set.
If HuggingFace cannot be reached, two workarounds are available:
Option 1: If you are in mainland China, use an HF mirror:
export HF_ENDPOINT=https://hf-mirror.com
Option 2: Download the files manually and place each one at its split path, benchmarks/{name}/data/{split_path}, then run in the default local mode.
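After a manual download, it can help to confirm that every split file listed in data_info.json is in place before running in local mode. The helper below is a hypothetical sketch that assumes the "split" mapping shown under Standard Benchmarks; BrowseComp's data_info.json uses a different shape and is not covered.

```python
import json
from pathlib import Path
from typing import List


def missing_split_files(benchmark_dir: Path) -> List[Path]:
    """List split files named in data_info.json that are absent locally."""
    data_dir = benchmark_dir / "data"
    info = json.loads((data_dir / "data_info.json").read_text(encoding="utf-8"))
    # Each split path is relative to benchmarks/{benchmark}/data/.
    return [
        data_dir / rel
        for rel in info.get("split", {}).values()
        if not (data_dir / rel).is_file()
    ]
```

Usage: for p in missing_split_files(Path("benchmarks/LexBench-Browser")): print("missing:", p).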