browseruse-bench includes an interactive visualization server for exploring experiment results at the task level — complementing the static leaderboard with trajectory playback, API log inspection, and evaluation detail views.

Features

Trajectory Playback

Browse step-by-step screenshots for each task

Evaluation Details

View eval prompts, scores, verdicts, and rubric criteria

API Log Inspection

Inspect per-step API calls and system prompts

Judge Experiment Sets

Compare evaluation methods across tasks with variance analysis

Quick Start

Start the server

# Generate index and start server (auto-regenerates on file changes)
bubench viz --watch

# Access at http://localhost:8080

Options

| Flag | Default | Description |
| --- | --- | --- |
| --host | 127.0.0.1 | Bind address (use 0.0.0.0 to expose to the network) |
| --port | 8080 | Server port |
| --watch | off | Auto-regenerate index when experiment files change |
| --watch-interval | 3.0 | Watch poll interval in seconds |
| --generate-only | off | Regenerate experiments.json and exit without starting the server |
Security note: The server binds to 127.0.0.1 by default so only the local machine can reach it. The /api/regenerate endpoint is unauthenticated and /experiments/* serves raw files (logs, screenshots, configs). Only pass --host 0.0.0.0 on trusted networks — see the Remote / Intranet Sharing section below.

Generate index only

bubench viz --generate-only
Scans experiments/ and writes browseruse_bench/visualization/data/experiments.json. Useful for CI or pre-generating before serving.
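In CI it can be useful to regenerate the index only when it is out of date. The following is a minimal sketch, not part of bubench: index_is_stale is a hypothetical helper, and it assumes that a result.json newer than experiments.json is a good enough signal that the index needs rebuilding. Changes that touch only screenshots or API logs would not be detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bubench): succeeds when the index is
# missing or when any result.json under the experiments root is newer than it.
index_is_stale() {
  local root="${1:-experiments}"
  local index="${2:-browseruse_bench/visualization/data/experiments.json}"

  # No index yet: it has to be generated.
  [ -f "$index" ] || return 0

  # find prints result.json files modified after the index; any output means stale.
  [ -n "$(find "$root" -type f -name result.json -newer "$index" 2>/dev/null)" ]
}

# Usage (from the repository root):
#   if index_is_stale; then bubench viz --generate-only; fi
```

Both arguments are optional and default to the paths named above, so the helper can be called without arguments from the repository root.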

Experiment Directory Layout

The visualization server reads the same experiment directory structure as the leaderboard:
experiments/{benchmark}/{split}/{agent}/{timestamp}/
  tasks/{task_id}/
    result.json              # required
    trajectory/*.png         # step screenshots (optional)
    api_logs/step_*.json     # per-step API logs (optional)
    agent_history.gif        # animated replay (optional)
  tasks_eval_result/         # evaluation results (optional)
    *_eval_results.json
    *summary.json
A 5-level layout with an explicit model directory is also supported:
experiments/{benchmark}/{split}/{agent}/{model_id}/{timestamp}/
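
To check which tasks the server is likely to pick up, the layout can be walked with find. This is a sketch under the assumption that a task counts only if tasks/{task_id}/result.json exists; list_tasks and tasks_without_screenshots are hypothetical helpers, not bubench commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bubench): print every task directory that
# contains the required result.json, one per line.
list_tasks() {
  local root="${1:-experiments}"
  # The pattern anchors on tasks/{task_id}/result.json, so the number of
  # directory levels above tasks/ does not matter.
  find "$root" -type f -path '*/tasks/*/result.json' -exec dirname {} \; | sort
}

# Hypothetical helper: print task directories that have no step screenshots.
tasks_without_screenshots() {
  local root="${1:-experiments}" dir
  list_tasks "$root" | while IFS= read -r dir; do
    # trajectory/*.png is optional, so its absence is not an error.
    if ! ls "$dir"/trajectory/*.png >/dev/null 2>&1; then
      printf '%s\n' "$dir"
    fi
  done
}
```

Because the match is made on the tasks/ path segment rather than on a fixed depth, the same helper covers both the 4-level and the 5-level layout.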

Remote / Intranet Sharing

Run the server in a tmux session so it stays alive after you disconnect from SSH.
Install tmux if it is not already installed (the command below is for macOS with Homebrew; on Debian or Ubuntu, use sudo apt install tmux):
brew install tmux
Start the server in the background:
tmux new-session -d -s viz "bubench viz --host 0.0.0.0 --port 8090 --watch"
Common tmux commands:
tmux attach -t viz          # view logs (Ctrl+b d to detach)
tmux kill-session -t viz    # stop the server
Find the server URL: when bound to 0.0.0.0, the startup log prints the detected LAN URL on its first lines. Attach with tmux attach -t viz to read it.
To look up the IP manually (macOS shown; on Linux, hostname -I lists the host's addresses):
ipconfig getifaddr en0
Then open http://<server-ip>:8090/ in your browser.
Firewall (if other machines cannot connect; this command applies to Linux hosts that use ufw):
sudo ufw allow 8090/tcp
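Before changing firewall rules, it may help to confirm from another machine whether the server is reachable at all. The sketch below uses curl; viz_url and check_viz are hypothetical helpers, and the check assumes the root URL answers with a successful HTTP status.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of bubench).

# Build the URL for a given host and port (port defaults to 8090 as above).
viz_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-8090}"
}

# Succeeds if the server answers an HTTP request within 5 seconds.
check_viz() {
  curl --silent --show-error --fail --output /dev/null --max-time 5 "$(viz_url "$@")"
}

# Usage (replace <server-ip> with the address found above):
#   check_viz <server-ip> 8090 && echo "reachable" || echo "not reachable"
```

A failure from the remote machine combined with a success on the server itself points to a firewall or bind-address issue rather than a problem with the server process.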

Leaderboard vs. Visualization

| | Leaderboard | Visualization |
| --- | --- | --- |
| Purpose | Agent ranking overview | Task-level detail exploration |
| Output | Self-contained HTML file | Dynamic SPA served locally |
| Granularity | Run-level aggregates | Per-task trajectories and logs |
| Sharing | Share the HTML file directly | Run the server on a shared host |
Use the leaderboard for quick public sharing; use visualization for in-depth analysis during development.