browseruse-bench integrates agents through a small BaseAgent interface. You implement one class and register it; the runner handles task loading, CLI parsing, workspaces, and result persistence.

What you implement vs. what the framework handles

You implement
  • browseruse_bench/agents/my_agent.py with a BaseAgent subclass
  • Optional runtime config under configs/agents/<agent>/config.yaml
  • Any extra dependencies your agent needs
The framework handles
  • CLI argument parsing and task selection
  • Loading tasks from the benchmark dataset
  • Creating task_workspace directories
  • Saving result.json and failure info
  • Logging and error handling via scripts/agent_runner.py

1. Suggested structure

browseruse_bench/
└── agents/
    └── my_agent.py
configs/
└── agents/
    └── my-agent/
        └── config.yaml

2. Integration steps

Step 1: Implement the agent

# browseruse_bench/agents/my_agent.py
from __future__ import annotations

import logging
from pathlib import Path
from typing import Any, Dict

from browseruse_bench.agents.base import BaseAgent
from browseruse_bench.agents.registry import register_agent

logger = logging.getLogger(__name__)


@register_agent
class MyAgent(BaseAgent):
    name = "my-agent"

    def run_task(
        self,
        task_info: Dict[str, Any],
        agent_config: Dict[str, Any],
        task_workspace: Path,
    ) -> Dict[str, Any]:
        task_id = task_info.get("task_id", "unknown")
        task_text = task_info.get("task_text", "")
        timeout_seconds = agent_config.get("timeout_seconds")

        logger.info("Running task %s: %s (timeout=%s)", task_id, task_text, timeout_seconds)

        # ... agent logic ...

        return {
            "task_id": task_id,
            "status": "success",  # or "failed"
            "answer": "Final answer after task completion",
            "metrics": {
                "steps": 0,
                "duration_ms": 0,
            },
        }

Step 2: Ensure the module is imported

Registration happens when the module is imported. Add your module to browseruse_bench/agents/__init__.py:
from browseruse_bench.agents import my_agent  # noqa: F401
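One plausible shape for this import-time registration (a sketch, not the real browseruse_bench registry internals): a module-level dict keyed by the agent's `name`, filled as a side effect of the decorator running when the module is imported.

```python
# Sketch of a decorator-based registry; AGENTS and this register_agent
# body are illustrative, not the actual browseruse_bench implementation.
from typing import Dict, Type

AGENTS: Dict[str, Type] = {}


def register_agent(cls):
    AGENTS[cls.name] = cls  # runs at import time, as a decorator side effect
    return cls              # the class itself is returned unchanged


@register_agent
class MyAgent:
    name = "my-agent"


assert AGENTS["my-agent"] is MyAgent
```

This is why the import in `__init__.py` matters: until the module is imported, the decorator never runs and the agent name is not in the registry.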

Step 3: Register the agent in config.yaml

Add an entry to the root config.yaml. The runner uses this entry to determine supported benchmarks, venv selection, and config paths; path and entrypoint are kept as metadata.
agents:
  my-agent:
    path: browseruse_bench/agents
    entrypoint: scripts/agent_runner.py
    config: configs/agents/my-agent/config.yaml
    supported_benchmarks:
      - Online-Mind2Web
    venv: .venv

Step 4: Run a quick test

bubench run \
  --agent my-agent \
  --benchmark Online-Mind2Web \
  --mode first_n \
  --count 1
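After the run, you can inspect the outcome programmatically. This sketch assumes the `<output_dir>/tasks/<task_id>/result.json` layout described below; `load_result` is a hypothetical helper, and the output directory and task id in the comment are illustrative.

```python
import json
from pathlib import Path


def load_result(output_dir: Path, task_id: str) -> dict:
    """Load the runner-written result.json for one task.

    Assumes the <output_dir>/tasks/<task_id>/ workspace layout;
    load_result itself is a hypothetical helper, not a bubench API.
    """
    result_path = output_dir / "tasks" / task_id / "result.json"
    return json.loads(result_path.read_text())


# Example (illustrative paths):
# result = load_result(Path("outputs/my-run"), "task-0001")
# print(result["status"])
```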

3. Browser Backend Constraints (for browser agents)

If your custom agent needs a browser, follow the unified browser backend contract:
  • Read BROWSER_ID from agent_config and use open_browser_session(...) to acquire the backend session context.
  • Keep provider lifecycle code (create/delete session, provider auth, provider SDK calls) in browseruse_bench/browsers/providers/, not in the agent module.
  • Use BrowserSessionContext (backend_id, transport, cdp_url, metadata) inside the agent to build runtime browser instances.
  • Keep optional SDK import handling inside provider modules (ImportError-safe import at module load, fail fast only when selected backend is actually used).
  • Cleanup errors in backend close(...) should be logged and tolerated; they should not mask task execution failures.
Minimal pattern:
from browseruse_bench.browsers import open_browser_session

browser_id = agent_config.get("BROWSER_ID") or "Chrome-Local"
with open_browser_session(browser_id=browser_id, agent_name=self.name, agent_config=agent_config) as session_context:
    ...
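To make the contract concrete, here is a stand-in sketch of the session-context shape and lifecycle, using only the fields named above (backend_id, transport, cdp_url, metadata). `SessionContextSketch` and `open_browser_session_sketch` are illustrative stand-ins, not the real browseruse_bench classes; the cdp_url value is a placeholder.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class SessionContextSketch:
    # Mirrors the BrowserSessionContext fields named above;
    # this stand-in is for illustration only.
    backend_id: str
    transport: str
    cdp_url: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@contextmanager
def open_browser_session_sketch(
    browser_id: str, agent_name: str, agent_config: Dict[str, Any]
) -> Iterator[SessionContextSketch]:
    # A real provider would create the remote session here and tear it
    # down in the finally block; cleanup errors are logged, not raised.
    ctx = SessionContextSketch(
        backend_id=browser_id,
        transport="cdp",
        cdp_url="ws://localhost:9222/devtools",  # placeholder value
    )
    try:
        yield ctx
    finally:
        pass  # close(...) lives in the provider module, not the agent


with open_browser_session_sketch("Chrome-Local", "my-agent", {}) as session:
    assert session.cdp_url  # the agent builds its browser from these fields
```

The point of the dataclass boundary is that the agent never touches provider SDKs directly; it only consumes the neutral context fields.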

4. Adding a New Browser Backend (Checklist)

When adding a new browser type (for example, a new cloud provider):
  1. Add a backend provider implementation under browseruse_bench/browsers/providers/<provider>.py implementing open(...) and close(...).
  2. Register it in browseruse_bench/browsers/registry.py with a new browser_id.
  3. If new config keys are introduced, add them to:
    • configs/agents/<agent>/config.yaml.example
    • related agent docs under docs/en/agents/*.mdx and docs/zh/agents/*.mdx
  4. Add/extend tests in tests/browseruse_bench/test_browsers.py:
    • open success path
    • missing credentials/dependency path
    • cleanup failure tolerance path
  5. If the backend needs new optional dependencies, update:
    • pyproject.toml optional dependency group(s)
    • agent extra resolution in browseruse_bench/cli/run.py (if required)
  6. Update docs:
    • this page (/en/examples/custom-agent, /zh/examples/custom-agent)
    • quickstart/env guidance when new env vars are required
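The "cleanup failure tolerance path" in item 4 can be sketched as follows. `close_tolerantly` and `FlakyBackend` are hypothetical names for illustration; the real behavior lives in the provider's close(...).

```python
import logging

logger = logging.getLogger(__name__)


def close_tolerantly(backend) -> None:
    """Close a backend session, logging but swallowing cleanup errors.

    Sketches the 'cleanup failure tolerance' behavior the checklist asks
    tests to cover: a failed close must not mask task execution results.
    """
    try:
        backend.close()
    except Exception:
        logger.exception("Backend cleanup failed; continuing")


class FlakyBackend:
    # Simulates a provider whose delete-session call fails.
    def close(self) -> None:
        raise RuntimeError("provider API timed out during delete-session")


close_tolerantly(FlakyBackend())  # logs the error instead of raising
```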

What is task_info?

task_info is a dictionary loaded from the benchmark dataset (JSON/JSONL). It always contains normalized fields:
  • task_id (string)
  • task_text (string)
  • url (string)
  • prompt (string, optional; present when a prompt template is used)
Dataset-specific fields are preserved and passed through unchanged.
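A concrete task_info might look like this (all values are illustrative; the `level` key stands in for an arbitrary dataset-specific field):

```python
# Normalized fields plus a dataset-specific field passed through unchanged.
task_info = {
    "task_id": "om2w-0001",                       # illustrative id
    "task_text": "Find the cheapest one-way flight from SFO to JFK.",
    "url": "https://example.com",
    "prompt": "You are a browser agent. Task: ...",  # only with a template
    "level": "medium",                            # dataset-specific, preserved
}

required = {"task_id", "task_text", "url"}
assert required <= task_info.keys()
```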

What is agent_config?

agent_config is loaded from configs/agents/<agent>/config.yaml (or --agent-config). The runner injects timeout_seconds after resolving CLI/config defaults.

What is task_workspace?

task_workspace is the per-task output directory: <output_dir>/tasks/<task_id>/. You can store screenshots, logs, or intermediate files there. The runner writes result.json into the same directory.
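For example, an agent can drop a step log next to the runner-written result.json. `write_step_log` is a hypothetical helper; the runner only guarantees that the workspace directory exists and that result.json ends up inside it.

```python
from pathlib import Path
from typing import List


def write_step_log(task_workspace: Path, lines: List[str]) -> Path:
    """Store intermediate agent output inside the per-task workspace.

    write_step_log is a hypothetical helper for illustration; any file
    written here lands alongside the runner's result.json.
    """
    log_path = task_workspace / "steps.log"
    log_path.write_text("\n".join(lines))
    return log_path
```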

What should the result dict contain?

Minimal
  • task_id
  • status: "success" or "failed"
Recommended
  • answer: final answer (string)
  • metrics: e.g., steps, duration_ms
  • error: error message when status is "failed"
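Putting the minimal and recommended fields together, a failed-task result might look like this (the error message and metric values are illustrative):

```python
# A result dict for a failed task, combining the minimal and
# recommended fields listed above; values are illustrative.
failed_result = {
    "task_id": "om2w-0001",
    "status": "failed",
    "answer": "",
    "error": "Timed out waiting for search results page",
    "metrics": {"steps": 12, "duration_ms": 58000},
}

assert failed_result["status"] in {"success", "failed"}
assert "error" in failed_result  # include error when status is "failed"
```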

Dependencies

If additional packages are needed, add them to pyproject.toml. If you create a new extra group for your agent and want bubench run to auto-install it, add a mapping for your agent in browseruse_bench/cli/run.py (where extra_name is selected), or preinstall the extra in the agent venv.