browseruse-bench integrates agents through a small BaseAgent interface. You implement one class and register it; the runner handles task loading, CLI parsing, workspaces, and result persistence.

What you implement vs. what the framework handles

You implement
  • browseruse_bench/agents/my_agent.py with a BaseAgent subclass
  • Optional runtime config under configs/agents/<agent>/config.yaml
  • Any extra dependencies your agent needs
The framework handles
  • CLI argument parsing and task selection
  • Loading tasks from the benchmark dataset
  • Creating task_workspace directories
  • Saving result.json and failure info
  • Logging and error handling via scripts/agent_runner.py

1. Suggested structure

browseruse_bench/
└── agents/
    └── my_agent.py
configs/
└── agents/
    └── my-agent/
        └── config.yaml

2. Integration steps

Step 1: Implement the agent

# browseruse_bench/agents/my_agent.py
from __future__ import annotations

import logging
from pathlib import Path
from typing import Any, Dict

from browseruse_bench.agents.base import BaseAgent
from browseruse_bench.agents.registry import register_agent

logger = logging.getLogger(__name__)


@register_agent
class MyAgent(BaseAgent):
    name = "my-agent"

    def run_task(
        self,
        task_info: Dict[str, Any],
        agent_config: Dict[str, Any],
        task_workspace: Path,
    ) -> Dict[str, Any]:
        task_id = task_info.get("task_id", "unknown")
        task_text = task_info.get("task_text", "")
        timeout_seconds = agent_config.get("timeout_seconds")

        logger.info("Running task %s: %s (timeout=%s)", task_id, task_text, timeout_seconds)

        # ... agent logic ...

        return {
            "task_id": task_id,
            "status": "success",  # or "failed"
            "answer": "Final answer after task completion",
            "metrics": {
                "steps": 0,
                "duration_ms": 0,
            },
        }

Step 2: Ensure the module is imported

Registration happens when the module is imported. Add your module to browseruse_bench/agents/__init__.py:
from browseruse_bench.agents import my_agent  # noqa: F401
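One plausible shape for this import-time registration (a sketch, not the real browseruse_bench registry internals): a module-level dict keyed by the agent's `name`, filled as a side effect of the decorator running when the module is imported.

```python
# Sketch of a decorator-based registry; AGENTS and this register_agent
# body are illustrative, not the actual browseruse_bench implementation.
from typing import Dict, Type

AGENTS: Dict[str, Type] = {}


def register_agent(cls):
    AGENTS[cls.name] = cls  # runs at import time, as a decorator side effect
    return cls              # the class itself is returned unchanged


@register_agent
class MyAgent:
    name = "my-agent"


assert AGENTS["my-agent"] is MyAgent
```

This is why the import in `__init__.py` matters: until the module is imported, the decorator never runs and the agent name is not in the registry.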

Step 3: Register the agent in config.yaml

Add an entry to the root config.yaml. The runner uses this entry to determine supported benchmarks, venv selection, and config paths; path and entrypoint are kept as metadata.
agents:
  my-agent:
    path: browseruse_bench/agents
    entrypoint: scripts/agent_runner.py
    config: configs/agents/my-agent/config.yaml
    supported_benchmarks:
      - Online-Mind2Web
    venv: .venv

Step 4: Run a quick test

bubench run \
  --agent my-agent \
  --benchmark Online-Mind2Web \
  --mode first_n \
  --count 1
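After the run, you can inspect the outcome programmatically. This sketch assumes the `<output_dir>/tasks/<task_id>/result.json` layout described below; `load_result` is a hypothetical helper, and the output directory and task id in the comment are illustrative.

```python
import json
from pathlib import Path


def load_result(output_dir: Path, task_id: str) -> dict:
    """Load the runner-written result.json for one task.

    Assumes the <output_dir>/tasks/<task_id>/ workspace layout;
    load_result itself is a hypothetical helper, not a bubench API.
    """
    result_path = output_dir / "tasks" / task_id / "result.json"
    return json.loads(result_path.read_text())


# Example (illustrative paths):
# result = load_result(Path("outputs/my-run"), "task-0001")
# print(result["status"])
```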

3. Browser Backend Constraints (for browser agents)

If your custom agent needs a browser, follow the unified browser backend contract:
  • Read BROWSER_ID from agent_config and use open_browser_session(...) to acquire the backend session context.
  • Keep provider lifecycle code (create/delete session, provider auth, provider SDK calls) in browseruse_bench/browsers/providers/, not in the agent module.
  • Use BrowserSessionContext (backend_id, transport, cdp_url, metadata) inside the agent to build runtime browser instances.
  • Keep optional SDK import handling inside provider modules (ImportError-safe import at module load, fail fast only when selected backend is actually used).
  • Cleanup errors in backend close(...) should be logged and tolerated; they should not mask task execution failures.
Minimal pattern:
from browseruse_bench.browsers import open_browser_session

browser_id = agent_config.get("BROWSER_ID") or "Chrome-Local"
with open_browser_session(browser_id=browser_id, agent_name=self.name, agent_config=agent_config) as session_context:
    ...
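To make the contract concrete, here is a stand-in sketch of the session-context shape and lifecycle, using only the fields named above (backend_id, transport, cdp_url, metadata). `SessionContextSketch` and `open_browser_session_sketch` are illustrative stand-ins, not the real browseruse_bench classes; the cdp_url value is a placeholder.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class SessionContextSketch:
    # Mirrors the BrowserSessionContext fields named above;
    # this stand-in is for illustration only.
    backend_id: str
    transport: str
    cdp_url: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@contextmanager
def open_browser_session_sketch(
    browser_id: str, agent_name: str, agent_config: Dict[str, Any]
) -> Iterator[SessionContextSketch]:
    # A real provider would create the remote session here and tear it
    # down in the finally block; cleanup errors are logged, not raised.
    ctx = SessionContextSketch(
        backend_id=browser_id,
        transport="cdp",
        cdp_url="ws://localhost:9222/devtools",  # placeholder value
    )
    try:
        yield ctx
    finally:
        pass  # close(...) lives in the provider module, not the agent


with open_browser_session_sketch("Chrome-Local", "my-agent", {}) as session:
    assert session.cdp_url  # the agent builds its browser from these fields
```

The point of the dataclass boundary is that the agent never touches provider SDKs directly; it only consumes the neutral context fields.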

4. Adding a New Browser Backend (Checklist)

When adding a new browser type (for example, a new cloud provider):
  1. Add a backend provider implementation under browseruse_bench/browsers/providers/<provider>.py implementing open(...) and close(...).
  2. Register it in browseruse_bench/browsers/registry.py with a new browser_id.
  3. If new config keys are introduced, add them to:
    • configs/agents/<agent>/config.yaml.example
    • related agent docs under docs/en/agents/*.mdx and docs/zh/agents/*.mdx
  4. Add/extend tests in tests/browseruse_bench/test_browsers.py:
    • open success path
    • missing credentials/dependency path
    • cleanup failure tolerance path
  5. If the backend needs new optional dependencies, update:
    • pyproject.toml optional dependency group(s)
    • agent extra resolution in browseruse_bench/cli/run.py (if required)
  6. Update docs:
    • this page (/en/examples/custom-agent, /zh/examples/custom-agent)
    • quickstart/env guidance when new env vars are required
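The "cleanup failure tolerance path" in item 4 can be sketched as follows. `close_tolerantly` and `FlakyBackend` are hypothetical names for illustration; the real behavior lives in the provider's close(...).

```python
import logging

logger = logging.getLogger(__name__)


def close_tolerantly(backend) -> None:
    """Close a backend session, logging but swallowing cleanup errors.

    Sketches the 'cleanup failure tolerance' behavior the checklist asks
    tests to cover: a failed close must not mask task execution results.
    """
    try:
        backend.close()
    except Exception:
        logger.exception("Backend cleanup failed; continuing")


class FlakyBackend:
    # Simulates a provider whose delete-session call fails.
    def close(self) -> None:
        raise RuntimeError("provider API timed out during delete-session")


close_tolerantly(FlakyBackend())  # logs the error instead of raising
```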

What is task_info?

task_info is a dictionary loaded from the benchmark dataset (JSON/JSONL). It always contains normalized fields:
  • task_id (string)
  • task_text (string)
  • url (string)
  • prompt (string, optional; present when a prompt template is used)
Dataset-specific fields are preserved and passed through unchanged.
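A concrete task_info might look like this (all values are illustrative; the `level` key stands in for an arbitrary dataset-specific field):

```python
# Normalized fields plus a dataset-specific field passed through unchanged.
task_info = {
    "task_id": "om2w-0001",                       # illustrative id
    "task_text": "Find the cheapest one-way flight from SFO to JFK.",
    "url": "https://example.com",
    "prompt": "You are a browser agent. Task: ...",  # only with a template
    "level": "medium",                            # dataset-specific, preserved
}

required = {"task_id", "task_text", "url"}
assert required <= task_info.keys()
```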

What is agent_config?

agent_config is loaded from configs/agents/<agent>/config.yaml (or --agent-config). The runner injects timeout_seconds after resolving CLI/config defaults.

What is task_workspace?

task_workspace is the per-task output directory: <output_dir>/tasks/<task_id>/. You can store screenshots, logs, or intermediate files there. The runner writes result.json into the same directory.
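For example, an agent can drop a step log next to the runner-written result.json. `write_step_log` is a hypothetical helper; the runner only guarantees that the workspace directory exists and that result.json ends up inside it.

```python
from pathlib import Path
from typing import List


def write_step_log(task_workspace: Path, lines: List[str]) -> Path:
    """Store intermediate agent output inside the per-task workspace.

    write_step_log is a hypothetical helper for illustration; any file
    written here lands alongside the runner's result.json.
    """
    log_path = task_workspace / "steps.log"
    log_path.write_text("\n".join(lines))
    return log_path
```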

What should the result dict contain?

Minimal
  • task_id
  • status: "success" or "failed"
Recommended
  • answer: final answer (string)
  • metrics: e.g., steps, duration_ms
  • error: error message when status is "failed"
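Putting the minimal and recommended fields together, a failed-task result might look like this (the error message and metric values are illustrative):

```python
# A result dict for a failed task, combining the minimal and
# recommended fields listed above; values are illustrative.
failed_result = {
    "task_id": "om2w-0001",
    "status": "failed",
    "answer": "",
    "error": "Timed out waiting for search results page",
    "metrics": {"steps": 12, "duration_ms": 58000},
}

assert failed_result["status"] in {"success", "failed"}
assert "error" in failed_result  # include error when status is "failed"
```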

Dependencies

If additional packages are needed, add them to pyproject.toml. If you create a new extra group for your agent and want bubench run to auto-install it, add a mapping for your agent in browseruse_bench/cli/run.py (where extra_name is selected), or preinstall the extra in the agent venv.