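Custom Agent

browseruse-bench integrates agents through the BaseAgent interface. You implement one class and register it; the runner handles task loading, command-line parsing, working directories, and result saving.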

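What you do vs. what the framework does

What you do

Write browseruse_bench/agents/my_agent.py (a BaseAgent subclass)
Configure runtime parameters in configs/agents/<agent>/config.yaml (optional)
Add your dependencies

What the framework does

Parses command-line arguments and task selection
Loads tasks from the benchmark dataset
Creates the task_workspace working directory
Saves result.json and failure information
Unifies logging and exception handling through scripts/agent_runner.py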

1. Recommended directory layout

browseruse_bench/
└── agents/
    └── my_agent.py
configs/
└── agents/
    └── my-agent/
        └── config.yaml

2. Integration steps

Step 1: Implement the agent

# browseruse_bench/agents/my_agent.py
from __future__ import annotations

import logging
from pathlib import Path
from typing import Any, Dict

from browseruse_bench.agents.base import BaseAgent
from browseruse_bench.agents.registry import register_agent

logger = logging.getLogger(__name__)


@register_agent
class MyAgent(BaseAgent):
    name = "my-agent"

    def run_task(
        self,
        task_info: Dict[str, Any],
        agent_config: Dict[str, Any],
        task_workspace: Path,
    ) -> Dict[str, Any]:
        task_id = task_info.get("task_id", "unknown")
        task_text = task_info.get("task_text", "")
        timeout_seconds = agent_config.get("timeout_seconds")

        logger.info("Running task %s: %s (timeout=%s)", task_id, task_text, timeout_seconds)

        # ... agent logic ...

        return {
            "task_id": task_id,
            "status": "success",  # or "failed"
            "answer": "Final answer after task completion",
            "metrics": {
                "steps": 0,
                "duration_ms": 0,
            },
        }
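
The placeholder metrics above can be filled in with a small wrapper around the agent logic. The following is a minimal sketch, not part of browseruse-bench: run_with_metrics and the "error" key are hypothetical names introduced for illustration. The runner already handles exceptions and saves failure information, so catching them inside the agent is optional.

```python
import time
from typing import Any, Callable, Dict


def run_with_metrics(task_id: str, agent_logic: Callable[[], str]) -> Dict[str, Any]:
    """Run agent_logic, time it, and map any exception to a "failed" result.

    Hypothetical helper: the name and the "error" key are assumptions,
    not part of the browseruse-bench API.
    """
    started = time.monotonic()
    error = None
    try:
        answer = agent_logic()
        status = "success"
    except Exception as exc:  # report the failure instead of crashing the run
        answer = ""
        status = "failed"
        error = f"{type(exc).__name__}: {exc}"
    duration_ms = int((time.monotonic() - started) * 1000)

    result: Dict[str, Any] = {
        "task_id": task_id,
        "status": status,
        "answer": answer,
        "metrics": {"duration_ms": duration_ms},
    }
    if error is not None:
        result["error"] = error  # assumed extra field, for debugging only
    return result
```

Inside run_task, the body could then reduce to something like `return run_with_metrics(task_id, lambda: my_logic(task_text))`, where my_logic is your own function.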

Step 2: Make sure the module is imported

Registration happens when the module is imported. Add your module to browseruse_bench/agents/__init__.py:

from browseruse_bench.agents import my_agent  # noqa: F401

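Step 3: Register the agent in config.yaml

Add an entry for your agent to the config.yaml in the repository root. The runner reads the supported benchmarks, the venv, and the config path from it; path and entrypoint are kept as metadata.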

agents:
  my-agent:
    path: browseruse_bench/agents
    entrypoint: scripts/agent_runner.py
    config: configs/agents/my-agent/config.yaml
    supported_benchmarks:
      - Online-Mind2Web
    venv: .venv

Step 4: Run a quick test

bubench run \
  --agent my-agent \
  --benchmark Online-Mind2Web \
  --mode first_n \
  --count 1

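3. Browser backend contract (for browser-based agents)

If your custom agent needs browser capabilities, follow the unified backend contract:

Read BROWSER_ID from agent_config and obtain the backend session context through open_browser_session(...).
Keep provider lifecycle code (session creation/teardown, authentication, SDK calls) in browseruse_bench/browsers/providers/ rather than scattering it across agent modules.
In the agent, consume only the BrowserSessionContext (backend_id, transport, cdp_url, metadata) to build the runtime browser instance.
Handle import failures for optional dependencies in the provider module: tolerate ImportError at module load time and raise only when that backend is actually used.
Cleanup failures in the backend's close(...) should be logged and tolerated; they must not mask the original error from the task execution phase.

Minimal pattern: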

from browseruse_bench.browsers import open_browser_session

browser_id = agent_config.get("BROWSER_ID") or "Chrome-Local"
with open_browser_session(browser_id=browser_id, agent_name=self.name, agent_config=agent_config) as session_context:
    ...
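
The cleanup rule in the contract (log and tolerate close failures without masking the task's own error) can be sketched with a plain context manager. This is an illustration under stated assumptions, not the actual implementation of open_browser_session: tolerant_session, open_fn, and close_fn are hypothetical names, and the session is modeled as a plain dict.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger(__name__)


@contextmanager
def tolerant_session(
    open_fn: Callable[[], dict],
    close_fn: Callable[[dict], None],
) -> Iterator[dict]:
    """Open a session, yield it, and always attempt cleanup afterwards.

    A failure inside close_fn is logged and swallowed, so an exception
    raised by the task body propagates to the caller unchanged.
    """
    session = open_fn()
    try:
        yield session
    finally:
        try:
            close_fn(session)
        except Exception:
            logger.warning("Browser cleanup failed; ignoring", exc_info=True)
```

If the body of the with block raises, the caller still sees that original exception even when cleanup also fails.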

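4. Adding a new browser type (checklist)

When you add a new browser type (for example, a new cloud provider):

Add the backend implementation in browseruse_bench/browsers/providers/<provider>.py, providing open(...) and close(...).
Register the new browser_id in browseruse_bench/browsers/registry.py.
If you add configuration options, also update:
- configs/agents/<agent>/config.yaml.example
- the corresponding agent docs, docs/en/agents/*.mdx and docs/zh/agents/*.mdx
Add or extend tests in tests/browseruse_bench/test_browsers.py:
- the successful open path
- the missing-credentials / missing-dependency path
- the cleanup-failure tolerance path
If you introduce a new optional dependency, update:
- the optional dependency groups in pyproject.toml
- the agent extra mapping in browseruse_bench/cli/run.py (if needed)
Update the documentation:
- this page (/en/examples/custom-agent, /zh/examples/custom-agent)
- the quickstart/env notes, if you introduce new environment variables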

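What is `task_info`?

task_info is the task dictionary the framework loads from the benchmark dataset (JSON/JSONL). It always contains these standard fields:

task_id (string)
task_text (string)
url (string)
prompt (string, optional; present when a template is used to generate the prompt)

Any other fields in the dataset are preserved as-is and passed through.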
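
To make the shape concrete, here is a small sketch that parses JSONL text into task dictionaries, checks the three standard fields, and keeps any extra fields untouched. The framework does this loading for you; load_tasks and REQUIRED_FIELDS are hypothetical names, and the sample record (including its "level" field) is invented for illustration.

```python
import json
from typing import Any, Dict, List

# The standard fields listed above; "prompt" is optional and not checked.
REQUIRED_FIELDS = ("task_id", "task_text", "url")


def load_tasks(jsonl_text: str) -> List[Dict[str, Any]]:
    """Parse one JSON object per line and verify the standard fields."""
    tasks: List[Dict[str, Any]] = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        task = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in task]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        tasks.append(task)  # extra dataset fields stay in the dict as-is
    return tasks


sample = (
    '{"task_id": "t1", "task_text": "Find the price", '
    '"url": "https://example.com", "level": "easy"}\n'
)
tasks = load_tasks(sample)
```

Here tasks[0]["level"] is still available to the agent, which mirrors how dataset-specific fields are passed through in task_info.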

What is `agent_config`?

agent_config is read from configs/agents/<agent>/config.yaml by default (or from the path given with --agent-config). The runner injects timeout_seconds after it resolves the timeout.
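
On the agent side, it can help to read timeout_seconds defensively rather than assume it is always present and valid. A minimal sketch follows; read_timeout and the 300-second fallback are assumptions made for illustration, not framework defaults.

```python
from typing import Any, Dict


def read_timeout(agent_config: Dict[str, Any], default: float = 300.0) -> float:
    """Return the timeout in seconds, falling back to `default` if unset.

    Hypothetical helper: the fallback value is an assumption, not a
    browseruse-bench default.
    """
    raw = agent_config.get("timeout_seconds")
    if raw is None:
        return default
    timeout = float(raw)
    if timeout <= 0:
        raise ValueError(f"timeout_seconds must be positive, got {timeout}")
    return timeout
```

For example, read_timeout({"timeout_seconds": 120}) yields 120.0, while an empty config yields the fallback.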

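What is `task_workspace`?

task_workspace is the output directory the framework creates for each task: <output_dir>/tasks/<task_id>/. You can save screenshots, logs, or intermediate artifacts there; the runner writes result.json to the same directory.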
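
Saving an intermediate artifact into the workspace needs only pathlib. The helper below is a sketch: save_artifact and the file name in the usage note are hypothetical. Avoid the name result.json, since the runner writes that file itself.

```python
import json
from pathlib import Path


def save_artifact(task_workspace: Path, relative_name: str, payload: dict) -> Path:
    """Write `payload` as JSON under task_workspace and return the file path."""
    target = task_workspace / relative_name
    # Create subdirectories such as "logs/" on demand.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return target
```

Inside run_task this might be called as `save_artifact(task_workspace, "logs/steps.json", {"steps": step_log})`, where step_log is whatever your agent collects.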

What must the returned result dictionary contain?

At minimum

task_id
status: "success" or "failed"
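
A quick self-check before returning from run_task can catch a malformed result early. This sketch encodes only the two minimum requirements listed above; validate_result is a hypothetical helper, not a framework function.

```python
from typing import Any, Dict, List

ALLOWED_STATUSES = ("success", "failed")


def validate_result(result: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the minimum is met."""
    problems: List[str] = []
    if not result.get("task_id"):
        problems.append("missing task_id")
    if result.get("status") not in ALLOWED_STATUSES:
        problems.append(f"status must be one of {ALLOWED_STATUSES}")
    return problems
```

Extra keys such as answer or metrics are left alone, so the check stays compatible with richer result dictionaries.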

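Dependencies

If you need additional Python packages, add them to pyproject.toml. If you add an extra dependency group for your agent and want bubench run to install it automatically, add an extra mapping for the agent in browseruse_bench/cli/run.py (the extra_name logic), or install the extra manually in the agent's venv beforehand.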
