Custom agents are built on the `BaseAgent` interface: you implement one class and register it; the runner handles task loading, CLI parsing, workspaces, and result persistence.
What you implement vs. what the framework handles
You implement:

- `browseruse_bench/agents/my_agent.py` with a `BaseAgent` subclass
- Optional runtime config under `configs/agents/<agent>/config.yaml`
- Any extra dependencies your agent needs

The framework handles:

- CLI argument parsing and task selection
- Loading tasks from the benchmark dataset
- Creating `task_workspace` directories
- Saving `result.json` and failure info
- Logging and error handling via `scripts/agent_runner.py`
1. Suggested structure
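A minimal layout, assuming the paths named in this guide (your checkout may differ):

```
browseruse_bench/agents/my_agent.py     # BaseAgent subclass (registered on import)
configs/agents/my_agent/config.yaml     # optional runtime config
```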
2. Integration steps
Step 1: Implement the agent
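The sketch below is hypothetical: it infers the agent contract from this page's FAQ (`task_info`, `agent_config`, `task_workspace`, and the result dict). The real `BaseAgent` signature may differ, and the `BaseAgent` stand-in here exists only so the snippet is self-contained.

```python
# Hypothetical sketch of browseruse_bench/agents/my_agent.py.
from pathlib import Path
from typing import Any


class BaseAgent:  # stand-in for the framework's BaseAgent, so this runs on its own
    name: str = "base"


class MyAgent(BaseAgent):
    name = "my_agent"

    def run(
        self,
        task_info: dict[str, Any],     # normalized task fields: task_id, task_text, url, ...
        agent_config: dict[str, Any],  # from configs/agents/<agent>/config.yaml
        task_workspace: Path,          # per-task output dir; the runner writes result.json here
    ) -> dict[str, Any]:
        # ... drive the browser / model here; this sketch just echoes the URL ...
        return {
            "task_id": task_info["task_id"],
            "status": "success",
            "answer": f"visited {task_info['url']}",
            "metrics": {"steps": 1},
        }
```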
Step 2: Ensure the module is imported
Registration happens when the module is imported. Add your module to `browseruse_bench/agents/__init__.py`:
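Why the import matters can be shown with a toy registry: agent classes are recorded as a side effect of importing their module, so a module that is never imported is never registered. The `AGENT_REGISTRY` dict and `register` decorator below are illustrative stand-ins, not the framework's actual API.

```python
# Toy illustration of import-time registration (not the real API).
AGENT_REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    """Record the class under its name, as importing its module would."""
    AGENT_REGISTRY[cls.name] = cls
    return cls


@register  # runs when the module containing MyAgent is imported
class MyAgent:
    name = "my_agent"
```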
Step 3: Register the agent in config.yaml
Add an entry to the root `config.yaml`. The runner uses this for supported benchmarks, venv selection, and config paths; `path` and `entrypoint` are kept as metadata.
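A hypothetical entry — the key names below are assumptions based on what the runner is described as using (supported benchmarks, venv selection, config paths, plus `path`/`entrypoint` metadata). Check an existing agent's entry for the authoritative shape.

```yaml
agents:
  my_agent:
    path: browseruse_bench/agents/my_agent.py   # kept as metadata
    entrypoint: MyAgent                         # kept as metadata
    supported_benchmarks: [<benchmark>]
    venv: my_agent
    config: configs/agents/my_agent/config.yaml
```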
Step 4: Run a quick test
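For example, run a single task against your new agent. The flags below are assumptions; consult `bubench run --help` (or `scripts/agent_runner.py`) for the actual interface.

```
# Hypothetical smoke run on one task; adjust flags to the real CLI.
bubench run --agent my_agent --tasks demo-001
```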
3. Browser Backend Constraints (for browser agents)
If your custom agent needs a browser, follow the unified browser backend contract:

- Read `BROWSER_ID` from `agent_config` and use `open_browser_session(...)` to acquire the backend session context.
- Keep provider lifecycle code (create/delete session, provider auth, provider SDK calls) in `browseruse_bench/browsers/providers/`, not in the agent module.
- Use `BrowserSessionContext(backend_id, transport, cdp_url, metadata)` inside the agent to build runtime browser instances.
- Keep optional SDK import handling inside provider modules (`ImportError`-safe import at module load; fail fast only when the selected backend is actually used).
- Cleanup errors in backend `close(...)` should be logged and tolerated; they should not mask task execution failures.
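The contract above can be sketched end to end. Everything below is a self-contained stand-in: the real `open_browser_session` and `BrowserSessionContext` live under `browseruse_bench/browsers`, and their exact signatures may differ.

```python
# Self-contained sketch of the browser backend contract (stand-ins only).
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class BrowserSessionContext:  # mirrors the fields named in the contract
    backend_id: str
    transport: str
    cdp_url: str
    metadata: dict[str, Any] = field(default_factory=dict)


@contextmanager
def open_browser_session(browser_id: str) -> Iterator[BrowserSessionContext]:
    # A real provider would create a remote session here and delete it in
    # the finally block, tolerating cleanup errors.
    ctx = BrowserSessionContext(browser_id, "cdp", "ws://localhost:9222/devtools", {})
    try:
        yield ctx
    finally:
        pass  # provider close(...): log and tolerate failures


def run_with_browser(agent_config: dict[str, Any]) -> str:
    browser_id = agent_config["BROWSER_ID"]  # read from agent_config, per the contract
    with open_browser_session(browser_id) as ctx:
        return ctx.cdp_url  # the agent builds its runtime browser from ctx
```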
4. Adding a New Browser Backend (Checklist)
When adding a new browser type (for example, a new cloud provider):

- Add a backend provider implementation under `browseruse_bench/browsers/providers/<provider>.py` implementing `open(...)` and `close(...)`.
- Register it in `browseruse_bench/browsers/registry.py` with a new `browser_id`.
- If new config keys are introduced, add them to:
  - `configs/agents/<agent>/config.yaml.example`
  - related agent docs under `docs/en/agents/*.mdx` and `docs/zh/agents/*.mdx`
- Add/extend tests in `tests/browseruse_bench/test_browsers.py`:
  - open success path
  - missing credentials/dependency path
  - cleanup failure tolerance path
- If the backend needs new optional dependencies, update:
  - `pyproject.toml` optional dependency group(s)
  - agent extra resolution in `browseruse_bench/cli/run.py` (if required)
- Update docs:
  - this page (`/en/examples/custom-agent`, `/zh/examples/custom-agent`)
  - quickstart/env guidance when new env vars are required
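A hypothetical provider skeleton (the module and SDK names are invented, and the real registry hooks may differ) showing the three behaviors the checklist's tests cover: the open path, the missing-dependency path, and tolerated cleanup failure.

```python
# Hypothetical browseruse_bench/browsers/providers/<provider>.py skeleton.
import logging
from typing import Any

logger = logging.getLogger(__name__)

try:
    import some_provider_sdk  # optional dependency; ImportError-safe at module load
except ImportError:
    some_provider_sdk = None


class MyProviderBackend:
    """open(...) / close(...) pair to register under a new browser_id."""

    def open(self, config: dict[str, Any]) -> Any:
        if some_provider_sdk is None:
            # Fail fast only when this backend is actually selected.
            raise RuntimeError("install the provider SDK extra to use this backend")
        return some_provider_sdk.create_session(api_key=config["api_key"])

    def close(self, session: Any) -> None:
        try:
            session.delete()
        except Exception:
            # Cleanup errors are logged and tolerated so they never mask
            # the task's own result.
            logger.warning("provider session cleanup failed", exc_info=True)
```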
What is task_info?
`task_info` is a dictionary loaded from the benchmark dataset (JSON/JSONL). It always contains normalized fields:

- `task_id` (string)
- `task_text` (string)
- `url` (string)
- `prompt` (string, optional; present when a prompt template is used)
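An illustrative instance (the values are invented; the keys follow the normalized schema above):

```python
# Example task_info as an agent might receive it (values invented).
task_info = {
    "task_id": "demo-001",
    "task_text": "Find the monthly price of the Pro plan.",
    "url": "https://example.com/pricing",
    "prompt": "You are a web agent. Task: find the monthly price of the Pro plan.",
}
```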
What is agent_config?
`agent_config` is loaded from `configs/agents/<agent>/config.yaml` (or `--agent-config`). The runner injects `timeout_seconds` after resolving CLI/config defaults.
What is task_workspace?
`task_workspace` is the per-task output directory: `<output_dir>/tasks/<task_id>/`.

You can store screenshots, logs, or intermediate files there. The runner writes `result.json` into the same directory.
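For instance, a small helper (the helper name is hypothetical) that drops a log file next to the runner's `result.json`:

```python
from pathlib import Path


def save_debug_log(task_workspace: Path, text: str) -> Path:
    """Write an intermediate artifact into the per-task workspace.

    Hypothetical helper: the runner writes result.json into this directory;
    everything else in it is the agent's to use.
    """
    log_path = task_workspace / "agent.log"
    log_path.write_text(text)
    return log_path
```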
What should the result dict contain?
At minimum:

- `task_id`
- `status`: `"success"` or `"failed"`
- `answer`: final answer (string)
- `metrics`: e.g. steps, duration_ms
- `error`: error message when `status` is `"failed"`
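Illustrative success and failure results following that minimal contract (all values invented):

```python
# Example result dicts an agent's run(...) might return (values invented).
success_result = {
    "task_id": "demo-001",
    "status": "success",
    "answer": "12 USD/month",
    "metrics": {"steps": 7, "duration_ms": 15300},
}

failed_result = {
    "task_id": "demo-002",
    "status": "failed",
    "error": "navigation timeout after 300s",
}
```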
Dependencies
If additional packages are needed, add them to `pyproject.toml`. If you create a new extra group for your agent and want `bubench run` to auto-install it, add a mapping for your agent in `browseruse_bench/cli/run.py` (where `extra_name` is selected), or preinstall the extra in the agent venv.
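A hypothetical extra group (the package name is invented):

```toml
[project.optional-dependencies]
my_agent = ["some-provider-sdk>=1.0"]
```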