BaseAgent interface. The framework handles task loading, CLI parsing, workspaces, and result persistence — you only need to implement run_task() and register your agent.
How existing agents are integrated
1. browser-use — Python SDK (in-process)
Interface: directly import the `browser_use` Python package; runs async in-process.
task → BrowserUseAgent.run_task() → create an LLM instance (OpenAI/Gemini/Anthropic, etc.) → open_browser_session() to open a browser session → browser_use.Agent(task, llm, browser).run() → parse history, return AgentResult
Advantage: deepest integration — provides token usage, per-step screenshots, and full action history.
2. Skyvern — Python SDK + embedded service
Interface: import `skyvern`; requires local PostgreSQL for auth; supports both local and cloud modes.
task → SkyvernAgent.prepare() (init DB auth) → SkyvernAgent.run_task() → Skyvern.local() or Skyvern(api_key=...) → skyvern.run_task(prompt, engine, ...) → poll run_id until complete → collect screenshots from artifacts dir, return AgentResult
Note: heaviest dependency (requires PostgreSQL), but allows substituting your own LLM for Skyvern’s cloud model.
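The "poll run_id until complete" step above can be sketched as a generic polling helper. Note that `poll_until_complete`, `check_status`, and the terminal status names here are illustrative placeholders, not Skyvern's actual API:

```python
import time

def poll_until_complete(check_status, timeout_s=600, interval_s=2.0,
                        terminal=frozenset({"completed", "failed", "terminated"})):
    """Call check_status() until it returns a terminal status or the timeout expires.

    check_status: zero-arg callable returning a status string
    (e.g. a closure around a run_id lookup against the Skyvern SDK).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = check_status()
        if status in terminal:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"run did not finish within {timeout_s}s")
```

Wrapping the SDK lookup in a closure keeps the polling loop testable without a live Skyvern instance.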
3. Agent-TARS — subprocess CLI
Interface: invoke the `agent-tars run --input "..."` CLI tool; fully black-box.
task → AgentTARSAgent.run_task() → assemble CLI args → subprocess.Popen(["agent-tars", "run", ...]) → wait for process exit (with timeout) → parse event-stream.jsonl to extract actions/metrics → return AgentResult
Advantage: lightest weight — no need to understand agent internals, only requires an executable CLI.
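The subprocess flow above can be sketched as follows. The CLI invocation and the `event-stream.jsonl` filename come from the pipeline description; `run_agent_tars` and `parse_event_stream` are illustrative helper names:

```python
import json
import subprocess
from pathlib import Path

def run_agent_tars(task_text: str, workspace: Path, timeout_s: int = 900):
    """Launch the CLI as a black-box subprocess and wait for exit with a timeout."""
    proc = subprocess.Popen(["agent-tars", "run", "--input", task_text])
    proc.wait(timeout=timeout_s)  # raises subprocess.TimeoutExpired on overrun
    return parse_event_stream(workspace / "event-stream.jsonl")

def parse_event_stream(path: Path) -> list[dict]:
    """Extract events from the newline-delimited JSON event log, skipping blank lines."""
    events = []
    for line in path.read_text().splitlines():
        if line.strip():
            events.append(json.loads(line))
    return events
```

Actions and metrics can then be filtered out of the returned event list before assembling the AgentResult.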
What a new agent must expose / how to integrate
Required interface (choose one)
| Method | Requirement |
|---|---|
| Python SDK | pip-installable package with a run(task)-style API that returns a result |
| HTTP API | REST interface to submit tasks and poll for results |
| CLI | Executable command that accepts a task argument and outputs a result file on exit |
Required output (AgentResult)
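The framework's actual AgentResult definition is not shown here; a plausible minimal shape, based on the fields the integrations above return (success flag, final answer, action history, token usage), might look like this. All field names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentResult:
    # Illustrative stand-in for the framework's real AgentResult.
    success: bool
    final_answer: str = ""
    error: Optional[str] = None
    actions: list = field(default_factory=list)      # per-step action history
    token_usage: dict = field(default_factory=dict)  # e.g. prompt/completion token counts
```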
Files to create
Two integration methods
1. Code Agent integration
You don’t need to implement it manually — just tell the Code Agent what you want to integrate, referencing an existing agent’s integration pattern.

Example: integrating Claude Code as a browser agent
Here is an example prompt given to a Code Agent:

"I want to integrate claude-code as a browser agent in my evaluation framework, referencing Agent-TARS’s subprocess CLI approach, with the following constraints:"

- Follow Agent-TARS’s subprocess CLI integration pattern
- Give me prerequisites first; proceed with the implementation only after I confirm they’re installed
- Add config entries in the root `config.yaml` and register the agent environment in `configs/agent_registry.yaml`
- The output directory must include `result.json` (agent result), `trajectory/` (screenshots), and `api_logs/` (intermediate steps)
- Additional constraints:
  - Output JSON events line by line; `type=result` contains the final answer, `type=user` contains tool results (including screenshot base64)
  - Restrict the agent to Playwright browser tools only, to prevent it from completing tasks via search APIs
  - Skip permission prompts in non-interactive mode
- Provide a run example after integration is complete

The following is the Code Agent’s output (note: this is not the full output, just selected excerpts):

Prerequisites:
- Created `browseruse_bench/agents/claude_code.py`
- Modified `browseruse_bench/agents/__init__.py`
- Registered in `config.yaml`:
2. Manual integration
For self-developed agents, Python SDK agents, or HTTP API agents that need to implement browser control themselves.

Integration steps
Step 1: Implement the agent
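A minimal sketch of what `run_task()` might look like. BaseAgent's exact interface and the registration hook belong to the framework, so both are stubbed here (and the dict-shaped return value stands in for AgentResult) purely so the example is self-contained:

```python
# Hypothetical sketch — in the real framework you would subclass the actual
# BaseAgent from browseruse_bench/agents/ instead of this local stub.
class BaseAgent:
    name = "base"

class MyAgent(BaseAgent):
    name = "my_agent"

    def run_task(self, task_info: dict, agent_config: dict, task_workspace: str) -> dict:
        """Run one benchmark task and return an AgentResult-shaped payload."""
        task_text = task_info["task_text"]                   # normalized task description
        url = task_info["url"]                               # starting URL
        timeout = agent_config.get("timeout_seconds", 600)   # injected by the framework
        # ... drive your SDK / HTTP API / browser here, writing screenshots and
        # logs into task_workspace, honoring task_text and timeout ...
        return {"success": True, "final_answer": f"visited {url}", "error": None}
```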
Step 2: Ensure the module is imported
Registration happens at import time. Add to `browseruse_bench/agents/__init__.py`:
Step 3: Register in config.yaml
Add the model config under `agents` in the root config.yaml:
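A hedged sketch of what such an entry might look like; the key names mirror the fields described later in this doc (`agents`, `models`, `active_model`, `timeout_seconds`), but your framework's exact schema may differ:

```yaml
# root config.yaml — illustrative entry, schema assumed
agents:
  my_agent:
    active_model: gpt-4o
    models:
      gpt-4o:
        provider: openai
        timeout_seconds: 600   # passed through to agent_config by the framework
```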
configs/agent_registry.yaml:
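This doc does not specify the registry schema, so the following entry is purely illustrative; the `module` and `env` keys are assumptions:

```yaml
# configs/agent_registry.yaml — illustrative entry, schema assumed
my_agent:
  module: browseruse_bench.agents.my_agent
  env:
    python: "3.11"
```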
Step 4: Run a quick test
Reference: field descriptions
What is task_info?
A dictionary loaded from the benchmark dataset with normalized fields:
- `task_id` (string)
- `task_text` (string)
- `url` (string)
- `prompt` (string, optional)
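Put together, a task_info dict might look like this (all values are made up for illustration):

```python
# Example of the normalized task_info shape; values are illustrative.
task_info = {
    "task_id": "shop-001",
    "task_text": "Find the cheapest 13-inch laptop and report its price.",
    "url": "https://example.com/shop",
    "prompt": None,  # optional field, may be absent
}
```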
What is agent_config?
Loaded from agents.<agent>.models[active_model] in the root config.yaml (or via --agent-config). The framework injects timeout_seconds.
What is task_workspace?
The per-task output directory: <output_dir>/tasks/<task_id>/.
Store screenshots, logs, and intermediate artifacts here; the framework writes result.json to the same directory.
Browser backend constraints
If your agent needs browser access (manual integration mode), follow the unified backend contract in `browseruse_bench/browsers/providers/`. Cleanup failures should be logged and tolerated — they must not mask errors from the task execution phase.
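The "cleanup failures must not mask task errors" rule translates to a try/finally whose finally swallows and logs. A minimal sketch, with `run_with_browser` and its callable parameters as illustrative names:

```python
import logging

log = logging.getLogger("browseruse_bench")

def run_with_browser(open_browser, run, close_browser):
    """Run a task with a browser session; cleanup errors are logged, never raised."""
    browser = open_browser()
    try:
        return run(browser)           # task errors propagate normally
    finally:
        try:
            close_browser(browser)
        except Exception:             # tolerate cleanup failures per the contract
            log.warning("browser cleanup failed", exc_info=True)
```

If `run()` itself raised, the except clause in the finally block keeps the cleanup error from replacing the original exception.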