-p) mode.
How It Works
Unlike SDK-based agents, Claude Code runs as an external subprocess. The agent extracts screenshots from browser_take_screenshot tool results in the streamed output and writes structured logs to api_logs/.
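As a rough illustration of the subprocess approach, the sketch below assembles a non-interactive claude command line from the settings this document describes. The helper names build_claude_command and run_claude, the mcp.json path, and the exact set and order of flags are assumptions made for illustration; the agent's real invocation may differ.

```python
import subprocess


def build_claude_command(task, model_id="claude-sonnet-4-6", max_turns=50,
                         allowed_tools="mcp__playwright*",
                         mcp_config_path="mcp.json"):
    """Assemble a headless Claude Code command line (hypothetical helper)."""
    return [
        "claude",
        "-p", task,                          # print (non-interactive) mode
        "--output-format", "stream-json",    # one JSON event per line
        "--verbose",                         # required with stream-json
        "--model", model_id,
        "--max-turns", str(max_turns),
        "--allowedTools", allowed_tools,
        "--mcp-config", mcp_config_path,     # assumed location of the MCP config
        "--dangerously-skip-permissions",    # needed for non-interactive runs
    ]


def run_claude(task, timeout=300):
    """Launch Claude Code and return its raw stdout (requires claude on PATH)."""
    proc = subprocess.run(
        build_claude_command(task),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    return proc.stdout


cmd = build_claude_command("Find the top-rated laptop on example.com")
print(" ".join(cmd))
```

Building the argument list separately from launching the process makes it possible to inspect or log the command without Claude Code being installed.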
Prerequisites
1. Install Claude Code
Configuration
Configure Claude Code in the root config.yaml under agents.claude-code:
Set active_model to the profile name you want to use by default, then switch at runtime:
Config Parameters
| Parameter | Description | Default |
|---|---|---|
| model_id | Claude model ID | claude-sonnet-4-6 |
| max_turns | Max conversation turns (--max-turns) | 50 |
| timeout | Task timeout in seconds | 300 |
| allowed_tools | Tool name pattern passed to --allowedTools | mcp__playwright* |
| system_prompt | Custom system prompt (overrides default) | See below |
| playwright_mcp_command | Executable used to launch the Playwright MCP server | npx |
| playwright_mcp_args | Arguments passed to the MCP launcher (e.g. package name + flags) | ["@playwright/mcp@latest"] |
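To show how these parameters could fit together, the sketch below merges user overrides onto the documented defaults and derives an MCP server definition from playwright_mcp_command and playwright_mcp_args. The function names, the merge behavior, and the server name "playwright" (chosen to match the mcp__playwright prefix) are assumptions; the mcpServers layout follows the JSON format Claude Code reads for MCP configuration.

```python
import json

# Defaults copied from the parameter table above.
DEFAULTS = {
    "model_id": "claude-sonnet-4-6",
    "max_turns": 50,
    "timeout": 300,
    "allowed_tools": "mcp__playwright*",
    "playwright_mcp_command": "npx",
    "playwright_mcp_args": ["@playwright/mcp@latest"],
}


def resolve_config(overrides=None):
    """Return the defaults with any user overrides applied (hypothetical helper)."""
    merged = dict(DEFAULTS)
    merged.update(overrides or {})
    return merged


def build_mcp_config(cfg):
    """Build an MCP config dict that registers a server named "playwright"."""
    return {
        "mcpServers": {
            "playwright": {
                "command": cfg["playwright_mcp_command"],
                "args": list(cfg["playwright_mcp_args"]),
            }
        }
    }


cfg = resolve_config({"max_turns": 30})
print(json.dumps(build_mcp_config(cfg), indent=2))
```

Because MCP tool names are prefixed with the server name, registering the server as "playwright" is what makes the default allowed_tools pattern match.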
Default System Prompt
If system_prompt is not set, the agent uses a built-in prompt that covers three areas:
- Tool restriction: Claude must use only mcp__playwright__* tools; Bash, WebFetch, Skill, Agent, and other built-ins are explicitly prohibited.
- Task completion rules: answer from data already visible on the page (e.g. ratings in search results) without navigating into individual items; handle CAPTCHAs and access restrictions by falling back to collected data with at most one retry.
- Screenshot rules: call browser_take_screenshot with {"type": "png"} only (no filename parameter, so the image is returned as inline base64 and captured by the result parser).
To override it, set system_prompt in your config.yaml under agents.claude-code.defaults.
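The screenshot rule matters because the result parser can only capture images that arrive inline. The sketch below shows one way such a parser might work. It assumes each stream-json line is a JSON event whose message.content list holds tool_use and tool_result blocks, with images carried as base64 in source.data; that event shape and the function name extract_screenshots are assumptions, not the agent's actual parser.

```python
import base64
import json


def extract_screenshots(stream_lines):
    """Return decoded image bytes for each browser_take_screenshot result."""
    screenshot_call_ids = set()
    images = []
    for line in stream_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON noise in the stream
        message = event.get("message") if isinstance(event, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            name = str(block.get("name", ""))
            if kind == "tool_use" and name.endswith("browser_take_screenshot"):
                # Remember the call so its result can be matched later.
                screenshot_call_ids.add(block.get("id"))
            elif kind == "tool_result" and block.get("tool_use_id") in screenshot_call_ids:
                parts = block.get("content")
                if not isinstance(parts, list):
                    continue  # text-only result, e.g. when a filename was passed
                for part in parts:
                    source = part.get("source") if isinstance(part, dict) else None
                    if part.get("type") == "image" and isinstance(source, dict) and source.get("data"):
                        images.append(base64.b64decode(source["data"]))
    return images
```

Calling extract_screenshots(open("stdout.txt")) would return a list of PNG byte strings under these assumptions. A screenshot saved to a file by the MCP server would produce a text-only result and therefore yield nothing here.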
Usage
Basic Run
Run All Tasks
Evaluation
Output Structure
Each completed task writes:
Screenshots are extracted from the browser_take_screenshot tool results in the streamed output. The tool must be called without a filename argument so the image is returned as inline base64; the default system prompt enforces this. The Playwright MCP action timeout is set to 30 seconds (up from the 5 s default) to handle pages that load external fonts slowly.
Supported Benchmarks
- ✅ LexBench-Browser
- ✅ Online-Mind2Web
- ✅ BrowseComp
Troubleshooting
“Executable ‘claude’ not found”
Claude Code is not installed or not on $PATH. Run npm install -g @anthropic-ai/claude-code and verify with claude --version.
Permission denied on MCP tools
The agent runs with --dangerously-skip-permissions (required for non-interactive mode) and --allowedTools set to the configured pattern. If tools are still denied, confirm the MCP server name with claude mcp list and check that allowed_tools in config matches the prefix (e.g. mcp__playwright*).
No screenshots in results
Ensure the Playwright MCP server is connected (claude mcp list). The default system prompt instructs Claude Code to call browser_take_screenshot without a filename argument; if you use a custom system_prompt, do the same so the image is returned inline and captured by the result parser. If screenshots still fail, check stdout.txt for TimeoutError: browserBackend.callTool. This error indicates that the page is loading external resources slowly; consider adding a larger --timeout-action value to playwright_mcp_args.
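For the timeout case, the following sketch shows one way to produce a playwright_mcp_args list with a larger --timeout-action value. The helper name with_action_timeout is hypothetical, and the value is assumed to be in milliseconds, so confirm the unit against the Playwright MCP version you run.

```python
def with_action_timeout(args, timeout_ms):
    """Return a copy of args with --timeout-action set to timeout_ms."""
    flag = "--timeout-action"
    result = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # drop the old value that followed the flag
            continue
        if arg == flag:
            skip_next = True
            continue
        if arg.startswith(flag + "="):
            continue  # drop the old --timeout-action=<value> form
        result.append(arg)
    return result + [flag, str(timeout_ms)]


new_args = with_action_timeout(["@playwright/mcp@latest"], 60000)
print(new_args)
```

The returned list can be written back to playwright_mcp_args in config.yaml; any existing --timeout-action entry is replaced rather than duplicated.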
stream-json requires --verbose
This flag is already included in the agent. If you see this error running claude manually, add --verbose.