This page documents how metrics.usage cost fields are generated in browseruse-bench.

Where Cost Is Computed

Cost enrichment happens in the shared task runner before writing result.json:
  • Runner: browseruse_bench/runners/agent_runner.py
  • Function: enrich_result_usage_cost_if_needed(...)
If the incoming result already has complete cost fields and total_cost > 0, the runner keeps them as-is. Otherwise it recomputes cost from the usage token counters and the pricing table.
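A minimal sketch of that keep-or-recompute decision, assuming dict-shaped usage and pricing (the real logic lives in enrich_result_usage_cost_if_needed and also handles the cached-token split and field fallbacks described below; field names here mirror the output fields documented later):

```python
def enrich_usage_cost_if_needed(usage: dict, pricing: dict) -> dict:
    """Keep existing cost fields when complete and positive; else recompute."""
    cost_fields = ("total_prompt_cost", "total_completion_cost", "total_cost")
    complete = all(usage.get(f) is not None for f in cost_fields)
    if complete and usage["total_cost"] > 0:
        return usage  # trust the incoming cost fields
    # otherwise recompute from token counts and per-token rates
    prompt = usage.get("total_prompt_tokens", 0)
    completion = usage.get("total_completion_tokens", 0)
    usage["total_cost"] = (prompt * pricing.get("input_cost_per_token", 0.0)
                          + completion * pricing.get("output_cost_per_token", 0.0))
    return usage
```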

How Token Usage Is Obtained

Token numbers come from the agent’s returned metrics.usage object (typically propagated from the provider’s response usage), with the following fallback priority (a -> b means prefer a and fall back to b):
  • prompt_tokens: total_prompt_tokens -> prompt_tokens
  • completion_tokens: total_completion_tokens -> completion_tokens
  • total_tokens: total_tokens -> prompt_tokens + completion_tokens
  • cached_tokens:
    • total_prompt_cached_tokens -> cached_tokens
    • if still 0, fallback to prompt_tokens_details.cached_tokens
Special handling:
  • If prompt_tokens == 0 and completion_tokens == 0 but total_tokens > 0, the framework treats all tokens as prompt tokens.
  • If all token counters are 0, enrichment is skipped.
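The fallback order and special handling above can be sketched as follows (a simplified illustration; the actual extraction code may differ in detail):

```python
def extract_token_counts(usage: dict) -> dict:
    """Resolve token counts using the documented fallback priority."""
    prompt = usage.get("total_prompt_tokens") or usage.get("prompt_tokens") or 0
    completion = (usage.get("total_completion_tokens")
                  or usage.get("completion_tokens") or 0)
    total = usage.get("total_tokens") or (prompt + completion)
    cached = (usage.get("total_prompt_cached_tokens")
              or usage.get("cached_tokens") or 0)
    if cached == 0:  # last resort: provider-style nested details
        cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    # special case: only a grand total is known -> treat it all as prompt tokens
    if prompt == 0 and completion == 0 and total > 0:
        prompt = total
    return {"prompt_tokens": prompt, "completion_tokens": completion,
            "total_tokens": total, "cached_tokens": cached}
```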

How Pricing Is Obtained

Pricing is loaded by load_litellm_price_table() and merged with optional custom pricing.

LiteLLM pricing source

  • Canonical URL:
    • https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
  • In-memory cache:
    • _PRICE_CACHE keyed by URL, TTL 24h
  • Local file cache:
    • ~/.cache/browseruse_bench/token_cost/pricing_bubench_*.json
    • valid cache entries are reused before network fetch
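The local-file reuse step might look like the sketch below. The cache path and 24h TTL come from this page; the helper name is hypothetical (the real loader is load_litellm_price_table()):

```python
import json
import time
from pathlib import Path
from typing import Optional

CACHE_DIR = Path.home() / ".cache" / "browseruse_bench" / "token_cost"
TTL_SECONDS = 24 * 60 * 60  # 24h, matching the in-memory cache TTL

def load_cached_pricing(cache_dir: Path = CACHE_DIR) -> Optional[dict]:
    """Return the newest fresh pricing_bubench_*.json, or None to force a fetch."""
    if not cache_dir.is_dir():
        return None
    candidates = sorted(cache_dir.glob("pricing_bubench_*.json"),
                        key=lambda p: p.stat().st_mtime, reverse=True)
    for path in candidates:
        if time.time() - path.stat().st_mtime < TTL_SECONDS:
            return json.loads(path.read_text())
    return None  # all entries stale -> caller fetches the canonical URL
```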

Custom pricing source

  • Optional file:
    • configs/pricing/model_pricing.yaml
  • Supported keys (per-token or per-million):
    • input_cost_per_token, output_cost_per_token, cache_read_input_token_cost
    • input_cost_per_million_tokens, output_cost_per_million_tokens, cache_read_input_cost_per_million_tokens
  • Matching:
    • case-insensitive exact model key match
  • Precedence:
    • custom pricing entry overrides LiteLLM pricing for the same model key
If no pricing entry matches the model, all cost rates default to 0.0 and total_cost is reported as 0.0 (warnings are logged).
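A sketch of the merge and normalization, under the assumption that per-million keys are converted to per-token rates before use (helper names are illustrative):

```python
def normalize_rates(entry: dict) -> dict:
    """Convert *_per_million_tokens keys to the per-token equivalents."""
    per_million = {
        "input_cost_per_million_tokens": "input_cost_per_token",
        "output_cost_per_million_tokens": "output_cost_per_token",
        "cache_read_input_cost_per_million_tokens": "cache_read_input_token_cost",
    }
    out = {}
    for key, value in entry.items():
        if key in per_million:
            out[per_million[key]] = value / 1_000_000
        else:
            out[key] = value
    return out

def merge_pricing(litellm: dict, custom: dict) -> dict:
    """Custom entries override LiteLLM; model keys matched case-insensitively."""
    merged = {k.lower(): normalize_rates(v) for k, v in litellm.items()}
    merged.update({k.lower(): normalize_rates(v) for k, v in custom.items()})
    return merged
```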

Cost Formula

All rates are normalized to USD per token.
non_cached_prompt_tokens = max(0, prompt_tokens - cached_tokens)
prompt_non_cached_cost   = non_cached_prompt_tokens * input_rate
prompt_cached_cost       = cached_tokens * cached_rate
prompt_cost              = prompt_non_cached_cost + prompt_cached_cost
completion_cost          = completion_tokens * output_rate
total_cost               = prompt_cost + completion_cost
Notes:
  • cached_rate defaults to input_rate when cache-specific rate is unavailable.
  • cached_tokens is clamped to [0, prompt_tokens].
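The formula and both notes translate directly into code (a sketch; parameter names follow the rates defined above):

```python
from typing import Optional

def compute_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int,
                 input_rate: float, output_rate: float,
                 cached_rate: Optional[float] = None) -> dict:
    """Apply the documented cost formula; rates are USD per token."""
    if cached_rate is None:
        cached_rate = input_rate  # default when no cache-specific rate exists
    cached = min(max(cached_tokens, 0), prompt_tokens)  # clamp to [0, prompt_tokens]
    non_cached = max(0, prompt_tokens - cached)
    prompt_cost = non_cached * input_rate + cached * cached_rate
    completion_cost = completion_tokens * output_rate
    return {"prompt_cost": prompt_cost,
            "completion_cost": completion_cost,
            "total_cost": prompt_cost + completion_cost}
```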

Output Fields

After enrichment, metrics.usage includes:
  • total_prompt_tokens
  • total_prompt_cost
  • total_prompt_cached_tokens
  • total_prompt_cached_cost
  • total_completion_tokens
  • total_completion_cost
  • total_tokens
  • total_cost
  • entry_count
  • by_model (single-model summary with invocations and average_tokens_per_invocation)
Leaderboard aggregation reads usage.total_cost and usage.total_tokens from each task result.
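That aggregation step might be sketched as follows, assuming a list of parsed result.json payloads (the function name is hypothetical):

```python
def aggregate_leaderboard(task_results: list) -> dict:
    """Sum usage.total_cost and usage.total_tokens across task results."""
    total_cost = sum(r.get("metrics", {}).get("usage", {}).get("total_cost", 0.0)
                     for r in task_results)
    total_tokens = sum(r.get("metrics", {}).get("usage", {}).get("total_tokens", 0)
                       for r in task_results)
    return {"total_cost": total_cost, "total_tokens": total_tokens}
```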

Minimal Example

Input usage:
{
  "prompt_tokens": 1000,
  "completion_tokens": 100,
  "prompt_tokens_details": {
    "cached_tokens": 400
  }
}
Pricing:
{
  "input_cost_per_token": 0.000002,
  "output_cost_per_token": 0.000008,
  "cache_read_input_token_cost": 0.0000005
}
Computed:
  • non-cached prompt: 600 * 0.000002 = 0.0012
  • cached prompt: 400 * 0.0000005 = 0.0002
  • completion: 100 * 0.000008 = 0.0008
  • total: 0.0022
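As a sanity check, the arithmetic above can be reproduced directly (rates in USD per token):

```python
usage = {"prompt_tokens": 1000, "completion_tokens": 100,
         "prompt_tokens_details": {"cached_tokens": 400}}
pricing = {"input_cost_per_token": 0.000002,
           "output_cost_per_token": 0.000008,
           "cache_read_input_token_cost": 0.0000005}

cached = usage["prompt_tokens_details"]["cached_tokens"]
non_cached = usage["prompt_tokens"] - cached  # 600
prompt_cost = (non_cached * pricing["input_cost_per_token"]
               + cached * pricing["cache_read_input_token_cost"])  # 0.0012 + 0.0002
completion_cost = usage["completion_tokens"] * pricing["output_cost_per_token"]  # 0.0008
total_cost = prompt_cost + completion_cost
print(round(total_cost, 6))  # 0.0022
```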