This document describes how the `metrics.usage` cost fields are generated in browseruse-bench.
## Where Cost Is Computed

Cost enrichment happens in the shared task runner before writing `result.json`:

- Runner: `browseruse_bench/runners/agent_runner.py`
- Function: `enrich_result_usage_cost_if_needed(...)`

If the agent already reports `total_cost > 0`, the runner keeps the reported values. Otherwise it recomputes cost from usage + pricing.
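A minimal sketch of that "enrich only if needed" gate. The function name follows the doc; the field names and the recompute path shown here are simplified assumptions, not the actual implementation:

```python
# Sketch only: keep an agent-reported cost if positive, otherwise recompute
# from token counts and per-token rates (field names are assumptions).

def enrich_result_usage_cost_if_needed(usage: dict, pricing: dict) -> dict:
    if usage.get("total_cost", 0) > 0:
        return usage  # trust the cost the agent already reported
    input_rate = pricing.get("input_cost_per_token", 0.0)
    output_rate = pricing.get("output_cost_per_token", 0.0)
    usage["total_cost"] = (
        usage.get("total_prompt_tokens", 0) * input_rate
        + usage.get("total_completion_tokens", 0) * output_rate
    )
    return usage
```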
## How Token Usage Is Obtained

Token counts come from the agent's returned `metrics.usage` object (typically propagated from the provider's response usage), resolved with this priority:

- `prompt_tokens`: `total_prompt_tokens` -> `prompt_tokens`
- `completion_tokens`: `total_completion_tokens` -> `completion_tokens`
- `total_tokens`: `total_tokens` -> `prompt_tokens + completion_tokens`
- `cached_tokens`: `total_prompt_cached_tokens` -> `cached_tokens`; if still 0, fall back to `prompt_tokens_details.cached_tokens`
- If `prompt_tokens == 0` and `completion_tokens == 0` but `total_tokens > 0`, the framework treats all tokens as prompt tokens.
- If all token counters are 0, enrichment is skipped.
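The fallback order above can be sketched as follows. The field names come from the doc; the helper itself is illustrative, not the framework's actual code:

```python
# Illustrative resolver for the token-count fallback chain described above.

def resolve_tokens(usage: dict) -> dict:
    prompt = usage.get("total_prompt_tokens") or usage.get("prompt_tokens") or 0
    completion = usage.get("total_completion_tokens") or usage.get("completion_tokens") or 0
    total = usage.get("total_tokens") or (prompt + completion)
    cached = usage.get("total_prompt_cached_tokens") or usage.get("cached_tokens") or 0
    if cached == 0:
        # last-resort fallback for cached tokens
        cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    if prompt == 0 and completion == 0 and total > 0:
        prompt = total  # only a grand total is known: treat it all as prompt
    return {"prompt_tokens": prompt, "completion_tokens": completion,
            "total_tokens": total, "cached_tokens": cached}
```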
## How Pricing Is Obtained

Pricing is loaded by `load_litellm_price_table()` and merged with optional custom pricing.

### LiteLLM pricing source

- Canonical URL: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
- In-memory cache: `_PRICE_CACHE`, keyed by URL, TTL 24h
- Local file cache: `~/.cache/browseruse_bench/token_cost/pricing_bubench_*.json`; valid cache entries are reused before a network fetch
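A minimal sketch of that two-level cache (in-memory TTL, then local file, then network). The `_PRICE_CACHE` name and 24h TTL come from the doc; the function signature and cache layout are assumptions:

```python
# Sketch of the caching order described above; not the actual implementation.
import json
import time
from pathlib import Path

_PRICE_CACHE: dict = {}          # url -> (fetched_at, table)
_TTL_SECONDS = 24 * 60 * 60      # 24h TTL, per the doc

def load_price_table(url: str, cache_file: Path, fetch) -> dict:
    now = time.time()
    hit = _PRICE_CACHE.get(url)
    if hit and now - hit[0] < _TTL_SECONDS:
        return hit[1]            # in-memory cache is still fresh
    if cache_file.exists() and now - cache_file.stat().st_mtime < _TTL_SECONDS:
        table = json.loads(cache_file.read_text())  # reuse local file cache
    else:
        table = fetch(url)       # network fetch only when both caches miss
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(table))
    _PRICE_CACHE[url] = (now, table)
    return table
```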
### Custom pricing source

- Optional file: `configs/pricing/model_pricing.yaml`
- Supported keys (per-token or per-million):
  - `input_cost_per_token`, `output_cost_per_token`, `cache_read_input_token_cost`
  - `input_cost_per_million_tokens`, `output_cost_per_million_tokens`, `cache_read_input_cost_per_million_tokens`
- Matching: case-insensitive exact model key match
- Precedence: a custom pricing entry overrides the LiteLLM pricing for the same model key
- If no pricing is found for a model, all rates default to 0.0 and the total cost becomes 0.0 (with warning logs).
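The matching, precedence, and per-million normalization rules can be sketched together. The key names come from the doc; the lookup function and the assumption that a custom entry replaces the LiteLLM entry wholesale are illustrative:

```python
# Sketch of custom-over-LiteLLM precedence with per-million normalization.

_PER_MILLION = {
    "input_cost_per_million_tokens": "input_cost_per_token",
    "output_cost_per_million_tokens": "output_cost_per_token",
    "cache_read_input_cost_per_million_tokens": "cache_read_input_token_cost",
}

def merged_pricing(model: str, litellm_table: dict, custom_table: dict) -> dict:
    # case-insensitive exact model key match in the custom table
    custom = next((v for k, v in custom_table.items()
                   if k.lower() == model.lower()), None)
    entry = dict(custom if custom is not None else litellm_table.get(model, {}))
    for mkey, tkey in _PER_MILLION.items():   # normalize per-million keys
        if mkey in entry:
            entry[tkey] = entry.pop(mkey) / 1_000_000
    return entry
```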
## Cost Formula

All rates are normalized to USD per token. `cached_rate` defaults to `input_rate` when no cache-specific rate is available, and `cached_tokens` is clamped to `[0, prompt_tokens]`.
## Output Fields

After enrichment, `metrics.usage` includes:

- `total_prompt_tokens`, `total_prompt_cost`
- `total_prompt_cached_tokens`, `total_prompt_cached_cost`
- `total_completion_tokens`, `total_completion_cost`
- `total_tokens`, `total_cost`
- `entry_count`
- `by_model` (a single-model summary with `invocations` and `average_tokens_per_invocation`)
Run-level summaries aggregate `usage.total_cost` and `usage.total_tokens` from each task result.
## Minimal Example

Input usage: 1000 prompt tokens (400 of them cached) and 100 completion tokens, with rates of $2.00, $0.50, and $8.00 per million tokens for input, cached input, and output:

- non-cached prompt: `600 * 0.000002 = 0.0012`
- cached prompt: `400 * 0.0000005 = 0.0002`
- completion: `100 * 0.000008 = 0.0008`
- total: `0.0022`
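The same arithmetic, reproduced in a few lines (rates in USD per token):

```python
# Reproduce the worked example above.
input_rate, cached_rate, output_rate = 0.000002, 0.0000005, 0.000008
non_cached_prompt = 600 * input_rate   # 0.0012
cached_prompt = 400 * cached_rate      # 0.0002
completion = 100 * output_rate         # 0.0008
total = non_cached_prompt + cached_prompt + completion
print(round(total, 6))                 # 0.0022
```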