# Cost Tracking
The gateway records the cost of every LLM request in real time. Costs are accumulated in Redis for fast budget enforcement and persisted to MongoDB for audit and analytics. The dashboard provides a cost governance panel for monitoring spend across organizations, models, and time periods.
## How cost recording works
After every proxy response completes — both streaming and non-streaming — the gateway runs the following sequence:
- Calculate actual cost from the provider’s reported token counts (prompt tokens, completion tokens, reasoning tokens, cache tokens)
- Increment daily cost in Redis (`gateway:daily_cost:{org_id}:{date}`) for daily budget enforcement
- Increment monthly cost in Redis (`gateway:monthly_cost:{org_id}:{YYYY-MM}`) for monthly budget enforcement
- Persist the full usage record to MongoDB (the `gateway_usage` collection) for audit
- Track usage against the API key for per-key cost attribution
- Emit Prometheus metrics for system-level monitoring
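The cost-calculation step can be sketched as follows. The pricing numbers and function name here are illustrative only; the real tables live in `src/gateway/model_pricing.py`, and production pricing also covers reasoning and cache tokens:

```python
# Illustrative per-million-token USD prices; not the real pricing tables.
PRICING = {"gpt-4o": {"prompt": 2.50, "completion": 10.00}}

def calculate_actual_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute USD cost from the provider's reported token counts."""
    p = PRICING[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000
```

For example, a `gpt-4o` request with 1,000 prompt tokens and 500 completion tokens would cost (1,000 × $2.50 + 500 × $10.00) / 1,000,000 = $0.0075 under these illustrative prices.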
## Per-call logging
Every gateway request produces a usage record with the following fields:
| Field | Description |
|---|---|
| `request_id` | Unique identifier for tracing |
| `org_id` | Organization that owns the API key |
| `key_id` | The specific API key used |
| `provider` | LLM provider (openai, anthropic, google, deepseek) |
| `model` | Model name used for the request |
| `prompt_tokens` | Number of input tokens (from provider response) |
| `completion_tokens` | Number of output tokens (from provider response) |
| `total_tokens` | Total token count |
| `reasoning_tokens` | Reasoning/thinking tokens for o-series and R1 models |
| `thinking_tokens` | Extended thinking tokens for Anthropic models |
| `cache_creation_tokens` | Anthropic prompt cache creation tokens |
| `cache_read_tokens` | Anthropic prompt cache read tokens |
| `estimated_cost` | Pre-request cost estimate (USD) |
| `actual_cost` | Computed cost from actual token usage (USD) |
| `latency_ms` | End-to-end request latency in milliseconds |
| `stream` | Whether the request used streaming (SSE) |
| `status_code` | HTTP status code from the provider |
| `runner_id` | Managed runner ID (if the request originated from a runner session) |
| `session_id` | Runner session ID (if applicable) |
| `created_at` | Timestamp of the request |
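To make the schema concrete, a usage record might look like the following (all values are invented for illustration):

```python
# An illustrative usage record; every value here is made up.
usage_record = {
    "request_id": "req_example_001",
    "org_id": "org_acme",
    "key_id": "key_prod_1",
    "provider": "openai",
    "model": "gpt-4o",
    "prompt_tokens": 1200,
    "completion_tokens": 350,
    "total_tokens": 1550,
    "reasoning_tokens": 0,
    "thinking_tokens": 0,
    "cache_creation_tokens": 0,
    "cache_read_tokens": 0,
    "estimated_cost": 0.0041,
    "actual_cost": 0.0065,
    "latency_ms": 1840,
    "stream": True,
    "status_code": 200,
    "runner_id": None,       # not a runner-originated request
    "session_id": None,
    "created_at": "2025-01-15T12:00:00Z",
}
```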
## Token counting

### Pre-request estimation
Before a request is proxied, the gateway estimates the input token count using `tiktoken` BPE encoding. The correct encoding is selected per model family:
| Model Family | Encoding |
|---|---|
| GPT-4o, GPT-5, o-series | `o200k_base` |
| Claude, DeepSeek, Gemini | `cl100k_base` (approximation) |
The estimator handles both OpenAI and Anthropic message formats, including multi-part content blocks, system prompts, and tool definitions.
If tiktoken fails for any reason, the estimator falls back to a character heuristic (1 token per 4 characters).
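A minimal sketch of the selection-plus-fallback logic; the encoding choices follow the table above, while the function name and model-prefix matching are hypothetical simplifications:

```python
def estimate_prompt_tokens(text: str, model: str) -> int:
    """Pick a tiktoken encoding by model family; fall back to a character heuristic."""
    try:
        import tiktoken
        # o200k_base for GPT-4o / GPT-5 / o-series; cl100k_base as an approximation otherwise.
        name = "o200k_base" if model.startswith(("gpt-4o", "gpt-5", "o1", "o3")) else "cl100k_base"
        return len(tiktoken.get_encoding(name).encode(text))
    except Exception:
        # Heuristic fallback: roughly one token per four characters.
        return max(1, len(text) // 4)
```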
### Post-request actual counts
After the response completes, actual token counts are extracted from the provider’s response:
| Provider | Prompt Tokens | Completion Tokens | Special Tokens |
|---|---|---|---|
| OpenAI | `usage.prompt_tokens` | `usage.completion_tokens` | `completion_tokens_details.reasoning_tokens` |
| Anthropic | `usage.input_tokens` | `usage.output_tokens` | `cache_creation_input_tokens`, `cache_read_input_tokens` |
| Google | `usageMetadata.promptTokenCount` | `usageMetadata.candidatesTokenCount` | — |
| DeepSeek | `usage.prompt_tokens` | `usage.completion_tokens` | `completion_tokens_details.reasoning_tokens` |
For streaming responses, usage data is extracted from the final SSE chunk. OpenAI requires `stream_options: {"include_usage": true}` to report usage in streaming mode — the gateway adds this automatically.
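The per-provider field paths in the table above normalize into one shape. This is a sketch under that assumption; the function name is hypothetical, and the field paths are the ones listed above:

```python
def extract_usage(provider: str, body: dict) -> dict:
    """Normalize token counts from a provider response body into one shape."""
    if provider in ("openai", "deepseek"):
        u = body.get("usage", {})
        return {
            "prompt_tokens": u.get("prompt_tokens", 0),
            "completion_tokens": u.get("completion_tokens", 0),
            "reasoning_tokens": u.get("completion_tokens_details", {}).get("reasoning_tokens", 0),
        }
    if provider == "anthropic":
        u = body.get("usage", {})
        return {
            "prompt_tokens": u.get("input_tokens", 0),
            "completion_tokens": u.get("output_tokens", 0),
            "cache_creation_tokens": u.get("cache_creation_input_tokens", 0),
            "cache_read_tokens": u.get("cache_read_input_tokens", 0),
        }
    if provider == "google":
        u = body.get("usageMetadata", {})
        return {
            "prompt_tokens": u.get("promptTokenCount", 0),
            "completion_tokens": u.get("candidatesTokenCount", 0),
        }
    raise ValueError(f"unknown provider: {provider}")
```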
## Per-org aggregation
Cost data is aggregated at the organization level using Redis counters:
### Daily cost tracking

- Key: `gateway:daily_cost:{org_id}:{YYYY-MM-DD}`
- TTL: 48 hours
- Operation: `INCRBYFLOAT` for precise float accumulation
- Used by: Governance chain (Step 2) for daily budget enforcement
### Monthly cost tracking

- Key: `gateway:monthly_cost:{org_id}:{YYYY-MM}`
- TTL: 35 days
- Operation: `INCRBYFLOAT` for precise float accumulation
- Used by: Governance chain (Step 2) for monthly budget enforcement
### Daily request count

- Key: `gateway:daily_requests:{org_id}:{YYYY-MM-DD}`
- TTL: 48 hours
- Operation: `INCR`
- Used by: Analytics and dashboard request volume charts
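The counter pattern works with any Redis client; this sketch exercises the key formats and TTLs above against a tiny in-memory stand-in (both class and function names are hypothetical):

```python
class FakeRedis:
    """In-memory stand-in for the three Redis commands the pattern needs."""
    def __init__(self):
        self.values, self.ttls = {}, {}
    def incrbyfloat(self, key, amount):
        self.values[key] = self.values.get(key, 0.0) + float(amount)
        return self.values[key]
    def incr(self, key):
        self.values[key] = int(self.values.get(key, 0)) + 1
        return self.values[key]
    def expire(self, key, seconds):
        self.ttls[key] = seconds

def bump_counters(r, org_id: str, cost: float, day: str) -> None:
    """Apply all three counters for one completed request (day is YYYY-MM-DD)."""
    month = day[:7]
    for key, ttl in [
        (f"gateway:daily_cost:{org_id}:{day}", 48 * 3600),       # 48-hour TTL
        (f"gateway:monthly_cost:{org_id}:{month}", 35 * 86400),  # 35-day TTL
    ]:
        r.incrbyfloat(key, cost)
        r.expire(key, ttl)
    req_key = f"gateway:daily_requests:{org_id}:{day}"
    r.incr(req_key)
    r.expire(req_key, 48 * 3600)
```

`INCRBYFLOAT` is used rather than read-modify-write so concurrent gateway workers accumulate cost atomically.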
## Budget alerts and caps
The gateway enforces budget limits at three levels:
### Per-request cap
Each request’s estimated cost is checked against `max_cost_per_request`. This prevents a single expensive request (e.g., a large context window with GPT-5) from consuming a disproportionate share of the budget.
| Tier | Max Cost per Request |
|---|---|
| Free | $0.25 |
| Starter | $0.50 |
| Pro | $2.00 |
| Team | $5.00 |
| Enterprise | $10.00 |
### Daily budget cap
The sum of `daily_cost + estimated_cost` is checked against `daily_budget`. When the cap is reached, all requests for the org are blocked until the next UTC day.
| Tier | Daily Budget |
|---|---|
| Free | $5 |
| Starter | $25 |
| Pro | $100 |
| Team | $500 |
| Enterprise | $2,000 |
### Monthly budget cap
The sum of `monthly_cost + estimated_cost` is checked against `monthly_budget`. When the cap is reached, all requests for the org are blocked until the next calendar month.
| Tier | Monthly Budget |
|---|---|
| Free | $50 |
| Starter | $250 |
| Pro | $2,000 |
| Team | $10,000 |
| Enterprise | $50,000 |
Custom limits can be configured per organization via the dashboard or the B2B API.
## Cost comparison across providers
The gateway records the provider and model for every request, making it straightforward to compare costs across providers for equivalent tasks. The dashboard’s cost governance panel includes:
- Cost by provider: Aggregate spend broken down by OpenAI, Anthropic, Google, Groq, Mistral, xAI, and more
- Cost by model: Per-model spend ranking to identify the most expensive models in use
- Cost trends: Daily and weekly cost charts with trend indicators
- Per-key attribution: Cost breakdown by API key, useful for tracking spend per application or team
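Because every record carries `provider`, `model`, and `actual_cost`, rollups like these reduce to standard MongoDB aggregations over the `gateway_usage` collection. The pipeline below is a sketch of a cost-by-provider rollup, not the dashboard's exact query; the org ID and window are invented:

```python
from datetime import datetime, timezone

# Cost-by-provider rollup for one org over a window; run with
# db.gateway_usage.aggregate(pipeline) via pymongo. Values are illustrative.
window_start = datetime(2025, 1, 1, tzinfo=timezone.utc)
pipeline = [
    {"$match": {"org_id": "org_acme", "created_at": {"$gte": window_start}}},
    {"$group": {
        "_id": "$provider",                       # one bucket per provider
        "total_cost": {"$sum": "$actual_cost"},   # aggregate spend
        "requests": {"$sum": 1},                  # request volume
    }},
    {"$sort": {"total_cost": -1}},                # most expensive provider first
]
```

Swapping `"$provider"` for `"$model"` or `"$key_id"` in the `$group` stage yields the per-model and per-key breakdowns.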
## Runner session costs
For requests originating from managed runner sessions, the gateway additionally tracks costs at the session level via the CostSLOEngine. This enables:
- Per-session cost limits (separate from org-level budgets)
- Session cost accumulation in Redis for real-time enforcement
- Cost attribution to specific runner executions in the audit log
## Dashboard integration
The cost data recorded by the gateway feeds directly into the dashboard’s cost governance panel:
- Real-time spend: Current daily and monthly totals pulled from Redis
- Historical data: Full usage history queried from the MongoDB `gateway_usage` collection
- Budget utilization: Visual indicators showing percentage of daily and monthly budgets consumed
- Cost alerts: Configurable notifications when spend approaches budget thresholds
Access the cost governance panel in the dashboard under Gateway > Cost Tracking.
## Querying cost data

### Via the CLI
```bash
# Daily cost snapshot
./scripts/analytics costs today

# Cost history for the last 7 days
./scripts/analytics costs history 7

# Full analytics snapshot
./scripts/analytics snapshot today
```

### Via the B2B API
```bash
# Get cost summary for the current org
curl https://api.curate-me.ai/api/v1/admin/gateway/costs/summary \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Get usage records with filters
curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?provider=openai&limit=50" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

## Backend implementation
Key source files:
| File | Purpose |
|---|---|
| `src/gateway/cost_recorder.py` | Post-response cost recording (Redis + MongoDB + Prometheus) |
| `src/gateway/model_pricing.py` | Model pricing tables and cost calculation functions |
| `src/gateway/governance.py` | Pre-request cost estimation and budget enforcement |
| `src/gateway/proxy.py` | Token usage extraction from provider responses |
| `src/gateway/metrics.py` | Prometheus metric emission for cost and latency |