Cost Tracking

The gateway records the cost of every LLM request in real time. Costs are accumulated in Redis for fast budget enforcement and persisted to MongoDB for audit and analytics. The dashboard provides a cost governance panel for monitoring spend across organizations, models, and time periods.

How cost recording works

After every proxy response completes (streaming or non-streaming), the gateway runs the following sequence:

  1. Calculate actual cost from the provider’s reported token counts (prompt tokens, completion tokens, reasoning tokens, cache tokens)
  2. Increment daily cost in Redis (gateway:daily_cost:{org_id}:{YYYY-MM-DD}) for daily budget enforcement
  3. Increment monthly cost in Redis (gateway:monthly_cost:{org_id}:{YYYY-MM}) for monthly budget enforcement
  4. Persist the full usage record to MongoDB (gateway_usage collection) for audit
  5. Track usage against the API key for per-key cost attribution
  6. Emit Prometheus metrics for system-level monitoring
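
Step 1 of this sequence can be sketched as follows. This is a minimal illustration, not the gateway's pricing code: `ModelPrice`, `calculate_actual_cost`, and the prices are hypothetical placeholders, and reasoning and cache tokens are left out because this page does not state how they are billed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens (hypothetical structure)."""
    input_per_mtok: float
    output_per_mtok: float


def calculate_actual_cost(price: ModelPrice, prompt_tokens: int,
                          completion_tokens: int) -> float:
    """Compute cost in USD from provider-reported token counts."""
    input_cost = prompt_tokens * price.input_per_mtok / 1_000_000
    output_cost = completion_tokens * price.output_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder prices for illustration only, not real provider pricing.
example_price = ModelPrice(input_per_mtok=2.0, output_per_mtok=8.0)
cost = calculate_actual_cost(example_price, prompt_tokens=1_000,
                             completion_tokens=500)
```

The resulting value is what steps 2 and 3 would add to the daily and monthly counters.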

Per-call logging

Every gateway request produces a usage record with the following fields:

| Field | Description |
| --- | --- |
| request_id | Unique identifier for tracing |
| org_id | Organization that owns the API key |
| key_id | The specific API key used |
| provider | LLM provider (openai, anthropic, google, deepseek) |
| model | Model name used for the request |
| prompt_tokens | Number of input tokens (from provider response) |
| completion_tokens | Number of output tokens (from provider response) |
| total_tokens | Total token count |
| reasoning_tokens | Reasoning/thinking tokens for o-series and R1 models |
| thinking_tokens | Extended thinking tokens for Anthropic models |
| cache_creation_tokens | Anthropic prompt cache creation tokens |
| cache_read_tokens | Anthropic prompt cache read tokens |
| estimated_cost | Pre-request cost estimate (USD) |
| actual_cost | Computed cost from actual token usage (USD) |
| latency_ms | End-to-end request latency in milliseconds |
| stream | Whether the request used streaming SSE |
| status_code | HTTP status code from the provider |
| runner_id | Managed runner ID (if the request originated from a runner session) |
| session_id | Runner session ID (if applicable) |
| created_at | Timestamp of the request |
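
For readers who want to work with these records in code, the fields above map naturally onto a typed structure. The sketch below is an assumption about shape only: the class name `UsageRecord`, the Python types, and the defaults are hypothetical, and the example values are placeholders.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UsageRecord:
    """One usage record; field names follow the table above."""
    request_id: str
    org_id: str
    key_id: str
    provider: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    estimated_cost: float   # USD
    actual_cost: float      # USD
    latency_ms: float
    stream: bool
    status_code: int
    # Optional counters default to zero when the provider omits them (assumed).
    reasoning_tokens: int = 0
    thinking_tokens: int = 0
    cache_creation_tokens: int = 0
    cache_read_tokens: int = 0
    # Only set for requests that originate from a runner session.
    runner_id: Optional[str] = None
    session_id: Optional[str] = None
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


# Placeholder values for illustration.
example = UsageRecord(
    request_id="req_1", org_id="org_123", key_id="key_1",
    provider="openai", model="gpt-4o",
    prompt_tokens=1000, completion_tokens=500, total_tokens=1500,
    estimated_cost=0.005, actual_cost=0.006,
    latency_ms=840.0, stream=False, status_code=200,
)
document = asdict(example)  # plain dict, suitable for a document store
```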

Token counting

Pre-request estimation

Before a request is proxied, the gateway estimates input token count using tiktoken BPE encoding. The correct encoding is selected per model family:

| Model Family | Encoding |
| --- | --- |
| GPT-4o, GPT-5, o-series | o200k_base |
| Claude, DeepSeek, Gemini | cl100k_base (approximation) |

The estimator handles both OpenAI and Anthropic message formats, including multi-part content blocks, system prompts, and tool definitions.

If tiktoken fails for any reason, the estimator falls back to a character heuristic (1 token per 4 characters).
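
The encoding selection and fallback described above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the model-name prefixes used to detect the o200k_base family are a guess, and rounding the heuristic up is a choice made here. `tiktoken.get_encoding` and `Encoding.encode` are the real tiktoken calls.

```python
import math


def encoding_for_model(model: str) -> str:
    """Pick an encoding name per model family (prefixes are assumed)."""
    name = model.lower()
    if name.startswith(("gpt-4o", "gpt-5", "o1", "o3", "o4")):
        return "o200k_base"
    # Claude, DeepSeek, and Gemini are approximated with cl100k_base.
    return "cl100k_base"


def heuristic_tokens(text: str) -> int:
    """Fallback: roughly 1 token per 4 characters, rounded up."""
    return math.ceil(len(text) / 4)


def estimate_tokens(text: str, model: str) -> int:
    """Estimate input tokens with tiktoken, falling back on any failure."""
    try:
        import tiktoken  # imported lazily so a missing package also falls back
        encoding = tiktoken.get_encoding(encoding_for_model(model))
        return len(encoding.encode(text))
    except Exception:
        return heuristic_tokens(text)
```

A real estimator would first flatten messages, system prompts, and tool definitions into text before counting; that step is omitted here.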

Post-request actual counts

After the response completes, actual token counts are extracted from the provider’s response:

| Provider | Prompt Tokens | Completion Tokens | Special Tokens |
| --- | --- | --- | --- |
| OpenAI | usage.prompt_tokens | usage.completion_tokens | usage.completion_tokens_details.reasoning_tokens |
| Anthropic | usage.input_tokens | usage.output_tokens | usage.cache_creation_input_tokens, usage.cache_read_input_tokens |
| Google | usageMetadata.promptTokenCount | usageMetadata.candidatesTokenCount | |
| DeepSeek | usage.prompt_tokens | usage.completion_tokens | usage.completion_tokens_details.reasoning_tokens |

For streaming responses, usage data is extracted from the final SSE chunk. OpenAI reports usage in streaming mode only when the request sets stream_options: {"include_usage": true}; the gateway adds this automatically.
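
The per-provider field mapping can be expressed as a small normalizer. This is an illustrative sketch, not the code in src/gateway/proxy.py: `extract_usage` and the output key names are hypothetical, and missing fields are assumed to count as zero.

```python
from typing import Any, Dict


def _normalized(prompt=0, completion=0, reasoning=0,
                cache_creation=0, cache_read=0) -> Dict[str, int]:
    """Build a provider-independent usage dict; None becomes 0."""
    return {
        "prompt_tokens": int(prompt or 0),
        "completion_tokens": int(completion or 0),
        "reasoning_tokens": int(reasoning or 0),
        "cache_creation_tokens": int(cache_creation or 0),
        "cache_read_tokens": int(cache_read or 0),
    }


def extract_usage(provider: str, body: Dict[str, Any]) -> Dict[str, int]:
    """Read token counts from a decoded provider response body."""
    if provider == "google":
        meta = body.get("usageMetadata") or {}
        return _normalized(meta.get("promptTokenCount"),
                           meta.get("candidatesTokenCount"))
    usage = body.get("usage") or {}
    if provider == "anthropic":
        return _normalized(
            usage.get("input_tokens"),
            usage.get("output_tokens"),
            cache_creation=usage.get("cache_creation_input_tokens"),
            cache_read=usage.get("cache_read_input_tokens"),
        )
    if provider in ("openai", "deepseek"):
        details = usage.get("completion_tokens_details") or {}
        return _normalized(usage.get("prompt_tokens"),
                           usage.get("completion_tokens"),
                           reasoning=details.get("reasoning_tokens"))
    raise ValueError(f"unsupported provider: {provider}")
```

For a streamed response, the same function could be applied to the decoded JSON of the chunk that carries usage.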

Per-org aggregation

Cost data is aggregated at the organization level using Redis counters:

Daily cost tracking

  • Key: gateway:daily_cost:{org_id}:{YYYY-MM-DD}
  • TTL: 48 hours
  • Operation: INCRBYFLOAT for atomic floating-point accumulation
  • Used by: Governance chain (Step 2) for daily budget enforcement

Monthly cost tracking

  • Key: gateway:monthly_cost:{org_id}:{YYYY-MM}
  • TTL: 35 days
  • Operation: INCRBYFLOAT for atomic floating-point accumulation
  • Used by: Governance chain (Step 2) for monthly budget enforcement

Daily request count

  • Key: gateway:daily_requests:{org_id}:{YYYY-MM-DD}
  • TTL: 48 hours
  • Operation: INCR
  • Used by: Analytics and dashboard request volume charts
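
The three counters above could be updated together along these lines. The key formats and TTLs come from this page; everything else is an assumption, including the function names, the choice to refresh the TTL on every write, and the use of UTC for the date. `client` can be any object with `incrbyfloat`, `incr`, and `expire` methods, which matches the redis-py client; `InMemoryCounters` is a hypothetical stand-in so the sketch runs without a Redis server.

```python
from datetime import datetime, timezone
from typing import Optional

DAY_TTL_SECONDS = 48 * 60 * 60          # 48 hours
MONTH_TTL_SECONDS = 35 * 24 * 60 * 60   # 35 days


def counter_keys(org_id: str, now: datetime) -> dict:
    """Build the three Redis keys for an org at a given time."""
    day, month = now.strftime("%Y-%m-%d"), now.strftime("%Y-%m")
    return {
        "daily_cost": f"gateway:daily_cost:{org_id}:{day}",
        "monthly_cost": f"gateway:monthly_cost:{org_id}:{month}",
        "daily_requests": f"gateway:daily_requests:{org_id}:{day}",
    }


def record_spend(client, org_id: str, cost_usd: float,
                 now: Optional[datetime] = None) -> None:
    """Add one request's cost to the daily and monthly counters."""
    keys = counter_keys(org_id, now or datetime.now(timezone.utc))
    client.incrbyfloat(keys["daily_cost"], cost_usd)
    client.expire(keys["daily_cost"], DAY_TTL_SECONDS)
    client.incrbyfloat(keys["monthly_cost"], cost_usd)
    client.expire(keys["monthly_cost"], MONTH_TTL_SECONDS)
    client.incr(keys["daily_requests"])
    client.expire(keys["daily_requests"], DAY_TTL_SECONDS)


class InMemoryCounters:
    """Stand-in exposing the three client methods used above."""

    def __init__(self):
        self.values, self.ttls = {}, {}

    def incrbyfloat(self, name, amount):
        self.values[name] = float(self.values.get(name, 0.0)) + amount
        return self.values[name]

    def incr(self, name, amount=1):
        self.values[name] = int(self.values.get(name, 0)) + amount
        return self.values[name]

    def expire(self, name, time):
        self.ttls[name] = time
        return True


store = InMemoryCounters()
record_spend(store, "org_123", 0.25,
             now=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc))
```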

Budget alerts and caps

The gateway enforces budget limits at three levels:

Per-request cap

Each request’s estimated cost is checked against max_cost_per_request. This prevents a single expensive request (e.g., a large context window with GPT-5) from consuming a disproportionate share of the budget.

| Tier | Max Cost per Request |
| --- | --- |
| Free | $0.25 |
| Starter | $0.50 |
| Pro | $2.00 |
| Team | $5.00 |
| Enterprise | $10.00 |

Daily budget cap

The gateway checks daily_cost + estimated_cost against daily_budget. Once the cap is reached, all requests for the org are blocked until the next UTC day.

| Tier | Daily Budget |
| --- | --- |
| Free | $5 |
| Starter | $25 |
| Pro | $100 |
| Team | $500 |
| Enterprise | $2,000 |

Monthly budget cap

The gateway checks monthly_cost + estimated_cost against monthly_budget. Once the cap is reached, all requests for the org are blocked until the next calendar month.

| Tier | Monthly Budget |
| --- | --- |
| Free | $50 |
| Starter | $250 |
| Pro | $2,000 |
| Team | $10,000 |
| Enterprise | $50,000 |

Custom limits can be configured per organization via the dashboard or the B2B API.
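
Taken together, the three checks amount to a short pre-request gate. The sketch below uses the default tier limits from the tables in this section; the names `TierLimits`, `TIER_LIMITS`, and `check_budget`, the lowercase tier keys, the check order, and the choice of a strict greater-than comparison are all assumptions rather than the behavior of src/gateway/governance.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    max_cost_per_request: float  # USD
    daily_budget: float          # USD
    monthly_budget: float        # USD


# Default limits per tier, taken from the tables above.
TIER_LIMITS = {
    "free": TierLimits(0.25, 5, 50),
    "starter": TierLimits(0.50, 25, 250),
    "pro": TierLimits(2.00, 100, 2_000),
    "team": TierLimits(5.00, 500, 10_000),
    "enterprise": TierLimits(10.00, 2_000, 50_000),
}


def check_budget(limits: TierLimits, estimated_cost: float,
                 daily_cost: float, monthly_cost: float) -> Optional[str]:
    """Return the name of the violated limit, or None if the request may run."""
    if estimated_cost > limits.max_cost_per_request:
        return "max_cost_per_request"
    if daily_cost + estimated_cost > limits.daily_budget:
        return "daily_budget"
    if monthly_cost + estimated_cost > limits.monthly_budget:
        return "monthly_budget"
    return None
```

For example, `check_budget(TIER_LIMITS["free"], 0.30, 0.0, 0.0)` returns `"max_cost_per_request"`, because a $0.30 estimate exceeds the Free tier's $0.25 per-request cap.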

Cost comparison across providers

The gateway records the provider and model for every request, making it straightforward to compare costs across providers for equivalent tasks. The dashboard’s cost governance panel includes:

  • Cost by provider: Aggregate spend broken down by OpenAI, Anthropic, Google, Groq, Mistral, xAI, and more
  • Cost by model: Per-model spend ranking to identify the most expensive models in use
  • Cost trends: Daily and weekly cost charts with trend indicators
  • Per-key attribution: Cost breakdown by API key, useful for tracking spend per application or team
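
The same breakdowns can be reproduced offline from exported usage records. A minimal sketch, assuming each record is a dict carrying the `provider`, `model`, `key_id`, and `actual_cost` fields described earlier; the function name `cost_by` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cost_by(records: Iterable[Dict], field: str) -> List[Tuple[str, float]]:
    """Sum actual_cost per distinct value of `field`, most expensive first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record["actual_cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder records for illustration.
sample_records = [
    {"provider": "openai", "model": "gpt-4o", "key_id": "k1", "actual_cost": 0.5},
    {"provider": "anthropic", "model": "claude", "key_id": "k2", "actual_cost": 0.25},
    {"provider": "openai", "model": "gpt-4o", "key_id": "k2", "actual_cost": 0.25},
]
by_provider = cost_by(sample_records, "provider")
by_key = cost_by(sample_records, "key_id")
```

Passing `"model"` or `"key_id"` as the field gives the per-model and per-key views.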

Runner session costs

For requests originating from managed runner sessions, the gateway additionally tracks costs at the session level via the CostSLOEngine. This enables:

  • Per-session cost limits (separate from org-level budgets)
  • Session cost accumulation in Redis for real-time enforcement
  • Cost attribution to specific runner executions in the audit log

Dashboard integration

The cost data recorded by the gateway feeds directly into the dashboard’s cost governance panel:

  • Real-time spend: Current daily and monthly totals pulled from Redis
  • Historical data: Full usage history queried from the MongoDB gateway_usage collection
  • Budget utilization: Visual indicators showing percentage of daily and monthly budgets consumed
  • Cost alerts: Configurable notifications when spend approaches budget thresholds

Access the cost governance panel in the dashboard under Gateway > Cost Tracking.

Querying cost data

Via the CLI

    # Daily cost snapshot
    ./scripts/analytics costs today

    # Cost history for the last 7 days
    ./scripts/analytics costs history 7

    # Full analytics snapshot
    ./scripts/analytics snapshot today

Via the B2B API

    # Get cost summary for the current org
    curl https://api.curate-me.ai/api/v1/admin/gateway/costs/summary \
      -H "Authorization: Bearer $TOKEN" \
      -H "X-Org-ID: $ORG_ID"

    # Get usage records with filters
    curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?provider=openai&limit=50" \
      -H "Authorization: Bearer $TOKEN" \
      -H "X-Org-ID: $ORG_ID"
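
The usage query can also be issued from Python with the standard library. This sketch mirrors the second curl call; the helper names `build_usage_request` and `fetch_json` are hypothetical, and because the response schema is not documented on this page, the decoded JSON is returned as-is.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.curate-me.ai/api/v1/admin/gateway"


def build_usage_request(token: str, org_id: str,
                        provider: Optional[str] = None,
                        limit: int = 50) -> Request:
    """Build a GET request for usage records, optionally filtered by provider."""
    params = {}
    if provider:
        params["provider"] = provider
    params["limit"] = limit
    return Request(
        f"{BASE_URL}/usage?{urlencode(params)}",
        headers={
            "Authorization": f"Bearer {token}",
            "X-Org-ID": org_id,
        },
    )


def fetch_json(request: Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON body (raises on HTTP errors)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Typical use would be `fetch_json(build_usage_request(token, org_id, provider="openai"))`, with the token and org ID read from the environment.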

Backend implementation

Key source files:

| File | Purpose |
| --- | --- |
| src/gateway/cost_recorder.py | Post-response cost recording (Redis + MongoDB + Prometheus) |
| src/gateway/model_pricing.py | Model pricing tables and cost calculation functions |
| src/gateway/governance.py | Pre-request cost estimation and budget enforcement |
| src/gateway/proxy.py | Token usage extraction from provider responses |
| src/gateway/metrics.py | Prometheus metric emission for cost and latency |