# Cost Tracking
The gateway records the cost of every LLM request in real time. Costs are accumulated in Redis for fast budget enforcement and persisted to MongoDB for audit and analytics. The dashboard provides a cost governance panel for monitoring spend across organizations, models, and time periods.
## How cost recording works
After every proxy response completes — both streaming and non-streaming — the gateway runs the following sequence:
- Calculate actual cost from the provider’s reported token counts (prompt tokens, completion tokens, reasoning tokens, cache tokens)
- Increment daily cost in Redis (`gateway:daily_cost:{org_id}:{date}`) for daily budget enforcement
- Increment monthly cost in Redis (`gateway:monthly_cost:{org_id}:{YYYY-MM}`) for monthly budget enforcement
- Persist the full usage record to MongoDB (the `gateway_usage` collection) for audit
- Track usage against the API key for per-key cost attribution
- Emit Prometheus metrics for system-level monitoring
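The cost-calculation step can be sketched as follows. The pricing numbers and function name here are illustrative only; the real tables live in `src/gateway/model_pricing.py`, and production pricing also covers reasoning and cache tokens:

```python
# Illustrative per-million-token USD prices; not the real pricing tables.
PRICING = {"gpt-4o": {"prompt": 2.50, "completion": 10.00}}

def calculate_actual_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute USD cost from the provider's reported token counts."""
    p = PRICING[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000
```

For example, a `gpt-4o` request with 1,000 prompt tokens and 500 completion tokens would cost (1,000 × $2.50 + 500 × $10.00) / 1,000,000 = $0.0075 under these illustrative prices.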
## Per-call logging
Every gateway request produces a usage record with the following fields:
| Field | Description |
|---|---|
| `request_id` | Unique identifier for tracing |
| `org_id` | Organization that owns the API key |
| `key_id` | The specific API key used |
| `provider` | LLM provider (openai, anthropic, google, deepseek) |
| `model` | Model name used for the request |
| `prompt_tokens` | Number of input tokens (from provider response) |
| `completion_tokens` | Number of output tokens (from provider response) |
| `total_tokens` | Total token count |
| `reasoning_tokens` | Reasoning/thinking tokens for o-series and R1 models |
| `thinking_tokens` | Extended thinking tokens for Anthropic models |
| `cache_creation_tokens` | Anthropic prompt cache creation tokens |
| `cache_read_tokens` | Anthropic prompt cache read tokens |
| `estimated_cost` | Pre-request cost estimate (USD) |
| `actual_cost` | Computed cost from actual token usage (USD) |
| `latency_ms` | End-to-end request latency in milliseconds |
| `stream` | Whether the request used streaming (SSE) |
| `status_code` | HTTP status code from the provider |
| `runner_id` | Managed runner ID (if the request originated from a runner session) |
| `session_id` | Runner session ID (if applicable) |
| `created_at` | Timestamp of the request |
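To make the schema concrete, a usage record might look like the following (all values are invented for illustration):

```python
# An illustrative usage record; every value here is made up.
usage_record = {
    "request_id": "req_example_001",
    "org_id": "org_acme",
    "key_id": "key_prod_1",
    "provider": "openai",
    "model": "gpt-4o",
    "prompt_tokens": 1200,
    "completion_tokens": 350,
    "total_tokens": 1550,
    "reasoning_tokens": 0,
    "thinking_tokens": 0,
    "cache_creation_tokens": 0,
    "cache_read_tokens": 0,
    "estimated_cost": 0.0041,
    "actual_cost": 0.0065,
    "latency_ms": 1840,
    "stream": True,
    "status_code": 200,
    "runner_id": None,       # not a runner-originated request
    "session_id": None,
    "created_at": "2025-01-15T12:00:00Z",
}
```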
## Token counting

### Pre-request estimation
Before a request is proxied, the gateway estimates the input token count using `tiktoken` BPE encoding. The correct encoding is selected per model family:
| Model Family | Encoding |
|---|---|
| GPT-4o, GPT-5, o-series | `o200k_base` |
| Claude, DeepSeek, Gemini | `cl100k_base` (approximation) |
The estimator handles both OpenAI and Anthropic message formats, including multi-part content blocks, system prompts, and tool definitions.
If tiktoken fails for any reason, the estimator falls back to a character heuristic (1 token per 4 characters).
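A minimal sketch of the selection-plus-fallback logic; the encoding choices follow the table above, while the function name and model-prefix matching are hypothetical simplifications:

```python
def estimate_prompt_tokens(text: str, model: str) -> int:
    """Pick a tiktoken encoding by model family; fall back to a character heuristic."""
    try:
        import tiktoken
        # o200k_base for GPT-4o / GPT-5 / o-series; cl100k_base as an approximation otherwise.
        name = "o200k_base" if model.startswith(("gpt-4o", "gpt-5", "o1", "o3")) else "cl100k_base"
        return len(tiktoken.get_encoding(name).encode(text))
    except Exception:
        # Heuristic fallback: roughly one token per four characters.
        return max(1, len(text) // 4)
```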
### Post-request actual counts
After the response completes, actual token counts are extracted from the provider’s response:
| Provider | Prompt Tokens | Completion Tokens | Special Tokens |
|---|---|---|---|
| OpenAI | `usage.prompt_tokens` | `usage.completion_tokens` | `completion_tokens_details.reasoning_tokens` |
| Anthropic | `usage.input_tokens` | `usage.output_tokens` | `cache_creation_input_tokens`, `cache_read_input_tokens` |
| Google | `usageMetadata.promptTokenCount` | `usageMetadata.candidatesTokenCount` | — |
| DeepSeek | `usage.prompt_tokens` | `usage.completion_tokens` | `completion_tokens_details.reasoning_tokens` |
For streaming responses, usage data is extracted from the final SSE chunk. OpenAI requires `stream_options: {"include_usage": true}` to report usage in streaming mode — the gateway adds this automatically.
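The per-provider field paths in the table above normalize into one shape. This is a sketch under that assumption; the function name is hypothetical, and the field paths are the ones listed above:

```python
def extract_usage(provider: str, body: dict) -> dict:
    """Normalize token counts from a provider response body into one shape."""
    if provider in ("openai", "deepseek"):
        u = body.get("usage", {})
        return {
            "prompt_tokens": u.get("prompt_tokens", 0),
            "completion_tokens": u.get("completion_tokens", 0),
            "reasoning_tokens": u.get("completion_tokens_details", {}).get("reasoning_tokens", 0),
        }
    if provider == "anthropic":
        u = body.get("usage", {})
        return {
            "prompt_tokens": u.get("input_tokens", 0),
            "completion_tokens": u.get("output_tokens", 0),
            "cache_creation_tokens": u.get("cache_creation_input_tokens", 0),
            "cache_read_tokens": u.get("cache_read_input_tokens", 0),
        }
    if provider == "google":
        u = body.get("usageMetadata", {})
        return {
            "prompt_tokens": u.get("promptTokenCount", 0),
            "completion_tokens": u.get("candidatesTokenCount", 0),
        }
    raise ValueError(f"unknown provider: {provider}")
```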
## Per-org aggregation
Cost data is aggregated at the organization level using Redis counters:
### Daily cost tracking

- Key: `gateway:daily_cost:{org_id}:{YYYY-MM-DD}`
- TTL: 48 hours
- Operation: `INCRBYFLOAT` for precise float accumulation
- Used by: Governance chain (Step 2) for daily budget enforcement
### Monthly cost tracking

- Key: `gateway:monthly_cost:{org_id}:{YYYY-MM}`
- TTL: 35 days
- Operation: `INCRBYFLOAT` for precise float accumulation
- Used by: Governance chain (Step 2) for monthly budget enforcement
### Daily request count

- Key: `gateway:daily_requests:{org_id}:{YYYY-MM-DD}`
- TTL: 48 hours
- Operation: `INCR`
- Used by: Analytics and dashboard request volume charts
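The counter pattern works with any Redis client; this sketch exercises the key formats and TTLs above against a tiny in-memory stand-in (both class and function names are hypothetical):

```python
class FakeRedis:
    """In-memory stand-in for the three Redis commands the pattern needs."""
    def __init__(self):
        self.values, self.ttls = {}, {}
    def incrbyfloat(self, key, amount):
        self.values[key] = self.values.get(key, 0.0) + float(amount)
        return self.values[key]
    def incr(self, key):
        self.values[key] = int(self.values.get(key, 0)) + 1
        return self.values[key]
    def expire(self, key, seconds):
        self.ttls[key] = seconds

def bump_counters(r, org_id: str, cost: float, day: str) -> None:
    """Apply all three counters for one completed request (day is YYYY-MM-DD)."""
    month = day[:7]
    for key, ttl in [
        (f"gateway:daily_cost:{org_id}:{day}", 48 * 3600),       # 48-hour TTL
        (f"gateway:monthly_cost:{org_id}:{month}", 35 * 86400),  # 35-day TTL
    ]:
        r.incrbyfloat(key, cost)
        r.expire(key, ttl)
    req_key = f"gateway:daily_requests:{org_id}:{day}"
    r.incr(req_key)
    r.expire(req_key, 48 * 3600)
```

`INCRBYFLOAT` is used rather than read-modify-write so concurrent gateway workers accumulate cost atomically.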
## Budget alerts and caps
The gateway enforces budget limits at three levels:
### Per-request cap
Each request’s estimated cost is checked against `max_cost_per_request`. This prevents a single expensive request (e.g., a large context window with GPT-5) from consuming a disproportionate share of the budget.
| Tier | Max Cost per Request |
|---|---|
| Free | $0.25 |
| Starter | $0.50 |
| Pro | $2.00 |
| Team | $5.00 |
| Enterprise | $10.00 |
### Daily budget cap
The sum of `daily_cost + estimated_cost` is checked against `daily_budget`. When the cap is reached, all requests for the org are blocked until the next UTC day.
| Tier | Daily Budget |
|---|---|
| Free | $5 |
| Starter | $25 |
| Pro | $100 |
| Team | $500 |
| Enterprise | $2,000 |
### Monthly budget cap
The sum of `monthly_cost + estimated_cost` is checked against `monthly_budget`. When the cap is reached, all requests for the org are blocked until the next calendar month.
| Tier | Monthly Budget |
|---|---|
| Free | $50 |
| Starter | $250 |
| Pro | $2,000 |
| Team | $10,000 |
| Enterprise | $50,000 |
Custom limits can be configured per organization via the dashboard or the B2B API.
## Cost comparison across providers
The gateway records the provider and model for every request, making it straightforward to compare costs across providers for equivalent tasks. The dashboard’s cost governance panel includes:
- Cost by provider: Aggregate spend broken down by OpenAI, Anthropic, Google, Groq, Mistral, xAI, and more
- Cost by model: Per-model spend ranking to identify the most expensive models in use
- Cost trends: Daily and weekly cost charts with trend indicators
- Per-key attribution: Cost breakdown by API key, useful for tracking spend per application or team
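Because every record carries `provider`, `model`, and `actual_cost`, rollups like these reduce to standard MongoDB aggregations over the `gateway_usage` collection. The pipeline below is a sketch of a cost-by-provider rollup, not the dashboard's exact query; the org ID and window are invented:

```python
from datetime import datetime, timezone

# Cost-by-provider rollup for one org over a window; run with
# db.gateway_usage.aggregate(pipeline) via pymongo. Values are illustrative.
window_start = datetime(2025, 1, 1, tzinfo=timezone.utc)
pipeline = [
    {"$match": {"org_id": "org_acme", "created_at": {"$gte": window_start}}},
    {"$group": {
        "_id": "$provider",                       # one bucket per provider
        "total_cost": {"$sum": "$actual_cost"},   # aggregate spend
        "requests": {"$sum": 1},                  # request volume
    }},
    {"$sort": {"total_cost": -1}},                # most expensive provider first
]
```

Swapping `"$provider"` for `"$model"` or `"$key_id"` in the `$group` stage yields the per-model and per-key breakdowns.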
## Runner session costs
For requests originating from managed runner sessions, the gateway additionally tracks costs at the session level via the CostSLOEngine. This enables:
- Per-session cost limits (separate from org-level budgets)
- Session cost accumulation in Redis for real-time enforcement
- Cost attribution to specific runner executions in the audit log
## Dashboard integration
The cost data recorded by the gateway feeds directly into the dashboard’s cost governance panel:
- Real-time spend: Current daily and monthly totals pulled from Redis
- Historical data: Full usage history queried from the MongoDB `gateway_usage` collection
- Budget utilization: Visual indicators showing percentage of daily and monthly budgets consumed
- Cost alerts: Configurable notifications when spend approaches budget thresholds
Access the cost governance panel in the dashboard under Gateway > Cost Tracking.
## Querying cost data

### Via the CLI
```bash
# Daily cost snapshot
./scripts/analytics costs today

# Cost history for the last 7 days
./scripts/analytics costs history 7

# Full analytics snapshot
./scripts/analytics snapshot today
```

### Via the B2B API
```bash
# Get cost summary for the current org
curl https://api.curate-me.ai/api/v1/admin/gateway/costs/summary \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Get usage records with filters
curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?provider=openai&limit=50" \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

## Backend implementation
Key source files:
| File | Purpose |
|---|---|
| `src/gateway/cost_recorder.py` | Post-response cost recording (Redis + MongoDB + Prometheus) |
| `src/gateway/model_pricing.py` | Model pricing tables and cost calculation functions |
| `src/gateway/governance.py` | Pre-request cost estimation and budget enforcement |
| `src/gateway/proxy.py` | Token usage extraction from provider responses |
| `src/gateway/metrics.py` | Prometheus metric emission for cost and latency |