
Cost Tracking

The Curate-Me gateway records the cost of every LLM request in real time. Costs are accumulated in Redis for instant dashboards and budget enforcement, and persisted to MongoDB for long-term audit trails.

How costs are recorded

Per-request flow

  1. Before proxy: The governance chain estimates cost using tiktoken BPE token counting and the model’s pricing table. This estimate is used for budget checks.
  2. After proxy: Once the provider responds, the gateway reads the actual token counts from the response (usage.prompt_tokens, usage.completion_tokens) and calculates the real cost. For streaming responses, token counts are extracted from the final SSE chunk.
  3. Record: The actual cost is written to:
    • Redis — daily and monthly cost counters, per-org and per-key (atomic INCRBYFLOAT)
    • MongoDB (gateway_usage collection) — full audit record with deduplication via request_id
    • Prometheus — request counter, latency histogram, cost gauge (for alerting)
  4. Billing: A metered billing event is recorded for Stripe usage-based billing.
  5. WebSocket: A real-time cost event is broadcast to connected dashboard clients.
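The record step can be modeled in memory. This is a sketch for illustration, not the gateway's code. A Python defaultdict stands in for the Redis counters, which the real system updates with INCRBYFLOAT. A dict keyed by request_id stands in for the deduplicated gateway_usage collection. Counter key names such as cost:daily:<org>:<date> are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


class UsageRecorder:
    """In-memory stand-in for the per-request record step.

    Assumptions (hypothetical, for illustration only):
    - counters emulates Redis INCRBYFLOAT counters; the key names are invented.
    - records emulates gateway_usage, deduplicated on request_id.
    """

    def __init__(self) -> None:
        self.counters: defaultdict[str, float] = defaultdict(float)
        self.records: dict[str, dict] = {}

    def record(self, request_id: str, org_id: str, key_id: str,
               actual_cost: float, now: datetime) -> bool:
        """Record one request's actual cost; return False for a duplicate request_id."""
        if request_id in self.records:
            return False  # already recorded: skip so retries are not double-counted
        utc = now.astimezone(timezone.utc)
        day, month = utc.strftime("%Y-%m-%d"), utc.strftime("%Y-%m")
        # Daily and monthly counters, per org and per key
        for key in (
            f"cost:daily:{org_id}:{day}",
            f"cost:monthly:{org_id}:{month}",
            f"cost:daily:{org_id}:{key_id}:{day}",
            f"cost:monthly:{org_id}:{key_id}:{month}",
        ):
            self.counters[key] += actual_cost
        # Audit record, keyed by request_id
        self.records[request_id] = {
            "request_id": request_id,
            "org_id": org_id,
            "key_id": key_id,
            "actual_cost": actual_cost,
            "created_at": utc.isoformat(),
        }
        return True


rec = UsageRecorder()
t = datetime(2026, 3, 17, 14, 32, 1, tzinfo=timezone.utc)
rec.record("gw_a1b2c3d4", "org_abc123", "key_xyz789", 0.0031, t)  # True
rec.record("gw_a1b2c3d4", "org_abc123", "key_xyz789", 0.0031, t)  # False (duplicate)
print(rec.counters["cost:daily:org_abc123:2026-03-17"])  # 0.0031
```

A production system would rely on Redis's atomic increments and a uniqueness constraint in the database rather than an in-process check. This page does not document whether the gateway deduplicates before or after updating the counters, so that ordering is an assumption here.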

Token types tracked

The gateway tracks all token categories for accurate cost calculation:

| Token type | Description |
|---|---|
| prompt_tokens | Input tokens (messages, system prompt, tools) |
| completion_tokens | Output tokens (model response) |
| reasoning_tokens | Reasoning/chain-of-thought tokens (o1, o3 models) |
| thinking_tokens | Extended thinking tokens (Claude Sonnet/Opus) |
| cache_creation_tokens | Tokens used to create prompt caches |
| cache_read_tokens | Tokens read from prompt caches (discounted) |
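The following sketch shows how the token categories combine into a dollar figure for one request. The prices are hypothetical and are not any provider's real rates. The function assumes the four billed counts are disjoint. It also assumes reasoning and thinking tokens are already included in completion_tokens, which is how OpenAI reports reasoning tokens. Providers report overlapping counts differently, so a real pricing table needs rules for each provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (not real provider rates)."""
    input_per_m: float
    output_per_m: float
    cache_write_per_m: float
    cache_read_per_m: float


def request_cost(usage: dict[str, int], p: Pricing) -> float:
    """Cost in USD for one request.

    Assumes prompt_tokens, completion_tokens, cache_creation_tokens and
    cache_read_tokens are disjoint counts. reasoning_tokens and thinking_tokens
    are assumed to be already included in completion_tokens (billed at the
    output rate), so they are not added a second time.
    """
    return (
        usage.get("prompt_tokens", 0) * p.input_per_m
        + usage.get("completion_tokens", 0) * p.output_per_m
        + usage.get("cache_creation_tokens", 0) * p.cache_write_per_m
        + usage.get("cache_read_tokens", 0) * p.cache_read_per_m
    ) / 1_000_000


prices = Pricing(input_per_m=2.0, output_per_m=8.0,
                 cache_write_per_m=2.5, cache_read_per_m=0.2)
usage = {"prompt_tokens": 1000, "completion_tokens": 500,
         "reasoning_tokens": 200, "cache_read_tokens": 4000}
print(round(request_cost(usage, prices), 6))  # 0.0068
```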

Cost attribution

Every usage record includes attribution metadata so you can slice costs along multiple dimensions:

By model and provider

Each record includes the model, provider, and requested_model (the alias before resolution). Query the usage API to get per-model cost breakdowns:

curl "https://api.curate-me.ai/gateway/admin/usage?days=7" \
  -H "X-CM-API-Key: cm_sk_your_key"

By API key

Each record includes the key_id of the API key that made the request. Per-key daily and monthly costs are accumulated in Redis for real-time tracking.

curl "https://api.curate-me.ai/gateway/admin/usage/daily?days=30" \
  -H "X-CM-API-Key: cm_sk_your_key"
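Per-model and per-key breakdowns both reduce to grouping usage records by one field. This sketch assumes each record carries the attribution fields described above (model, provider, requested_model, key_id) plus actual_cost, as in the single-request record under Querying costs. The sample records and their costs are made up, and cost_by is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict
from typing import Iterable


def cost_by(records: Iterable[dict], field: str) -> dict[str, float]:
    """Sum actual_cost per distinct value of field, highest cost first.

    Works for any scalar attribution field on a usage record, e.g. "model",
    "provider", "requested_model" or "key_id".
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for record in records:
        totals[record.get(field) or "unknown"] += record.get("actual_cost", 0.0)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample records
records = [
    {"model": "gpt-4o", "provider": "openai", "key_id": "key_a", "actual_cost": 0.5},
    {"model": "gpt-4o", "provider": "openai", "key_id": "key_b", "actual_cost": 0.25},
    {"model": "claude-sonnet-4", "provider": "anthropic", "key_id": "key_a", "actual_cost": 1.0},
]
print(cost_by(records, "model"))   # {'claude-sonnet-4': 1.0, 'gpt-4o': 0.75}
print(cost_by(records, "key_id"))  # {'key_a': 1.5, 'key_b': 0.25}
```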

By fleet and runner

When requests originate from managed runner containers, the record includes runner_id, session_id, fleet_id, and fleet_role. This enables cost attribution per agent in a multi-agent fleet:

curl "https://api.curate-me.ai/gateway/admin/runners/costs" \
  -H "X-CM-API-Key: cm_sk_your_key"

By custom tags (X-CM-Tags header)

You can attach arbitrary key-value labels to any request for project-based, environment-based, or team-based cost allocation.

Sending tags with a request:

curl https://api.curate-me.ai/v1/chat/completions \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -H "X-CM-Tags: project=onboarding,env=staging,team=growth" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Tags are stored on the usage record and can be queried for cost attribution:

{
  "request_id": "gw_a1b2c3d4",
  "model": "gpt-4o",
  "actual_cost": 0.0031,
  "tags": {
    "project": "onboarding",
    "env": "staging",
    "team": "growth"
  }
}
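Only the comma-separated key=value form is documented here, so the helpers below are a sketch under assumptions. They reject commas and equals signs inside keys and values because no escaping is assumed. They trim whitespace when parsing, and they roll up actual_cost for one tag across usage records shaped like the JSON above. format_tags, parse_tags and cost_by_tag are hypothetical names, not part of the Curate-Me SDK.

```python
from collections import defaultdict
from typing import Iterable


def format_tags(tags: dict[str, str]) -> str:
    """Build an X-CM-Tags header value, e.g. "project=onboarding,env=staging"."""
    for key, value in tags.items():
        # Assumption: the header has no escaping, so ',' and '=' are rejected.
        if any(ch in key or ch in value for ch in ",="):
            raise ValueError(f"tag {key!r}={value!r} must not contain ',' or '='")
    return ",".join(f"{key}={value}" for key, value in tags.items())


def parse_tags(header: str) -> dict[str, str]:
    """Parse an X-CM-Tags value into a dict, ignoring empty or malformed pairs."""
    tags: dict[str, str] = {}
    for pair in header.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            tags[key.strip()] = value.strip()
    return tags


def cost_by_tag(records: Iterable[dict], tag: str) -> dict[str, float]:
    """Sum actual_cost per value of one tag; records without the tag go to "untagged"."""
    totals: defaultdict[str, float] = defaultdict(float)
    for record in records:
        value = (record.get("tags") or {}).get(tag, "untagged")
        totals[value] += record.get("actual_cost", 0.0)
    return dict(totals)


header = format_tags({"project": "onboarding", "env": "staging", "team": "growth"})
print(header)                     # project=onboarding,env=staging,team=growth
print(parse_tags(header)["env"])  # staging
```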

Common tagging strategies:

| Tag | Purpose | Example values |
|---|---|---|
| project | Cost allocation by project | onboarding, search, chatbot |
| env | Separate staging from production costs | production, staging, development |
| team | Department-level cost tracking | engineering, marketing, support |
| feature | Feature-level cost attribution | autocomplete, summarization |
| customer | Per-customer cost tracking (for SaaS) | cust_abc123 |

Budget caps

Daily budget

Resets at midnight UTC. When cumulative daily spend plus the estimated cost of a new request would exceed the limit, the request is blocked with a 403.

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 50.00}'

Monthly budget

Resets on the 1st of each month. Requests are blocked the same way as with the daily budget.

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"monthly_budget": 1000.00}'
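Daily windows follow the UTC calendar date. A request sent late in the evening in a US time zone may therefore already count toward the next day's budget. The sketch below computes the current window identifiers and the next reset times. It assumes the monthly reset also happens at 00:00 UTC, which this page does not state explicitly. period_ids and next_resets are hypothetical helpers that expect timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone


def period_ids(now: datetime) -> tuple[str, str]:
    """Daily and monthly budget windows for a timezone-aware datetime, in UTC."""
    utc = now.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%d"), utc.strftime("%Y-%m")


def next_resets(now: datetime) -> tuple[datetime, datetime]:
    """Next daily reset (00:00 UTC) and next monthly reset (assumed 00:00 UTC on the 1st)."""
    utc = now.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    next_day = midnight + timedelta(days=1)
    if utc.month == 12:
        next_month = midnight.replace(year=utc.year + 1, month=1, day=1)
    else:
        next_month = midnight.replace(month=utc.month + 1, day=1)
    return next_day, next_month


# 23:30 at UTC-5 is already 04:30 the next day in UTC
utc_minus_5 = timezone(timedelta(hours=-5))
now = datetime(2026, 3, 17, 23, 30, tzinfo=utc_minus_5)
print(period_ids(now))  # ('2026-03-18', '2026-03')
```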

Per-request cap

Blocks any individual request whose estimated cost exceeds the threshold. This prevents an accidental large-context request from blowing through the budget:

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"max_cost_per_request": 2.00}'
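The three org-level caps can be combined into one admission check. This is a sketch, not the gateway's governance chain. The field names mirror the policy payloads above, None is assumed to mean "no limit", and the order in which the checks run is an assumption. Each budget check compares current spend plus the pre-proxy cost estimate against the limit, which matches the "would exceed" rule described for the daily budget.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetPolicy:
    """Mirrors the policy fields above; None is assumed to mean no limit is set."""
    daily_budget: Optional[float] = None
    monthly_budget: Optional[float] = None
    max_cost_per_request: Optional[float] = None


def block_reason(policy: BudgetPolicy, spent_today: float,
                 spent_this_month: float, estimated_cost: float) -> Optional[str]:
    """Return the name of the cap that blocks the request, or None to allow it."""
    if policy.max_cost_per_request is not None and estimated_cost > policy.max_cost_per_request:
        return "max_cost_per_request"
    if policy.daily_budget is not None and spent_today + estimated_cost > policy.daily_budget:
        return "daily_budget"
    if policy.monthly_budget is not None and spent_this_month + estimated_cost > policy.monthly_budget:
        return "monthly_budget"
    return None


policy = BudgetPolicy(daily_budget=50.0, monthly_budget=1000.0, max_cost_per_request=2.0)
print(block_reason(policy, 49.0, 500.0, 0.5))  # None (49.5 <= 50)
print(block_reason(policy, 49.8, 500.0, 0.5))  # daily_budget
print(block_reason(policy, 0.0, 0.0, 2.5))     # max_cost_per_request
```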

Per-key spend caps

Individual API keys can have their own daily and monthly spend limits, which apply in addition to the org-level budget. Set them when creating the key:

curl -X POST https://api.curate-me.ai/gateway/admin/keys \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "staging-key",
    "daily_spend_cap_usd": 10.00,
    "monthly_spend_cap_usd": 200.00
  }'

Hierarchical budgets (Org > Team > Key)

Budget limits are enforced at three levels: organization, team, and key. A request is blocked if it would exceed the limit at any level, so the tightest applicable limit wins:

Organization budget: $100/day
  |
  +-- Engineering team: $60/day
  |     |
  |     +-- Key "prod-api": $30/day
  |     +-- Key "staging": $10/day
  |
  +-- Marketing team: $20/day
        |
        +-- Key "content-gen": $15/day
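The "tightest limit wins" rule amounts to checking every level in the chain and blocking on the first level the request would push over its limit. This sketch assumes each level's spent_today already includes its children's spend, so a key's spend also counts toward its team and org. The level names and numbers are hypothetical.

```python
from typing import NamedTuple, Optional


class BudgetLevel(NamedTuple):
    name: str           # e.g. "org", "team:engineering", "key:prod-api"
    daily_limit: float  # USD per day
    spent_today: float  # USD already spent today at this level (children included)


def first_blocking_level(chain: list[BudgetLevel], estimated_cost: float) -> Optional[str]:
    """Walk org -> team -> key and return the first level the request would exceed."""
    for level in chain:
        if level.spent_today + estimated_cost > level.daily_limit:
            return level.name
    return None


chain = [
    BudgetLevel("org", 100.0, 40.0),
    BudgetLevel("team:engineering", 60.0, 55.0),
    BudgetLevel("key:prod-api", 30.0, 12.0),
]
# The key and the org still have headroom, but the team cap is the tightest here.
print(first_blocking_level(chain, 6.0))  # team:engineering
print(first_blocking_level(chain, 4.0))  # None
```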

Per-session budget (managed runners)

For managed runner containers, you can set a per-session cost limit to prevent long-running agent sessions from consuming excessive budget:

curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/runner_abc/config \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"session_budget_limit": 5.00}'

Budget alerts

Webhook alerts

The gateway fires webhook events when budget thresholds are reached:

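| Event | Trigger |
|---|---|
| budget.warning | Daily spend reaches the warning threshold (80% of the daily budget when budget_warning_threshold is 0.8) |
| budget.exceeded | A request is blocked because the budget is exhausted |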

Configure webhooks:

curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget.warning", "budget.exceeded"],
    "budget_warning_threshold": 0.8
  }'
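A webhook consumer may want to predict which event a given change in spend should produce. This page does not document the payload format, so the sketch models only the trigger conditions listed above. It assumes budget.warning fires once, when spend first crosses threshold * daily_budget, rather than on every later request. budget_events is a hypothetical helper.

```python
def budget_events(spent_before: float, spent_after: float, daily_budget: float,
                  threshold: float = 0.8, blocked: bool = False) -> list[str]:
    """Webhook events a single request would trigger.

    Assumption: budget.warning fires once, when spend first crosses
    threshold * daily_budget, rather than on every request above it.
    """
    events: list[str] = []
    warn_at = daily_budget * threshold
    if spent_before < warn_at <= spent_after:
        events.append("budget.warning")
    if blocked:
        events.append("budget.exceeded")
    return events


print(budget_events(39.5, 40.5, daily_budget=50.0))  # ['budget.warning']
print(budget_events(40.5, 41.0, daily_budget=50.0))  # []
print(budget_events(49.0, 49.0, daily_budget=50.0, blocked=True))  # ['budget.exceeded']
```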

Real-time dashboard

The dashboard at Gateway > Cost Tracking shows:

  • Live daily spend gauge with budget remaining
  • Per-model cost breakdown pie chart
  • Per-key cost attribution table
  • Cost trend chart (7/30/90 day)
  • Top-cost requests list

Cost events are streamed over WebSocket so the dashboard updates in real time as requests flow through the gateway.

Querying costs

Daily cost breakdown

curl "https://api.curate-me.ai/gateway/admin/usage/daily?days=30" \
  -H "X-CM-API-Key: cm_sk_your_key"

Response:

{
  "days": [
    {
      "date": "2026-03-17",
      "total_cost": 42.15,
      "total_requests": 1847,
      "by_model": [
        {"model": "gpt-4o", "cost": 28.30, "requests": 920},
        {"model": "claude-sonnet-4", "cost": 8.50, "requests": 312},
        {"model": "gpt-4o-mini", "cost": 5.35, "requests": 615}
      ]
    }
  ]
}
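The daily response is easy to post-process on the client side. This sketch parses the sample response above and checks that the per-model costs add up to total_cost. It then derives each model's share of spend and its average cost per request. summarize_day is a hypothetical helper name.

```python
import json

response = json.loads("""
{"days": [{"date": "2026-03-17", "total_cost": 42.15, "total_requests": 1847,
  "by_model": [
    {"model": "gpt-4o", "cost": 28.30, "requests": 920},
    {"model": "claude-sonnet-4", "cost": 8.50, "requests": 312},
    {"model": "gpt-4o-mini", "cost": 5.35, "requests": 615}]}]}
""")


def summarize_day(day: dict) -> dict[str, dict[str, float]]:
    """Per-model share of the day's cost and average cost per request."""
    total = day["total_cost"]
    return {
        m["model"]: {
            "share": round(m["cost"] / total, 3),
            "cost_per_request": round(m["cost"] / m["requests"], 4),
        }
        for m in day["by_model"]
    }


for day in response["days"]:
    # Sanity check: per-model costs should add up to the daily total.
    assert abs(sum(m["cost"] for m in day["by_model"]) - day["total_cost"]) < 1e-6
    print(day["date"], summarize_day(day))
```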

Single request cost

Every gateway response includes an X-CM-Request-ID header. Use its value to look up the full usage record:

curl "https://api.curate-me.ai/gateway/admin/usage/gw_a1b2c3d4" \
  -H "X-CM-API-Key: cm_sk_your_key"

Response:

{
  "request_id": "gw_a1b2c3d4",
  "org_id": "org_abc123",
  "key_id": "key_xyz789",
  "provider": "openai",
  "model": "gpt-4o",
  "prompt_tokens": 1250,
  "completion_tokens": 380,
  "total_tokens": 1630,
  "estimated_cost": 0.0034,
  "actual_cost": 0.0031,
  "latency_ms": 842.5,
  "stream": false,
  "tags": {"project": "onboarding", "env": "production"},
  "created_at": "2026-03-17T14:32:01Z"
}
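Budget checks run on the pre-proxy estimate, while the counters record the actual cost, so it can be worth tracking how far the two drift apart. An estimate that is consistently low could let through a request that then pushes spend past a cap. The sketch below computes the relative error for the sample record above. estimate_error is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def estimate_error(record: dict) -> Optional[float]:
    """Relative error of estimated_cost versus actual_cost.

    Positive means the estimate was higher than the actual cost.
    Returns None when actual_cost is zero (relative error is undefined).
    """
    actual = record["actual_cost"]
    if actual == 0:
        return None
    return (record["estimated_cost"] - actual) / actual


# Fields copied from the sample usage record above
record = {"prompt_tokens": 1250, "completion_tokens": 380, "total_tokens": 1630,
          "estimated_cost": 0.0034, "actual_cost": 0.0031}

assert record["total_tokens"] == record["prompt_tokens"] + record["completion_tokens"]
error = estimate_error(record)
print(f"{error:+.1%}")  # +9.7%
```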

Python SDK

import asyncio

from curate_me.gateway import CurateGateway


async def main() -> None:
    gw = CurateGateway(api_key="cm_sk_your_key")
    admin = gw.admin()

    # Daily costs for the last 30 days
    costs = await admin.get_daily_costs(days=30)

    # Usage records for the last 7 days
    usage = await admin.get_usage(days=7, limit=100)

    # Single usage record
    record = await admin.get_usage_record("gw_a1b2c3d4")


# await is only valid inside a coroutine in a regular Python script
asyncio.run(main())

TypeScript SDK

import { CurateGateway } from '@curate-me/sdk';

const gw = new CurateGateway('cm_sk_your_key');
const admin = gw.admin();

// Daily costs for the last 30 days
const costs = await admin.getDailyCosts({ days: 30 });

// Usage records for the last 7 days
const usage = await admin.getUsage({ days: 7, limit: 100 });

// Single usage record
const record = await admin.getUsageRecord('gw_a1b2c3d4');

Cost data retention

| Store | TTL | Purpose |
|---|---|---|
| Redis daily counter | 48 hours | Real-time budget enforcement |
| Redis monthly counter | 35 days | Monthly budget enforcement |
| Redis per-key counter | 48 hours (daily), 35 days (monthly) | Per-key spend caps |
| MongoDB gateway_usage | Indefinite | Audit trail, historical queries |

Daily counters in Redis have a 48-hour TTL to cover UTC midnight boundaries. MongoDB records are never automatically deleted — they serve as the audit trail.

Next steps