Cost Tracking
The Curate-Me gateway records the cost of every LLM request in real time. Costs are accumulated in Redis for instant dashboards and budget enforcement, and persisted to MongoDB for long-term audit trails.
How costs are recorded
Per-request flow
- Before proxy: The governance chain estimates cost using tiktoken BPE token counting and the model’s pricing table. This estimate is used for budget checks.
- After proxy: Once the provider responds, the gateway reads the actual token counts from the response (usage.prompt_tokens, usage.completion_tokens) and calculates the real cost. For streaming responses, token counts are extracted from the final SSE chunk.
- Record: The actual cost is written to:
  - Redis: daily and monthly cost counters, per-org and per-key (atomic INCRBYFLOAT)
  - MongoDB (gateway_usage collection): full audit record with deduplication via request_id
  - Prometheus: request counter, latency histogram, cost gauge (for alerting)
- Billing: A metered billing event is recorded for Stripe usage-based billing.
- WebSocket: A real-time cost event is broadcast to connected dashboard clients.
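The estimate-then-record flow above can be sketched in a few lines. This is a minimal illustration, not the gateway's implementation: the prices in PRICING, the counter key layout, and the function names are all assumptions made for the example, and a plain dictionary stands in for the Redis counters.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical prices in USD per 1M tokens; not Curate-Me's real pricing table.
PRICING = {"gpt-4o": {"prompt": 2.50, "completion": 10.00}}

# In-memory stand-in for the Redis counters (the gateway itself uses INCRBYFLOAT).
counters = defaultdict(float)


def cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Price a request from its token counts and the pricing table."""
    rates = PRICING[model]
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1_000_000


def record_cost(org_id: str, key_id: str, cost: float, now: datetime) -> None:
    """Add the actual cost to daily and monthly counters, per org and per key."""
    day, month = now.strftime("%Y-%m-%d"), now.strftime("%Y-%m")
    for scope in (f"org:{org_id}", f"key:{key_id}"):
        # Hypothetical key layout, chosen only for this sketch.
        counters[f"cost:{scope}:daily:{day}"] += cost
        counters[f"cost:{scope}:monthly:{month}"] += cost


# Before proxy: estimate from locally counted tokens (used for the budget check).
estimated = cost_usd("gpt-4o", prompt_tokens=1250, completion_tokens=500)

# After proxy: recompute from the usage block in the provider response.
usage = {"prompt_tokens": 1250, "completion_tokens": 380}
actual = cost_usd("gpt-4o", usage["prompt_tokens"], usage["completion_tokens"])

record_cost("org_abc123", "key_xyz789", actual,
            datetime(2026, 3, 17, 14, 32, tzinfo=timezone.utc))
```

The estimate and the actual cost differ whenever the model produces more or fewer completion tokens than predicted, which is why only the post-response figure is written to the counters.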
Token types tracked
The gateway tracks all token categories for accurate cost calculation:
| Token Type | Description |
|---|---|
| prompt_tokens | Input tokens (messages, system prompt, tools) |
| completion_tokens | Output tokens (model response) |
| reasoning_tokens | Reasoning/chain-of-thought tokens (o1, o3 models) |
| thinking_tokens | Extended thinking tokens (Claude Sonnet/Opus) |
| cache_creation_tokens | Tokens used to create prompt caches |
| cache_read_tokens | Tokens read from prompt caches (discounted) |
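As a rough illustration of why each category is tracked separately, the sketch below prices every token type at its own rate. The rates are made-up placeholders, and the code assumes the categories are reported as disjoint counts; providers differ on whether, for example, reasoning tokens are already included in completion tokens, so a real implementation would need to account for that.

```python
# Hypothetical USD rates per 1M tokens for a single model; real rates vary by provider.
RATES_PER_MILLION = {
    "prompt_tokens": 3.00,
    "completion_tokens": 15.00,
    "reasoning_tokens": 15.00,
    "thinking_tokens": 15.00,
    "cache_creation_tokens": 3.75,
    "cache_read_tokens": 0.30,  # discounted relative to prompt_tokens
}


def cost_by_token_type(usage: dict[str, int]) -> dict[str, float]:
    """Return the cost contributed by each known token category in `usage`."""
    return {
        token_type: count * RATES_PER_MILLION[token_type] / 1_000_000
        for token_type, count in usage.items()
        if token_type in RATES_PER_MILLION
    }


# Assumes the three counts below do not overlap.
usage = {"prompt_tokens": 2_000, "completion_tokens": 500, "cache_read_tokens": 10_000}
breakdown = cost_by_token_type(usage)
total = sum(breakdown.values())
```

With these placeholder rates, 10,000 cached tokens cost less than 2,000 uncached prompt tokens, which shows how a single blended input rate would misstate the cost.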
Cost attribution
Every usage record includes attribution metadata so you can slice costs along multiple dimensions:
By model and provider
Each record includes the model, provider, and requested_model (the alias before
resolution). Query the usage API to get per-model cost breakdowns:
curl "https://api.curate-me.ai/gateway/admin/usage?days=7" \
  -H "X-CM-API-Key: cm_sk_your_key"
By API key
Each record includes the key_id of the API key that made the request. Per-key daily
and monthly costs are accumulated in Redis for real-time tracking.
curl "https://api.curate-me.ai/gateway/admin/usage/daily?days=30" \
  -H "X-CM-API-Key: cm_sk_your_key"
By fleet and runner
When requests originate from managed runner containers, the record includes runner_id,
session_id, fleet_id, and fleet_role. This enables cost attribution per agent in a
multi-agent fleet:
curl "https://api.curate-me.ai/gateway/admin/runners/costs" \
  -H "X-CM-API-Key: cm_sk_your_key"
By custom tags (X-CM-Tags header)
You can attach arbitrary key-value labels to any request for project-based, environment-based, or team-based cost allocation.
Sending tags with a request:
curl https://api.curate-me.ai/v1/chat/completions \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -H "X-CM-Tags: project=onboarding,env=staging,team=growth" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Tags are stored on the usage record and can be queried for cost attribution:
{
  "request_id": "gw_a1b2c3d4",
  "model": "gpt-4o",
  "actual_cost": 0.0031,
  "tags": {
    "project": "onboarding",
    "env": "staging",
    "team": "growth"
  }
}
Common tagging strategies:
| Tag | Purpose | Example values |
|---|---|---|
| project | Cost allocation by project | onboarding, search, chatbot |
| env | Separate staging from production costs | production, staging, development |
| team | Department-level cost tracking | engineering, marketing, support |
| feature | Feature-level cost attribution | autocomplete, summarization |
| customer | Per-customer cost tracking (for SaaS) | cust_abc123 |
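To make the X-CM-Tags format concrete, here is one way a comma-separated key=value header could be parsed and then used to total costs per tag value. Both helpers (parse_cm_tags, cost_by_tag) are hypothetical names for this sketch; the gateway's actual parsing rules for escaping, duplicate keys, or size limits are not described here and may differ.

```python
from collections import defaultdict


def parse_cm_tags(header_value: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, skipping malformed or empty pairs."""
    tags: dict[str, str] = {}
    for pair in header_value.split(","):
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            tags[key] = value
    return tags


def cost_by_tag(records: list[dict], tag: str) -> dict[str, float]:
    """Sum actual_cost per value of one tag; records without the tag are skipped."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        value = record.get("tags", {}).get(tag)
        if value is not None:
            totals[value] += record["actual_cost"]
    return dict(totals)


tags = parse_cm_tags("project=onboarding,env=staging,team=growth")

# Illustrative usage records shaped like the JSON example above.
records = [
    {"actual_cost": 0.0031, "tags": parse_cm_tags("project=onboarding,env=staging")},
    {"actual_cost": 0.0100, "tags": parse_cm_tags("project=search,env=staging")},
    {"actual_cost": 0.0020, "tags": {}},
]
by_env = cost_by_tag(records, "env")
by_project = cost_by_tag(records, "project")
```

Untagged records are left out of the totals rather than grouped under a default value, so the per-tag sums can be lower than the overall spend.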
Budget caps
Daily budget
Resets at midnight UTC. When cumulative daily spend plus the estimated cost of a new
request would exceed the limit, the request is blocked with a 403.
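The admission rule described above amounts to a single comparison against a counter bucketed by UTC date. The sketch below illustrates that rule under stated assumptions: the function names are invented for the example, and the strict greater-than comparison (a request that lands exactly on the limit is allowed) is one reading of "would exceed".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc_day(now: datetime) -> str:
    """Daily budgets reset at midnight UTC, so spend is bucketed by UTC date."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")


def would_exceed(spent: float, estimated_cost: float, limit: float) -> bool:
    """True when admitting the request would push spend past the limit."""
    return spent + estimated_cost > limit


def check_daily_budget(spent_today: float, estimated_cost: float,
                       daily_budget: float) -> Optional[int]:
    """Return 403 when the request should be blocked, otherwise None."""
    return 403 if would_exceed(spent_today, estimated_cost, daily_budget) else None


# 23:30 in UTC-5 already falls on the next UTC calendar day.
local_evening = datetime(2026, 3, 17, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
bucket = utc_day(local_evening)

blocked = check_daily_budget(spent_today=49.99, estimated_cost=0.02, daily_budget=50.00)
allowed = check_daily_budget(spent_today=49.00, estimated_cost=0.50, daily_budget=50.00)
```

Because the check uses the pre-request estimate, a request can be admitted and still bring actual spend slightly above the limit if the response turns out longer than estimated.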
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 50.00}'
Monthly budget
Resets on the 1st of each month. Same blocking behavior as daily budget.
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"monthly_budget": 1000.00}'
Per-request cap
Blocks individual requests whose estimated cost exceeds the threshold. Useful to prevent accidental large-context requests from blowing through budget:
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"max_cost_per_request": 2.00}'
Per-key spend caps
Individual API keys can have their own daily and monthly spend limits, independent of the org-level budget. Set them when creating the key:
curl -X POST https://api.curate-me.ai/gateway/admin/keys \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "staging-key",
    "daily_spend_cap_usd": 10.00,
    "monthly_spend_cap_usd": 200.00
  }'
Hierarchical budgets (Org > Team > Key)
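Budget limits are enforced at three levels: organization, team, and API key. A request is blocked if it would exceed the limit at any one of them, so the tightest applicable limit wins: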
Organization budget: $100/day
  |
  +-- Engineering team: $60/day
  |     +-- Key "prod-api": $30/day
  |     +-- Key "staging": $10/day
  |
  +-- Marketing team: $20/day
        +-- Key "content-gen": $15/day
Per-session budget (managed runners)
For managed runner containers, you can set a per-session cost limit to prevent long-running agent sessions from consuming excessive budget:
curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/runner_abc/config \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"session_budget_limit": 5.00}'
Budget alerts
Webhook alerts
The gateway fires webhook events when budget thresholds are reached:
| Event | Trigger |
|---|---|
| budget.warning | Daily spend reaches 80% of daily budget |
| budget.exceeded | A request is blocked because budget is exhausted |
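The two triggers in the table can be expressed as a small decision function. This is a sketch under assumptions: the function name is hypothetical, and firing budget.warning only on the request that crosses the threshold (rather than on every request above it) is a design choice made for the example, not documented gateway behavior.

```python
def budget_events(spent_today: float, request_cost: float, daily_budget: float,
                  warning_threshold: float = 0.8) -> list[str]:
    """Return the webhook events one request would trigger."""
    new_total = spent_today + request_cost
    if new_total > daily_budget:
        # The request is blocked, so spend does not change.
        return ["budget.exceeded"]
    warning_line = warning_threshold * daily_budget
    if spent_today < warning_line <= new_total:
        # Fire once, on the request that crosses the warning line.
        return ["budget.warning"]
    return []


crossing = budget_events(spent_today=39.00, request_cost=2.00, daily_budget=50.00)
already_warned = budget_events(spent_today=41.00, request_cost=1.00, daily_budget=50.00)
exhausted = budget_events(spent_today=49.50, request_cost=1.00, daily_budget=50.00)
```

Comparing the totals before and after the request keeps a busy key from emitting a warning on every call once spend is past 80%.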
Configure webhooks:
curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget.warning", "budget.exceeded"],
    "budget_warning_threshold": 0.8
  }'
Real-time dashboard
The dashboard at Gateway > Cost Tracking shows:
- Live daily spend gauge with budget remaining
- Per-model cost breakdown pie chart
- Per-key cost attribution table
- Cost trend chart (7/30/90 day)
- Top-cost requests list
Cost events are streamed over WebSocket so the dashboard updates in real time as requests flow through the gateway.
Querying costs
Daily cost breakdown
curl "https://api.curate-me.ai/gateway/admin/usage/daily?days=30" \
  -H "X-CM-API-Key: cm_sk_your_key"
Response:
{
  "days": [
    {
      "date": "2026-03-17",
      "total_cost": 42.15,
      "total_requests": 1847,
      "by_model": [
        {"model": "gpt-4o", "cost": 28.30, "requests": 920},
        {"model": "claude-sonnet-4", "cost": 8.50, "requests": 312},
        {"model": "gpt-4o-mini", "cost": 5.35, "requests": 615}
      ]
    }
  ]
}
Single request cost
Every gateway response includes X-CM-Request-ID. Use it to look up the full usage record:
curl "https://api.curate-me.ai/gateway/admin/usage/gw_a1b2c3d4" \
  -H "X-CM-API-Key: cm_sk_your_key"
Response:
{
  "request_id": "gw_a1b2c3d4",
  "org_id": "org_abc123",
  "key_id": "key_xyz789",
  "provider": "openai",
  "model": "gpt-4o",
  "prompt_tokens": 1250,
  "completion_tokens": 380,
  "total_tokens": 1630,
  "estimated_cost": 0.0034,
  "actual_cost": 0.0031,
  "latency_ms": 842.5,
  "stream": false,
  "tags": {"project": "onboarding", "env": "production"},
  "created_at": "2026-03-17T14:32:01Z"
}
Python SDK
import asyncio

from curate_me.gateway import CurateGateway


async def main() -> None:
    gw = CurateGateway(api_key="cm_sk_your_key")
    admin = gw.admin()

    # Daily costs for the last 30 days
    costs = await admin.get_daily_costs(days=30)

    # Usage records for the last 7 days
    usage = await admin.get_usage(days=7, limit=100)

    # Single usage record
    record = await admin.get_usage_record("gw_a1b2c3d4")


# The admin calls are awaited, so they must run inside an event loop
asyncio.run(main())
TypeScript SDK
import { CurateGateway } from '@curate-me/sdk';
const gw = new CurateGateway('cm_sk_your_key');
const admin = gw.admin();
// Daily costs for the last 30 days
const costs = await admin.getDailyCosts({ days: 30 });
// Usage records for the last 7 days
const usage = await admin.getUsage({ days: 7, limit: 100 });
// Single usage record
const record = await admin.getUsageRecord('gw_a1b2c3d4');
Cost data retention
| Store | TTL | Purpose |
|---|---|---|
| Redis daily counter | 48 hours | Real-time budget enforcement |
| Redis monthly counter | 35 days | Monthly budget enforcement |
| Redis per-key counter | 48 hours (daily), 35 days (monthly) | Per-key spend caps |
| MongoDB gateway_usage | Indefinite | Audit trail, historical queries |
Daily counters in Redis have a 48-hour TTL to cover UTC midnight boundaries. MongoDB records are never automatically deleted — they serve as the audit trail.
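The following sketch shows why a 48-hour TTL is enough for a counter keyed by UTC date, and why 35 days covers any calendar month. The key layout and helper names are hypothetical, and the example assumes the TTL is applied when the counter is first written; the gateway may set or refresh it differently.

```python
from datetime import datetime, timedelta, timezone

DAILY_TTL = timedelta(hours=48)
MONTHLY_TTL = timedelta(days=35)


def daily_key(org_id: str, now: datetime) -> str:
    """Hypothetical key layout: one counter per org per UTC calendar day."""
    return f"cost:org:{org_id}:daily:{now.astimezone(timezone.utc):%Y-%m-%d}"


def end_of_utc_day(now: datetime) -> datetime:
    """Midnight UTC that closes the day containing `now`."""
    utc = now.astimezone(timezone.utc)
    return (utc + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)


# A counter first written just after midnight must survive the whole day.
first_write = datetime(2026, 3, 17, 0, 0, 5, tzinfo=timezone.utc)
expiry = first_write + DAILY_TTL  # assumes the TTL is set on the first write

# Requests on either side of midnight UTC land in different counters.
before = datetime(2026, 3, 17, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2026, 3, 18, 0, 0, 1, tzinfo=timezone.utc)
keys = (daily_key("org_abc123", before), daily_key("org_abc123", after))
```

Even in this worst case the counter outlives its own day by roughly 24 hours, so late writes and dashboard reads around midnight still find the previous day's total.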
Next steps
- Governance Chain — how budget enforcement fits in the governance pipeline
- Gateway API Reference — usage and cost endpoints
- Runbook: Budget Exceeded — diagnosing cost spikes