Cost Tracking

Canonical reference: The comprehensive gateway cost tracking documentation is at Gateway Cost Tracking. This guide covers cost attribution, budget caps, and querying from a developer workflow perspective. For the SDK CostTracker class, see SDK Cost Tracking. For the dashboard UI, see Dashboard Cost Tracking.

The Curate-Me gateway records the cost of every LLM request in real time. Costs are accumulated in Redis for instant dashboards and budget enforcement, and persisted to MongoDB for long-term audit trails.

How costs are recorded

Per-request flow

1. Before proxy: The governance chain estimates cost using tiktoken BPE token counting and the model’s pricing table. This estimate is used for budget checks.
2. After proxy: Once the provider responds, the gateway reads the actual token counts from the response (usage.prompt_tokens, usage.completion_tokens) and calculates the real cost. For streaming responses, token counts are extracted from the final SSE chunk.
3. Record: The actual cost is written to:
   - Redis: daily and monthly cost counters, per-org and per-key (atomic INCRBYFLOAT)
   - MongoDB (gateway_usage collection): full audit record with deduplication via request_id
   - Prometheus: request counter, latency histogram, cost gauge (for alerting)
4. Billing: A metered billing event is recorded for Stripe usage-based billing.
5. WebSocket: A real-time cost event is broadcast to connected dashboard clients.
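
For the streaming case in the flow above, reading token counts from the final SSE chunk might look like the following sketch. It assumes an OpenAI-style stream in which each event is a `data: <json>` line, the stream ends with `data: [DONE]`, and the last JSON chunk carries a `usage` object. The function name `extract_stream_usage` is illustrative and not part of the gateway.

```python
import json
from typing import Iterable, Optional


def extract_stream_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty `usage` object seen in an SSE stream, or None."""
    usage = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks instead of failing the request
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 3}}',
    "data: [DONE]",
]
print(extract_stream_usage(stream))
```

If a stream ends without any usage chunk, the function returns None, and a caller would need some fallback such as the pre-proxy estimate.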

Token types tracked

The gateway tracks all token categories for accurate cost calculation:

Token Type	Description
`prompt_tokens`	Input tokens (messages, system prompt, tools)
`completion_tokens`	Output tokens (model response)
`reasoning_tokens`	Reasoning/chain-of-thought tokens (o1, o3 models)
`thinking_tokens`	Extended thinking tokens (Claude Sonnet/Opus)
`cache_creation_tokens`	Tokens used to create prompt caches
`cache_read_tokens`	Tokens read from prompt caches (discounted)
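
As a rough illustration of how these categories could feed a cost calculation, the sketch below multiplies each token count by a per-million-token rate. The rates in `EXAMPLE_RATES` are made-up placeholders, not real provider pricing, and the function `request_cost` is hypothetical. It also assumes the categories are disjoint; some providers report reasoning tokens as part of `completion_tokens`, in which case they must not be added a second time.

```python
from decimal import Decimal

# Hypothetical rates in USD per one million tokens; not real provider pricing.
EXAMPLE_RATES = {
    "prompt_tokens": Decimal("2.50"),
    "completion_tokens": Decimal("10.00"),
    "reasoning_tokens": Decimal("10.00"),
    "thinking_tokens": Decimal("10.00"),
    "cache_creation_tokens": Decimal("3.125"),
    "cache_read_tokens": Decimal("1.25"),  # discounted relative to prompt_tokens
}

PER_MILLION = Decimal(1_000_000)


def request_cost(usage: dict, rates: dict = EXAMPLE_RATES) -> Decimal:
    """Sum the cost of every token category present in `usage`.

    Assumes no token is counted in more than one category.
    """
    total = Decimal(0)
    for token_type, rate in rates.items():
        total += Decimal(usage.get(token_type, 0)) * rate / PER_MILLION
    return total


usage = {"prompt_tokens": 1_000, "completion_tokens": 200, "cache_read_tokens": 4_000}
print(request_cost(usage))
```

Decimal is used here so that small per-request amounts accumulate without binary floating-point drift; that is a choice made for the sketch, not a statement about the gateway's internals.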

Cost attribution

Every usage record includes attribution metadata so you can slice costs along multiple dimensions:

By model and provider

Each record includes the model, provider, and requested_model (the alias before resolution). Query the usage API to get per-model cost breakdowns:


curl "https://api.curate-me.ai/gateway/admin/usage?days=7" \
  -H "X-CM-API-Key: cm_sk_your_key"

By API key

Each record includes the key_id of the API key that made the request. Per-key daily and monthly costs are accumulated in Redis for real-time tracking.


curl "https://api.curate-me.ai/gateway/admin/usage/daily?days=30" \
  -H "X-CM-API-Key: cm_sk_your_key"

By fleet and runner

When requests originate from managed runner containers, the record includes runner_id, session_id, fleet_id, and fleet_role. This enables cost attribution per agent in a multi-agent fleet:


curl "https://api.curate-me.ai/gateway/admin/runners/costs" \
  -H "X-CM-API-Key: cm_sk_your_key"

By custom tags (`X-CM-Tags` header)

You can attach arbitrary key-value labels to any request for project-based, environment-based, or team-based cost allocation.

Sending tags with a request:


curl https://api.curate-me.ai/v1/chat/completions \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -H "X-CM-Tags: project=onboarding,env=staging,team=growth" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Tags are stored on the usage record and can be queried for cost attribution:


{
  "request_id": "gw_a1b2c3d4",
  "model": "gpt-4o",
  "actual_cost": 0.0031,
  "tags": {
    "project": "onboarding",
    "env": "staging",
    "team": "growth"
  }
}
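
Once usage records like the one above have been fetched, a client-side rollup by tag could be sketched as follows. The function `cost_by_tag` is hypothetical, the record shape is assumed to match the example above, and every record other than `gw_a1b2c3d4` is invented sample data.

```python
from collections import defaultdict


def cost_by_tag(records: list, tag_key: str) -> dict:
    """Total `actual_cost` per value of one tag; untagged records are grouped together."""
    totals = defaultdict(float)
    for record in records:
        value = (record.get("tags") or {}).get(tag_key, "(untagged)")
        totals[value] += record.get("actual_cost", 0.0)
    return dict(totals)


# Illustrative sample data only.
records = [
    {"request_id": "gw_a1b2c3d4", "actual_cost": 0.0031,
     "tags": {"project": "onboarding", "env": "staging", "team": "growth"}},
    {"request_id": "gw_example02", "actual_cost": 0.0100,
     "tags": {"project": "search", "env": "production"}},
    {"request_id": "gw_example03", "actual_cost": 0.0019,
     "tags": {"project": "onboarding"}},
    {"request_id": "gw_example04", "actual_cost": 0.0050},  # sent without X-CM-Tags
]
print(cost_by_tag(records, "project"))
```

Keeping an explicit "(untagged)" bucket makes it easy to spot traffic that bypasses the tagging convention.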

Common tagging strategies:

Tag	Purpose	Example values
`project`	Cost allocation by project	`onboarding`, `search`, `chatbot`
`env`	Separate staging from production costs	`production`, `staging`, `development`
`team`	Department-level cost tracking	`engineering`, `marketing`, `support`
`feature`	Feature-level cost attribution	`autocomplete`, `summarization`
`customer`	Per-customer cost tracking (for SaaS)	`cust_abc123`

Budget caps

Daily budget

The daily budget resets at midnight UTC. When cumulative daily spend plus the estimated cost of a new request would exceed the limit, the gateway blocks the request with an HTTP 403 response.


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 50.00}'
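
The daily check described above can be pictured with a minimal sketch. It is not the gateway's implementation: `check_daily_budget` and `BudgetDecision` are hypothetical names, and treating a request that lands exactly on the limit as allowed is an assumption drawn from the phrase "would exceed".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetDecision:
    allowed: bool
    block_status: Optional[int]  # HTTP status returned when blocked, else None
    remaining_usd: float         # budget left after this decision


def check_daily_budget(spent_today: float, estimated_cost: float,
                       daily_budget: float) -> BudgetDecision:
    """Block when spend so far plus the new request's estimate would exceed the limit."""
    projected = spent_today + estimated_cost
    if projected > daily_budget:
        # Blocked requests consume nothing, so remaining is based on current spend.
        return BudgetDecision(False, 403, max(daily_budget - spent_today, 0.0))
    return BudgetDecision(True, None, daily_budget - projected)


print(check_daily_budget(spent_today=49.50, estimated_cost=0.75, daily_budget=50.00))
print(check_daily_budget(spent_today=49.50, estimated_cost=0.25, daily_budget=50.00))
```

Because the check uses the pre-request estimate rather than the actual cost, the recorded daily total can end slightly above or below the limit once real token counts arrive.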

Monthly budget

The monthly budget resets on the 1st of each month. Blocking works the same way as for the daily budget.


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"monthly_budget": 1000.00}'
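
One common way to get daily and monthly resets is to key counters by a period label, so that a new day or month simply starts a fresh counter. The sketch below derives such labels in UTC. The label formats and the function `budget_periods` are assumptions for illustration, and evaluating the monthly boundary in UTC is also an assumption, since only the daily reset is stated to use UTC.

```python
from datetime import datetime, timedelta, timezone


def budget_periods(now: datetime) -> tuple:
    """Return (daily_label, monthly_label) for a timezone-aware timestamp, in UTC."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = now.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%d"), utc.strftime("%Y-%m")


# 21:30 on 31 March at UTC-5 is already 1 April in UTC.
local = datetime(2026, 3, 31, 21, 30, tzinfo=timezone(timedelta(hours=-5)))
print(budget_periods(local))
```

The practical consequence for teams outside UTC is that the budget day does not line up with the local calendar day.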

Per-request cap

Blocks any individual request whose estimated cost exceeds the threshold. This is useful for preventing an accidental large-context request from consuming a large share of the budget:


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"max_cost_per_request": 2.00}'

Per-key spend caps

Individual API keys can have their own daily and monthly spend limits, independent of the org-level budget. Set them when creating the key:


curl -X POST https://api.curate-me.ai/gateway/admin/keys \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "staging-key",
    "daily_spend_cap_usd": 10.00,
    "monthly_spend_cap_usd": 200.00
  }'

Hierarchical budgets (Org > Team > Key)

Budget limits are enforced at three levels: organization, team, and key. A request must fit within the limit at every level, so whichever limit is tightest is the one that blocks it:


Organization budget: $100/day
  |
  +-- Engineering team: $60/day
  |     +-- Key "prod-api": $30/day
  |     +-- Key "staging": $10/day
  |
  +-- Marketing team: $20/day
        +-- Key "content-gen": $15/day
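
To make the hierarchy concrete, the sketch below walks the levels from organization down to key and reports the first one a request would exceed. The limits come from the tree above; the spend figures, the level names, and the function `first_exceeded_level` are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Level = Tuple[str, float, float]  # (name, spent_today_usd, daily_limit_usd)


def first_exceeded_level(levels: List[Level], estimated_cost: float) -> Optional[str]:
    """Return the first level whose limit the request would exceed, else None."""
    for name, spent, limit in levels:
        if spent + estimated_cost > limit:
            return name
    return None


# A request made with the "staging" key in the Engineering team.
levels = [
    ("org", 42.00, 100.00),
    ("team:engineering", 35.00, 60.00),
    ("key:staging", 9.80, 10.00),
]
print(first_exceeded_level(levels, 0.50))
print(first_exceeded_level(levels, 0.10))
```

In this example the org and team still have headroom, but the key's $10/day cap is the binding limit for the larger request.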

Per-session budget (managed runners)

For managed runner containers, you can set a per-session cost limit to prevent long-running agent sessions from consuming excessive budget:


curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/runner_abc/config \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"session_budget_limit": 5.00}'

Budget alerts

Webhook alerts

The gateway fires webhook events when budget thresholds are reached:

Event	Trigger
`budget.warning`	Daily spend reaches 80% of daily budget
`budget.exceeded`	A request is blocked because budget is exhausted

Configure webhooks:


curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget.warning", "budget.exceeded"],
    "budget_warning_threshold": 0.8
  }'
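
The relationship between `budget_warning_threshold` and the `budget.warning` event can be sketched as a crossing test. This is an assumption-laden illustration: the function `warning_event` is hypothetical, and firing only on the request that crosses the threshold (rather than on every request above it) is a guess about the intended behavior.

```python
from typing import Optional


def warning_event(spend_before: float, spend_after: float, daily_budget: float,
                  threshold: float = 0.8) -> Optional[str]:
    """Return "budget.warning" when spend crosses threshold * budget, else None."""
    trigger = daily_budget * threshold
    if spend_before < trigger <= spend_after:
        return "budget.warning"
    return None


print(warning_event(39.00, 40.50, daily_budget=50.00))  # crosses the $40 trigger
print(warning_event(41.00, 42.00, daily_budget=50.00))  # already past the trigger
```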

Real-time dashboard

The dashboard at Gateway > Cost Tracking shows:

- Live daily spend gauge with budget remaining
- Per-model cost breakdown pie chart
- Per-key cost attribution table
- Cost trend chart (7/30/90 day)
- Top-cost requests list

Cost events are streamed over WebSocket so the dashboard updates in real time as requests flow through the gateway.

Querying costs

Daily cost breakdown


curl "https://api.curate-me.ai/gateway/admin/usage/daily?days=30" \
  -H "X-CM-API-Key: cm_sk_your_key"

Response:


{
  "days": [
    {
      "date": "2026-03-17",
      "total_cost": 42.15,
      "total_requests": 1847,
      "by_model": [
        {"model": "gpt-4o", "cost": 28.30, "requests": 920},
        {"model": "claude-sonnet-4", "cost": 8.50, "requests": 312},
        {"model": "gpt-4o-mini", "cost": 5.35, "requests": 615}
      ]
    }
  ]
}
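
A response of this shape can be rolled up across days on the client. The sketch below assumes the field names shown in the example response; `cost_by_model` is a hypothetical helper, not part of an SDK.

```python
import json
from collections import defaultdict


def cost_by_model(response: dict) -> dict:
    """Sum cost and request counts per model across all days in the response."""
    totals = defaultdict(lambda: {"cost": 0.0, "requests": 0})
    for day in response.get("days", []):
        for entry in day.get("by_model", []):
            totals[entry["model"]]["cost"] += entry["cost"]
            totals[entry["model"]]["requests"] += entry["requests"]
    return dict(totals)


response = json.loads("""
{
  "days": [
    {
      "date": "2026-03-17",
      "total_cost": 42.15,
      "total_requests": 1847,
      "by_model": [
        {"model": "gpt-4o", "cost": 28.30, "requests": 920},
        {"model": "claude-sonnet-4", "cost": 8.50, "requests": 312},
        {"model": "gpt-4o-mini", "cost": 5.35, "requests": 615}
      ]
    }
  ]
}
""")

for model, stats in sorted(cost_by_model(response).items(), key=lambda kv: -kv[1]["cost"]):
    print(f"{model}: ${stats['cost']:.2f} over {stats['requests']} requests")
```

Sorting by cost puts the most expensive model first, which is usually the starting point for deciding where a cheaper model or a tighter per-key cap would help.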

Single request cost

Every gateway response includes an X-CM-Request-ID header. Use its value to look up the full usage record:


curl "https://api.curate-me.ai/gateway/admin/usage/gw_a1b2c3d4" \
  -H "X-CM-API-Key: cm_sk_your_key"