Runbook: Budget Exceeded / Cost Spike

This runbook covers diagnosing and resolving budget-related request denials and unexpected cost spikes on traffic routed through the Curate-Me AI Gateway.

Symptoms

HTTP 403 responses with error code GW_COST_002, daily_budget, monthly_budget, or cost_per_request
Webhook alerts firing for budget_exceeded events
Dashboard cost charts showing an unexpected spike
Agents or applications suddenly unable to make LLM requests

Typical error response:


{
  "error": {
    "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
    "type": "permission_error",
    "code": "daily_budget"
  }
}
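Client code can separate budget denials from other 403 responses by checking error.code. The following is a minimal sketch, assuming the body has exactly the shape shown above. The function name classify_denial and the regular expression over the message text are illustrative assumptions, not part of the gateway API, and the message wording may differ between versions.

```python
import json
import re

# Error codes listed under Symptoms that indicate a budget-related denial.
BUDGET_CODES = {"GW_COST_002", "daily_budget", "monthly_budget", "cost_per_request"}

# Assumed message format: "$<spent> spent + $<estimated> estimated > $<limit> limit".
_AMOUNTS = re.compile(r"\$([\d.]+) spent \+ \$([\d.]+) estimated > \$([\d.]+) limit")


def classify_denial(status, body):
    """Return (code, overshoot_in_dollars) for a budget denial, else None."""
    if status != 403:
        return None
    try:
        error = json.loads(body)["error"]
        code = error["code"]
        message = str(error.get("message", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        return None  # not JSON, or not the expected error envelope
    if not isinstance(code, str) or code not in BUDGET_CODES:
        return None
    match = _AMOUNTS.search(message)
    if match is None:
        return code, None  # budget denial, but amounts could not be parsed
    spent, estimated, limit = (float(g) for g in match.groups())
    return code, round(spent + estimated - limit, 2)
```

For the sample response, the amounts work out to 24.50 + 0.85 - 25.00, so the request overshoots the daily limit by $0.35.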

Step 1: Check current spend

Pull the daily cost breakdown for the affected organization:


curl https://api.curate-me.ai/api/v1/admin/gateway/costs/daily \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Expected response:


{
  "org_id": "org_abc123",
  "date": "2026-03-17",
  "daily_spend": 24.50,
  "daily_budget": 25.00,
  "monthly_spend": 187.30,
  "monthly_budget": 250.00,
  "top_models": [
    {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
    {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
    {"model": "gpt-4o", "cost": 1.50, "requests": 230}
  ]
}

Also check the dashboard for a visual breakdown: Dashboard > Gateway > Cost Tracking > Cost Breakdown.
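The cost response can be reduced to a few ratios that make the situation obvious at a glance. This sketch assumes the field names from the example response and assumes that the costs in top_models are daily figures (they sum to daily_spend in the example). The function name summarize_spend is hypothetical.

```python
def summarize_spend(report):
    """Compute budget utilization and the dominant model from a cost report."""
    summary = {
        "daily_used": round(report["daily_spend"] / report["daily_budget"], 3),
        "monthly_used": round(report["monthly_spend"] / report["monthly_budget"], 3),
    }
    models = report.get("top_models", [])
    if models and report["daily_spend"] > 0:
        top = max(models, key=lambda m: m["cost"])
        summary["top_model"] = top["model"]
        # Share of today's spend attributable to the most expensive model.
        summary["top_model_share"] = round(top["cost"] / report["daily_spend"], 3)
        if top["requests"] > 0:
            summary["top_model_cost_per_request"] = round(top["cost"] / top["requests"], 3)
    return summary
```

Applied to the example, the organization has used 98% of its daily budget, and gpt-5.1 accounts for about 74% of that spend at roughly $0.40 per request. A single model dominating spend points toward Cause A or Cause B below.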

Step 2: Identify the cause

Cause A: Runaway agent loop

A single agent making repeated LLM calls in a tight loop is the most common cause of budget spikes.

Diagnosis: One model or one API key accounts for a disproportionate share of daily spend.


# Check per-key cost attribution
curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?limit=50&sort=cost_desc" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Look for patterns:

Many requests in a short time window from the same key_id
Same prompt content repeated across requests (retry storm)
Requests with large completion_tokens counts (model generating verbose output)
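The first two patterns can be checked mechanically once the usage records are loaded. This is a rough sketch, not the gateway's own detection logic. It assumes each record carries key_id, an ISO 8601 timestamp, and a prompt_hash field; only key_id appears in this runbook, the other two field names and all thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def find_suspect_keys(records, window=timedelta(minutes=1),
                      max_in_window=20, max_repeats=5):
    """Flag key_ids that show request bursts or heavily repeated prompts."""
    times = defaultdict(list)
    prompts = defaultdict(Counter)
    for r in records:
        times[r["key_id"]].append(datetime.fromisoformat(r["timestamp"]))
        prompts[r["key_id"]][r["prompt_hash"]] += 1

    suspects = {}
    for key_id, stamps in times.items():
        stamps.sort()
        reasons = []
        # Sliding window: largest number of requests inside any `window` span.
        start, burst = 0, 0
        for end, ts in enumerate(stamps):
            while ts - stamps[start] > window:
                start += 1
            burst = max(burst, end - start + 1)
        if burst > max_in_window:
            reasons.append(f"burst of {burst} requests within {window}")
        top_hash, repeats = prompts[key_id].most_common(1)[0]
        if repeats > max_repeats:
            reasons.append(f"prompt {top_hash} repeated {repeats} times")
        if reasons:
            suspects[key_id] = reasons
    return suspects
```

A key that is flagged for both reasons is a strong candidate for a retry storm or an agent loop.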

Fix (immediate): Disable the offending API key to stop further spend:


curl -X POST https://api.curate-me.ai/api/v1/admin/keys/$KEY_ID/disable \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Cause B: Model upgrade without budget increase

Switching from a cheaper model (e.g., gpt-4o-mini at $0.15/1M input) to an expensive model (e.g., gpt-5.1 at $2.50/1M input) without adjusting the budget cap can exhaust a budget that was sized for the cheaper model.

Diagnosis: The top_models field in the cost response shows a new expensive model that was not previously in use.
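Comparing top_models between two daily cost responses makes a newly introduced model easy to spot. This is a minimal sketch, assuming both days are available in the shape shown in Step 1. The function name and the min_cost cutoff are hypothetical, and the previous-day data in the usage example is invented for illustration.

```python
def new_expensive_models(previous, current, min_cost=1.00):
    """Return models present today but not yesterday, costliest first.

    `previous` and `current` are top_models lists from two daily cost
    responses. Models below `min_cost` dollars are ignored as noise.
    """
    seen = {m["model"] for m in previous}
    added = [m for m in current if m["model"] not in seen and m["cost"] >= min_cost]
    added.sort(key=lambda m: m["cost"], reverse=True)
    return [m["model"] for m in added]


# Hypothetical previous day: only the cheap model was in use.
yesterday = [{"model": "gpt-4o", "cost": 1.40, "requests": 215}]
today = [
    {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
    {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
    {"model": "gpt-4o", "cost": 1.50, "requests": 230},
]
print(new_expensive_models(yesterday, today))
```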

Fix: Increase the daily budget to accommodate the new model, or add a per-request cost cap:


curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_budget": 100.00,
    "max_cost_per_request": 2.00
  }'
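To choose a new daily_budget value, it helps to scale current spend by the price ratio between the two models. The sketch below is a rough estimate only: it assumes the same token volume after the switch and that input-token pricing dominates cost, since output pricing is not given here. The function name and the $3.00 starting spend are hypothetical.

```python
def projected_daily_spend(current_spend, old_price_per_m, new_price_per_m):
    """Scale current daily spend by the input-price ratio of the two models."""
    if old_price_per_m <= 0:
        raise ValueError("old price must be positive")
    return round(current_spend * new_price_per_m / old_price_per_m, 2)


# Price ratio for the example models: $2.50 / $0.15 per 1M input tokens.
ratio = round(2.50 / 0.15, 1)

# A hypothetical $3.00/day workload on the cheaper model.
estimate = projected_daily_spend(3.00, 0.15, 2.50)
print(ratio, estimate)
```

At roughly 16.7 times the input price, a workload that cost $3.00 per day would cost about $50.00 per day, which already exceeds the Starter tier default of $25.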

Cause C: Fleet cost misconfiguration

For organizations running managed runner fleets, each runner session generates LLM costs. A fleet with many runners can burn through budget quickly if session-level cost caps are not set.

Diagnosis: Check runner session costs:


curl https://api.curate-me.ai/gateway/admin/runners/costs \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Look for runners with high per-session spend or many active sessions.

Fix: Set per-session cost limits for the fleet:


curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/$RUNNER_ID/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"session_budget_limit": 5.00}'
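A per-session limit only protects the daily budget if it is sized against the number of sessions the fleet can run. The arithmetic can be sketched as follows, assuming session_budget_limit is a hard cap in dollars per session. The function names and the fleet size in the example are hypothetical.

```python
def worst_case_daily_fleet_spend(runners, sessions_per_runner_per_day, session_budget_limit):
    """Upper bound on daily fleet spend if every session hits its cap."""
    return runners * sessions_per_runner_per_day * session_budget_limit


def max_session_limit(daily_budget, runners, sessions_per_runner_per_day):
    """Largest per-session cap that keeps the worst case within the daily budget."""
    sessions = runners * sessions_per_runner_per_day
    if sessions <= 0:
        raise ValueError("need at least one session per day")
    return daily_budget / sessions


# Hypothetical fleet: 20 runners, 4 sessions each per day, $5.00 cap per session.
print(worst_case_daily_fleet_spend(20, 4, 5.00))  # worst case in dollars
print(max_session_limit(100.00, 20, 4))           # cap that fits a $100 daily budget
```

In this example the worst case is $400 per day against a $100 daily budget, so the per-session cap would need to drop to $1.25 for the session limit alone to guarantee the budget holds.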

Step 3: Unblock the organization (if needed)

If the organization was blocked by a genuine budget limit and needs to resume operations before the daily budget resets at midnight UTC, use one of these options:

Option A: Increase the daily budget


curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 50.00}'

Option B: Reset the daily cost counter

Use this only in emergencies. It resets the Redis daily cost counter for the org:


curl -X POST https://api.curate-me.ai/gateway/admin/costs/reset-daily \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"org_id\": \"$ORG_ID\"}"

The MongoDB audit trail is not affected; only the real-time Redis counter is reset.

Step 4: Set up prevention

Add per-request cost caps

Per-request caps prevent any single expensive request from consuming a large portion of the budget:


curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_cost_per_request": 1.00}'

Configure webhook alerts

Set up webhook notifications to fire when spend reaches a threshold (e.g., 80% of daily budget):


curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget_exceeded", "budget_warning"],
    "budget_warning_threshold": 0.8
  }'
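The relationship between the two events and the threshold can be sketched as a small function. This is an assumption about the semantics, not the gateway's implementation: it treats budget_warning as firing once spend divided by budget reaches budget_warning_threshold, and budget_exceeded once spend reaches the budget itself. The function name is hypothetical.

```python
def budget_event(spend, budget, warning_threshold=0.8):
    """Return the event name that the current spend level would trigger, if any."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    if spend >= budget:
        return "budget_exceeded"
    if spend / budget >= warning_threshold:
        return "budget_warning"
    return None


print(budget_event(24.50, 25.00))  # 98% of budget, past the 80% threshold
print(budget_event(10.00, 25.00))  # 40% of budget, no event
```

With a $25.00 daily budget and a 0.8 threshold, the warning would start at $20.00 of spend, leaving $5.00 of headroom to react before requests are denied.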

Set HITL thresholds for expensive requests

The Human-in-the-Loop gate can catch high-cost requests before they execute:


curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hitl_threshold": 5.00}'
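How hitl_threshold interacts with max_cost_per_request depends on the order in which the gateway evaluates them, which this runbook does not specify. The sketch below assumes the per-request cap is checked first and the HITL gate second; treat that ordering, and the function name, as assumptions to verify.

```python
def gate_request(estimated_cost, max_cost_per_request, hitl_threshold):
    """Decide what happens to a request under the assumed check ordering."""
    if estimated_cost > max_cost_per_request:
        return "deny"   # hard per-request cap wins
    if estimated_cost >= hitl_threshold:
        return "hold"   # wait for human approval
    return "allow"


print(gate_request(6.00, 10.00, 5.00))  # under the cap, over the HITL threshold
print(gate_request(3.00, 2.00, 5.00))   # over the cap, never reaches the HITL gate
```

Under this assumed ordering, a hitl_threshold set above max_cost_per_request would never trigger, because any request expensive enough to need approval is denied first. It is worth confirming the actual ordering before combining the two settings.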

Budget limits by tier (defaults)

Tier	Per-Request Max	Daily Budget	Monthly Budget
Free	$0.25	$5	$50
Starter	$0.50	$25	$250
Growth	$2.00	$100	$2,000
Enterprise	$10.00	$2,000	$50,000

Daily budgets reset at midnight UTC. Monthly budgets reset on the 1st of each month.
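When deciding between waiting for a reset and raising a budget, it helps to know how far away the next reset is. A small sketch, assuming the monthly reset also happens at 00:00 UTC on the 1st (only the daily reset time is stated above). The function names are hypothetical, and both expect a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now):
    """Next midnight UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_monthly_reset(now):
    """Midnight UTC on the 1st of the month following `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


now = datetime.now(timezone.utc)
print("daily reset in", next_daily_reset(now) - now)
print("monthly reset in", next_monthly_reset(now) - now)
```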

Escalation

If the cost spike cannot be explained by any of the above causes:

Collect the X-CM-Request-ID headers from recent requests
Export the full usage log for the time window: Dashboard > Gateway > Usage Log > Export CSV
Check for anomalous patterns (requests from unexpected IP addresses, unknown API keys)
Contact the platform team with the org ID, time window, and usage export
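The anomaly check on the exported usage log can be partly automated before escalating. This sketch assumes the CSV has request_id, key_id, and source_ip columns; the real export format is not documented here, so the column names, the function name, and the sample rows are all hypothetical.

```python
import csv
import io
import ipaddress


def find_anomalies(csv_text, known_keys, allowed_networks):
    """Return (request_id, reasons) for rows with unknown keys or unexpected IPs."""
    networks = [ipaddress.ip_network(n) for n in allowed_networks]
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        reasons = []
        if row["key_id"] not in known_keys:
            reasons.append("unknown API key")
        ip = ipaddress.ip_address(row["source_ip"])
        if not any(ip in net for net in networks):
            reasons.append("unexpected IP address")
        if reasons:
            flagged.append((row["request_id"], reasons))
    return flagged


# Invented sample export for illustration only.
sample_csv = (
    "request_id,key_id,source_ip,cost\n"
    "req_1,key_a,10.0.0.5,0.40\n"
    "req_2,key_x,10.0.0.6,0.85\n"
    "req_3,key_a,203.0.113.9,0.10\n"
)
print(find_anomalies(sample_csv, {"key_a"}, ["10.0.0.0/8"]))
```

The flagged request IDs can be included alongside the org ID, time window, and usage export when contacting the platform team.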