Runbook: Budget Exceeded / Cost Spike

This runbook covers diagnosing and resolving budget-related denials and unexpected cost spikes for traffic routed through the Curate-Me AI Gateway.


Symptoms

  • 403 responses with error code GW_COST_002, daily_budget, monthly_budget, or cost_per_request
  • Webhook alerts firing for budget_exceeded events
  • Dashboard cost charts showing an unexpected spike
  • Agents or applications suddenly unable to make LLM requests

Typical error response:

    {
      "error": {
        "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
        "type": "permission_error",
        "code": "daily_budget"
      }
    }
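When triaging many denials at once, it can help to pull the dollar figures out of the error message programmatically. The following is a minimal sketch that assumes every budget denial follows the message shape of the single example above; the function name `parse_budget_denial` and the `overage` field are illustrative and not part of the gateway API.

```python
import re
from typing import Optional

# Assumed message shape, inferred from the single example in this runbook:
#   "... $<spent> spent + $<estimated> estimated > $<limit> limit"
_AMOUNTS = re.compile(
    r"\$(?P<spent>\d[\d,]*(?:\.\d+)?) spent \+ "
    r"\$(?P<estimated>\d[\d,]*(?:\.\d+)?) estimated > "
    r"\$(?P<limit>\d[\d,]*(?:\.\d+)?) limit"
)


def parse_budget_denial(body: dict) -> Optional[dict]:
    """Extract the dollar figures from a budget denial, or None if the shape differs."""
    error = body.get("error", {})
    match = _AMOUNTS.search(error.get("message", ""))
    if match is None:
        return None
    amounts = {k: float(v.replace(",", "")) for k, v in match.groupdict().items()}
    # Amount by which this request would have pushed spend past the limit.
    amounts["overage"] = round(
        amounts["spent"] + amounts["estimated"] - amounts["limit"], 2
    )
    amounts["code"] = error.get("code")
    return amounts


denial = {
    "error": {
        "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
        "type": "permission_error",
        "code": "daily_budget",
    }
}
print(parse_budget_denial(denial))
```

For the example response, the request would have exceeded the limit by $0.35 ($24.50 + $0.85 against $25.00).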

Step 1: Check current spend

Pull the daily cost breakdown for the affected organization:

    curl https://api.curate-me.ai/api/v1/admin/gateway/costs/daily \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "X-Org-ID: $ORG_ID"

Expected response:

    {
      "org_id": "org_abc123",
      "date": "2026-03-17",
      "daily_spend": 24.50,
      "daily_budget": 25.00,
      "monthly_spend": 187.30,
      "monthly_budget": 250.00,
      "top_models": [
        {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
        {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
        {"model": "gpt-4o", "cost": 1.50, "requests": 230}
      ]
    }
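To read this response quickly, it helps to compute the remaining headroom and each model's share of the day's spend. The sketch below uses only the fields shown in the sample response; the helper name `summarize_costs` and its output keys are illustrative.

```python
def summarize_costs(report: dict) -> dict:
    """Compute remaining headroom and each model's share of today's spend."""
    daily_spend = report["daily_spend"]
    models = []
    for entry in report.get("top_models", []):
        models.append({
            "model": entry["model"],
            # Fraction of today's spend attributed to this model.
            "share_of_daily_spend": entry["cost"] / daily_spend if daily_spend else 0.0,
            # Average cost of one request to this model.
            "cost_per_request": entry["cost"] / entry["requests"] if entry["requests"] else 0.0,
        })
    models.sort(key=lambda m: m["share_of_daily_spend"], reverse=True)
    return {
        "daily_remaining": report["daily_budget"] - daily_spend,
        "monthly_remaining": report["monthly_budget"] - report["monthly_spend"],
        "models": models,
    }


report = {
    "org_id": "org_abc123",
    "date": "2026-03-17",
    "daily_spend": 24.50,
    "daily_budget": 25.00,
    "monthly_spend": 187.30,
    "monthly_budget": 250.00,
    "top_models": [
        {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
        {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
        {"model": "gpt-4o", "cost": 1.50, "requests": 230},
    ],
}

summary = summarize_costs(report)
print(f"daily remaining: ${summary['daily_remaining']:.2f}")
for m in summary["models"]:
    print(f"{m['model']}: {m['share_of_daily_spend']:.0%} of spend, "
          f"${m['cost_per_request']:.4f} per request")
```

In the sample response, gpt-5.1 accounts for roughly 74% of the day's spend from only 45 requests, while gpt-4o handles 230 requests for about 6% of spend. A skew like this points toward Cause A or Cause B in Step 2.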

Also check the dashboard for a visual breakdown: Dashboard > Gateway > Cost Tracking > Cost Breakdown.


Step 2: Identify the cause

Cause A: Runaway agent loop

A single agent making repeated LLM calls in a tight loop is the most common cause of budget spikes.

Diagnosis: One model or one API key accounts for a disproportionate share of daily spend.

    # Check per-key cost attribution
    curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?limit=50&sort=cost_desc" \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "X-Org-ID: $ORG_ID"

Look for patterns:

  • Many requests in a short time window from the same key_id
  • Same prompt content repeated across requests (retry storm)
  • Requests with large completion_tokens counts (model generating verbose output)
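The first pattern above, many requests from one key in a short window, can be checked with a sliding-window count. This is a sketch under assumptions: the usage response schema is not shown in this runbook, so the record fields `key_id` and `timestamp` (ISO 8601), and the thresholds of 30 requests per 60 seconds, are hypothetical and should be adapted to the real payload.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


def find_bursty_keys(records, window_seconds=60, max_requests=30):
    """Return {key_id: peak_count} for keys exceeding max_requests in any window.

    Each record is assumed (hypothetically) to carry "key_id" and an
    ISO 8601 "timestamp" field.
    """
    by_key = defaultdict(list)
    for record in records:
        by_key[record["key_id"]].append(datetime.fromisoformat(record["timestamp"]))

    window = timedelta(seconds=window_seconds)
    flagged = {}
    for key, times in by_key.items():
        times.sort()
        recent = deque()  # timestamps within the current window
        peak = 0
        for t in times:
            recent.append(t)
            while t - recent[0] > window:
                recent.popleft()
            peak = max(peak, len(recent))
        if peak > max_requests:
            flagged[key] = peak
    return flagged


# Synthetic data: key_a sends one request per second, key_b one every 10 minutes.
start = datetime(2026, 3, 17, 10, 0, tzinfo=timezone.utc)
records = [
    {"key_id": "key_a", "timestamp": (start + timedelta(seconds=i)).isoformat()}
    for i in range(40)
]
records += [
    {"key_id": "key_b", "timestamp": (start + timedelta(minutes=10 * i)).isoformat()}
    for i in range(5)
]
print(find_bursty_keys(records))
```

Only `key_a` is flagged here, since all 40 of its requests fall within a single 60-second window.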

Fix (immediate): Revoke or pause the offending API key:

    curl -X POST https://api.curate-me.ai/api/v1/admin/keys/$KEY_ID/disable \
      -H "Authorization: Bearer $ADMIN_TOKEN"

Cause B: Model upgrade without budget increase

Switching from a cheaper model (e.g., gpt-4o-mini at $0.15/1M input tokens) to a more expensive one (e.g., gpt-5.1 at $2.50/1M input tokens, roughly 17x the input price) without raising the budget cap can exhaust the same budget far sooner.
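The effect of the price difference can be worked out directly. The sketch below uses only the two input prices quoted above and a $25.00 daily budget; output-token pricing is not given in this runbook, so it is left out, and real spend will be higher than this input-only estimate suggests.

```python
def affordable_input_tokens(daily_budget: float, price_per_million: float) -> int:
    """Input tokens a daily budget covers at a given per-1M-token input price."""
    return int(daily_budget / price_per_million * 1_000_000)


OLD_PRICE = 0.15  # gpt-4o-mini, USD per 1M input tokens (from the text above)
NEW_PRICE = 2.50  # gpt-5.1, USD per 1M input tokens (from the text above)
DAILY_BUDGET = 25.00

multiplier = NEW_PRICE / OLD_PRICE
print(f"input price multiplier: {multiplier:.1f}x")
print(f"tokens/day at old price: {affordable_input_tokens(DAILY_BUDGET, OLD_PRICE):,}")
print(f"tokens/day at new price: {affordable_input_tokens(DAILY_BUDGET, NEW_PRICE):,}")
```

At these prices, the same $25.00 covers about 166.7M input tokens on the cheaper model but only 10M on the more expensive one.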

Diagnosis: The top_models field in the cost response shows a new expensive model that was not previously in use.

Fix: Increase the daily budget to accommodate the new model, add a per-request cost cap, or both. The following request sets both:

    curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "daily_budget": 100.00,
        "max_cost_per_request": 2.00
      }'

Cause C: Fleet cost misconfiguration

For organizations running managed runner fleets, each runner session generates LLM costs. A fleet with many runners can burn through budget quickly if session-level cost caps are not set.

Diagnosis: Check runner session costs:

    curl https://api.curate-me.ai/gateway/admin/runners/costs \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "X-Org-ID: $ORG_ID"

Look for runners with high per-session spend or many active sessions.

Fix: Set per-session cost limits for the fleet:

    curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/$RUNNER_ID/config \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"session_budget_limit": 5.00}'
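When choosing a session limit, a rough worst-case estimate shows how it interacts with the daily budget. This sketch assumes the session limit is enforced as a hard cap on each session, which this runbook does not confirm; the fleet size and sessions-per-runner figures are hypothetical.

```python
import math


def sessions_until_exhausted(daily_budget: float, session_budget_limit: float) -> int:
    """Number of sessions that can each hit their cap before the daily budget is gone."""
    return math.floor(daily_budget / session_budget_limit)


def worst_case_fleet_spend(runners: int, sessions_per_runner: int,
                           session_budget_limit: float) -> float:
    """Upper bound on daily spend if every session spends up to its cap."""
    return runners * sessions_per_runner * session_budget_limit


# A $100.00 daily budget with a $5.00 per-session cap.
print(sessions_until_exhausted(100.00, 5.00))
# Hypothetical fleet: 10 runners, 4 sessions each per day.
print(worst_case_fleet_spend(10, 4, 5.00))
```

With a $5.00 session cap, 20 maxed-out sessions consume a $100.00 daily budget. The hypothetical fleet of 10 runners with 4 sessions each could reach $200.00, so either the cap or the fleet size would need to come down to stay within budget.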

Step 3: Unblock the organization (if needed)

If the organization is legitimately blocked and needs to resume operations before the budget resets at midnight UTC:

Option A: Increase the daily budget

    curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"daily_budget": 50.00}'

Option B: Reset the daily cost counter

Use this only in emergencies. It resets the Redis daily cost counter for the org (replace org_abc123 with the affected organization's ID):

    curl -X POST https://api.curate-me.ai/gateway/admin/costs/reset-daily \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"org_id": "org_abc123"}'

The MongoDB audit trail is not affected — only the real-time Redis counter is reset.


Step 4: Set up prevention

Add per-request cost caps

Per-request caps prevent any single expensive request from consuming a large portion of the budget:

    curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"max_cost_per_request": 1.00}'
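To reason about how a per-request cap combines with the daily and monthly budgets, the limits can be modeled locally. This is an illustrative sketch, not the gateway's implementation: the comparison `spent + estimated > limit` follows the example error message in Symptoms, the returned strings are the error codes listed there, and the order in which the checks run is an assumption.

```python
from typing import Optional


def first_violation(estimated: float, spent_today: float, spent_month: float,
                    limits: dict) -> Optional[str]:
    """Return the error code of the first limit a request would break, or None."""
    # Assumed check order: per-request cap, then daily, then monthly.
    if estimated > limits["max_cost_per_request"]:
        return "cost_per_request"
    if spent_today + estimated > limits["daily_budget"]:
        return "daily_budget"
    if spent_month + estimated > limits["monthly_budget"]:
        return "monthly_budget"
    return None


limits = {"max_cost_per_request": 1.00, "daily_budget": 25.00, "monthly_budget": 250.00}

# The request from the example error: $0.85 estimated with $24.50 already spent today.
print(first_violation(0.85, 24.50, 187.30, limits))
# A single $1.50 request is stopped by the per-request cap even on a fresh day.
print(first_violation(1.50, 0.00, 0.00, limits))
```

The second call shows the point of the cap: an expensive request is rejected on its own cost, regardless of how much budget remains.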

Configure webhook alerts

Set up webhook notifications to fire when spend reaches a threshold (e.g., 80% of daily budget):

    curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://your-app.com/webhooks/curate-me",
        "events": ["budget_exceeded", "budget_warning"],
        "budget_warning_threshold": 0.8
      }'
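To sanity-check a threshold before configuring it, the alert rule can be modeled as a simple function of spend. This is a sketch of the rule as described above, not the gateway's actual logic; in particular, treating `budget_exceeded` as "spend has reached the daily budget" is an assumption.

```python
from typing import Optional


def alert_event(spend: float, daily_budget: float,
                warning_threshold: float = 0.8) -> Optional[str]:
    """Return the event name expected at a given spend level, or None."""
    if spend >= daily_budget:
        return "budget_exceeded"  # assumed trigger condition
    if spend >= warning_threshold * daily_budget:
        return "budget_warning"
    return None


# With a $25.00 daily budget and a 0.8 threshold, the warning line is $20.00.
for spend in (19.50, 20.50, 25.00):
    print(spend, alert_event(spend, 25.00))
```

On a $25.00 budget, a 0.8 threshold leaves $5.00 of headroom between the warning and the hard stop. If a single request can cost close to that, a lower threshold gives more time to react.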

Set HITL thresholds for expensive requests

The Human-in-the-Loop gate can catch high-cost requests before they execute:

    curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"hitl_threshold": 5.00}'

Budget limits by tier (defaults)

Tier        Per-Request Max  Daily Budget  Monthly Budget
Free        $0.25            $5            $50
Starter     $0.50            $25           $250
Growth      $2.00            $100          $2,000
Enterprise  $10.00           $2,000        $50,000

Daily budgets reset at midnight UTC. Monthly budgets reset on the 1st of each month.
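When deciding whether to wait for a reset or raise a limit, it helps to know how far away the next reset is. The sketch below computes both reset times from the rules stated above; it assumes the monthly reset also happens at 00:00 UTC on the 1st, which the runbook does not state explicitly.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next midnight UTC strictly after `now` (which must be timezone-aware)."""
    tomorrow = (now.astimezone(timezone.utc) + timedelta(days=1)).date()
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def next_monthly_reset(now: datetime) -> datetime:
    """First day of the next month at 00:00 UTC (time of day is an assumption)."""
    utc_now = now.astimezone(timezone.utc)
    if utc_now.month == 12:
        year, month = utc_now.year + 1, 1
    else:
        year, month = utc_now.year, utc_now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


now = datetime.now(timezone.utc)
print("daily reset in:  ", next_daily_reset(now) - now)
print("monthly reset in:", next_monthly_reset(now) - now)
```

Both functions convert to UTC first, so a timezone-aware local time can be passed in directly.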


Escalation

If the cost spike cannot be explained by any of the above causes:

  1. Collect the X-CM-Request-ID headers from recent requests
  2. Export the full usage log for the time window: Dashboard > Gateway > Usage Log > Export CSV
  3. Check for anomalous patterns (requests from unexpected IP addresses, unknown API keys)
  4. Contact the platform team with the org ID, time window, and usage export
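The check for unknown API keys in step 3 can be run against the CSV export before contacting the platform team. The export's column layout is not documented in this runbook, so the `key_id` column name and the sample rows below are hypothetical; adjust `key_column` to match the real header.

```python
import csv
import io
from collections import Counter


def requests_from_unknown_keys(csv_text: str, known_keys: set,
                               key_column: str = "key_id") -> dict:
    """Count requests per API key that is not in the set of known keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        key = row.get(key_column, "")
        if key not in known_keys:
            counts[key] += 1
    return dict(counts)


# Hypothetical export contents for illustration only.
sample_export = """key_id,timestamp,cost
key_prod,2026-03-17T10:00:00+00:00,0.40
key_xyz,2026-03-17T10:00:05+00:00,0.41
key_xyz,2026-03-17T10:00:09+00:00,0.39
"""
print(requests_from_unknown_keys(sample_export, {"key_prod"}))
```

Any key reported here that the organization does not recognize is worth including in the escalation, along with its request count.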