Runbook: Budget Exceeded / Cost Spike
This runbook covers diagnosing and resolving budget-related denials and unexpected cost spikes through the Curate-Me AI Gateway.
Symptoms
- 403 responses with error code GW_COST_002, daily_budget, monthly_budget, or cost_per_request
- Webhook alerts firing for budget_exceeded events
- Dashboard cost charts showing an unexpected spike
- Agents or applications suddenly unable to make LLM requests
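The first symptom can also be checked in client code. The following is a minimal sketch, assuming the error body has the shape of the typical error response in this runbook (an "error" object with a "code" field); the helper name is_budget_denial is hypothetical and not part of any Curate-Me SDK.

```python
# Error codes listed under Symptoms in this runbook.
BUDGET_ERROR_CODES = {"GW_COST_002", "daily_budget", "monthly_budget", "cost_per_request"}


def is_budget_denial(status_code: int, body: dict) -> bool:
    """Return True when a gateway response looks like a budget denial.

    Assumes the JSON body carries {"error": {"code": ...}}.
    """
    if status_code != 403:
        return False
    error = body.get("error")
    if not isinstance(error, dict):
        return False
    return error.get("code") in BUDGET_ERROR_CODES
```

A caller might use this to stop retrying on budget denials, since retries cannot succeed until the budget is raised or reset.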
Typical error response:
{
  "error": {
    "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
    "type": "permission_error",
    "code": "daily_budget"
  }
}
Step 1: Check current spend
Pull the daily cost breakdown for the affected organization:
curl https://api.curate-me.ai/api/v1/admin/gateway/costs/daily \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
Expected response:
{
  "org_id": "org_abc123",
  "date": "2026-03-17",
  "daily_spend": 24.50,
  "daily_budget": 25.00,
  "monthly_spend": 187.30,
  "monthly_budget": 250.00,
  "top_models": [
    {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
    {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
    {"model": "gpt-4o", "cost": 1.50, "requests": 230}
  ]
}
Also check the dashboard for a visual breakdown: Dashboard > Gateway > Cost Tracking > Cost Breakdown.
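To read the response at a glance, it can help to reduce it to remaining headroom and the share of spend taken by the most expensive model. The sketch below assumes the field names in the expected response above and assumes the top_models costs cover the same day as daily_spend; summarize_spend is a hypothetical helper.

```python
def summarize_spend(costs: dict) -> dict:
    """Summarize remaining budget and the dominant model from a daily cost response."""

    def used_pct(spend: float, budget: float):
        # Guard against a zero or missing budget to avoid dividing by zero.
        return round(100 * spend / budget, 1) if budget else None

    summary = {
        "daily_remaining": round(costs["daily_budget"] - costs["daily_spend"], 2),
        "daily_used_pct": used_pct(costs["daily_spend"], costs["daily_budget"]),
        "monthly_remaining": round(costs["monthly_budget"] - costs["monthly_spend"], 2),
        "monthly_used_pct": used_pct(costs["monthly_spend"], costs["monthly_budget"]),
    }

    models = costs.get("top_models") or []
    if models and costs["daily_spend"]:
        top = max(models, key=lambda m: m["cost"])
        summary["top_model"] = top["model"]
        # Assumption: top_models costs are for the same day as daily_spend.
        summary["top_model_share_pct"] = round(100 * top["cost"] / costs["daily_spend"], 1)
    return summary
```

For the sample response, this reports about $0.50 of daily headroom and shows gpt-5.1 accounting for roughly three quarters of the day's spend, which points toward Cause A or Cause B below.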
Step 2: Identify the cause
Cause A: Runaway agent loop
A single agent making repeated LLM calls in a tight loop is the most common cause of budget spikes.
Diagnosis: One model or one API key accounts for a disproportionate share of daily spend.
# Check per-key cost attribution
curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?limit=50&sort=cost_desc" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
Look for patterns:
- Many requests in a short time window from the same key_id
- Same prompt content repeated across requests (retry storm)
- Requests with large completion_tokens counts (model generating verbose output)
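The three patterns above can be checked mechanically once the usage records are loaded. This is a rough sketch, not the gateway's own detection logic: it assumes each record is a dict with key_id, timestamp (ISO 8601 without a trailing "Z"), prompt, and completion_tokens fields, and those names other than key_id and completion_tokens are guesses about the usage payload. The thresholds are illustrative defaults.

```python
import hashlib
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def find_suspect_keys(records, window=timedelta(minutes=5),
                      max_requests=100, max_repeats=10,
                      max_completion_tokens=4000):
    """Map key_id -> list of reasons the key looks like a runaway loop."""
    by_key = defaultdict(list)
    for rec in records:
        by_key[rec["key_id"]].append(rec)

    suspects = {}
    for key_id, rows in by_key.items():
        reasons = []

        # Pattern 1: largest number of requests inside any sliding time window.
        times = sorted(datetime.fromisoformat(r["timestamp"]) for r in rows)
        burst, start = 0, 0
        for end, ts in enumerate(times):
            while ts - times[start] > window:
                start += 1
            burst = max(burst, end - start + 1)
        if burst > max_requests:
            reasons.append(f"{burst} requests within {window}")

        # Pattern 2: the same prompt repeated many times (retry storm).
        counts = Counter(
            hashlib.sha256(r["prompt"].encode("utf-8")).hexdigest() for r in rows
        )
        repeats = max(counts.values())
        if repeats > max_repeats:
            reasons.append(f"same prompt repeated {repeats} times")

        # Pattern 3: unusually large completions.
        biggest = max(r["completion_tokens"] for r in rows)
        if biggest > max_completion_tokens:
            reasons.append(f"completion of {biggest} tokens")

        if reasons:
            suspects[key_id] = reasons
    return suspects
```

Hashing the prompt keeps the comparison cheap and avoids holding full prompt text in the counter; any stable digest would do.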
Fix (immediate): Revoke or pause the offending API key:
curl -X POST https://api.curate-me.ai/api/v1/admin/keys/$KEY_ID/disable \
  -H "Authorization: Bearer $ADMIN_TOKEN"
Cause B: Model upgrade without budget increase
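Switching from a cheaper model (e.g., gpt-4o-mini at $0.15/1M input) to an expensive model (e.g., gpt-5.1 at $2.50/1M input) without raising the budget cap multiplies the input cost per token (roughly 17x in this example), so the same traffic exhausts the existing budget much sooner.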
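A quick worked example makes the effect concrete. The prices are the ones quoted above; the daily volume of 10 million input tokens is a made-up figure for illustration, and output-token pricing is ignored.

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a number of input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


DAILY_INPUT_TOKENS = 10_000_000  # hypothetical workload

cheap = input_cost(DAILY_INPUT_TOKENS, 0.15)      # gpt-4o-mini input price from the text
expensive = input_cost(DAILY_INPUT_TOKENS, 2.50)  # gpt-5.1 input price from the text
ratio = expensive / cheap

print(f"cheap model:     ${cheap:.2f}/day")
print(f"expensive model: ${expensive:.2f}/day")
print(f"increase:        {ratio:.1f}x")
```

Under these assumptions the daily input cost moves from about $1.50 to about $25.00, which alone would consume a $25 daily budget.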
Diagnosis: The top_models field in the cost response shows a new expensive model that was not previously in use.
Fix: Increase the daily budget to accommodate the new model, or add a per-request cost cap:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_budget": 100.00,
    "max_cost_per_request": 2.00
  }'
Cause C: Fleet cost misconfiguration
For organizations running managed runner fleets, each runner session generates LLM costs. A fleet with many runners can burn through budget quickly if session-level cost caps are not set.
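A back-of-the-envelope bound shows how fleet size interacts with the daily budget. This is only a sizing sketch: it assumes every session can spend up to its cap, and the runner and session counts are illustrative numbers, not values from any real fleet.

```python
def worst_case_fleet_spend(runners: int, sessions_per_runner: int,
                           session_budget_limit: float) -> float:
    """Upper bound on daily spend if every session reaches its cost cap."""
    return runners * sessions_per_runner * session_budget_limit


def max_safe_session_limit(daily_budget: float, runners: int,
                           sessions_per_runner: int) -> float:
    """Largest per-session cap that keeps the worst case within the daily budget."""
    return daily_budget / (runners * sessions_per_runner)


# Illustrative fleet: 10 runners, 4 sessions each per day, $5.00 cap per session.
worst = worst_case_fleet_spend(10, 4, 5.00)
safe_cap = max_safe_session_limit(100.00, 10, 4)
print(f"worst case: ${worst:.2f}, safe per-session cap for a $100 budget: ${safe_cap:.2f}")
```

With no session cap at all there is no such bound, which is why an uncapped fleet can exhaust the budget quickly.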
Diagnosis: Check runner session costs:
curl https://api.curate-me.ai/gateway/admin/runners/costs \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
Look for runners with high per-session spend or many active sessions.
Fix: Set per-session cost limits for the fleet:
curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/$RUNNER_ID/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"session_budget_limit": 5.00}'
Step 3: Unblock the organization (if needed)
If the organization is legitimately blocked and needs to resume operations before the budget resets at midnight UTC:
Option A: Increase the daily budget
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 50.00}'
Option B: Reset the daily cost counter
Use this only in emergencies. It resets the Redis daily cost counter for the org:
curl -X POST https://api.curate-me.ai/gateway/admin/costs/reset-daily \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"org_id": "org_abc123"}'
The MongoDB audit trail is not affected; only the real-time Redis counter is reset.
Step 4: Set up prevention
Add per-request cost caps
Per-request caps prevent any single expensive request from consuming a large portion of the budget:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_cost_per_request": 1.00}'
Configure webhook alerts
Set up webhook notifications to fire when spend reaches a threshold (e.g., 80% of daily budget):
curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget_exceeded", "budget_warning"],
    "budget_warning_threshold": 0.8
  }'
Set HITL thresholds for expensive requests
The Human-in-the-Loop gate can catch high-cost requests before they execute:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hitl_threshold": 5.00}'
Budget limits by tier (defaults)
| Tier | Per-Request Max | Daily Budget | Monthly Budget |
|---|---|---|---|
| Free | $0.25 | $5 | $50 |
| Starter | $0.50 | $25 | $250 |
| Growth | $2.00 | $100 | $2,000 |
| Enterprise | $10.00 | $2,000 | $50,000 |
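The table can be read as three independent limits applied to each request. The sketch below encodes the defaults and mirrors the "spent + estimated > limit" comparison seen in the typical error message; the order in which the limits are checked and the function name check_budget are assumptions, and the gateway's actual evaluation order may differ.

```python
# Default limits from the tier table (dollars).
TIER_DEFAULTS = {
    "free":       {"per_request": 0.25,  "daily": 5.0,    "monthly": 50.0},
    "starter":    {"per_request": 0.50,  "daily": 25.0,   "monthly": 250.0},
    "growth":     {"per_request": 2.00,  "daily": 100.0,  "monthly": 2000.0},
    "enterprise": {"per_request": 10.00, "daily": 2000.0, "monthly": 50000.0},
}


def check_budget(tier: str, estimated_cost: float,
                 daily_spend: float, monthly_spend: float):
    """Return the error code that would block the request, or None if it fits.

    Assumed order: per-request cap, then daily budget, then monthly budget.
    """
    limits = TIER_DEFAULTS[tier]
    if estimated_cost > limits["per_request"]:
        return "cost_per_request"
    if daily_spend + estimated_cost > limits["daily"]:
        return "daily_budget"
    if monthly_spend + estimated_cost > limits["monthly"]:
        return "monthly_budget"
    return None
```

For example, a Starter org that has spent $24.80 today would be denied a request estimated at $0.40 with daily_budget, even though the request is under the per-request cap.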
Daily budgets reset at midnight UTC. Monthly budgets reset on the 1st of each month.
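When deciding whether to wait for a reset or unblock manually, it helps to know how long the wait is. A small sketch using only the standard library follows; it assumes the monthly reset also happens at 00:00 UTC on the 1st, which the text above does not state explicitly.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next midnight UTC strictly after `now` (which must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    tomorrow = (now + timedelta(days=1)).date()
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def next_monthly_reset(now: datetime) -> datetime:
    """First day of the next month at 00:00 UTC (the time of day is an assumption)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


now = datetime.now(timezone.utc)
print("daily budget resets in:", next_daily_reset(now) - now)
print("monthly budget resets in:", next_monthly_reset(now) - now)
```

If the daily reset is only minutes away, waiting is usually preferable to resetting the counter by hand.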
Escalation
If the cost spike cannot be explained by any of the above causes:
- Collect the X-CM-Request-ID headers from recent requests
- Export the full usage log for the time window: Dashboard > Gateway > Usage Log > Export CSV
- Check for anomalous patterns (requests from unexpected IP addresses, unknown API keys)
- Contact the platform team with the org ID, time window, and usage export
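The anomaly check on the exported CSV can be partly automated before contacting the platform team. This is a sketch under stated assumptions: the column names request_id, key_id, and ip_address are hypothetical, since the export format is not described here, and should be adjusted to the real header row.

```python
import csv
import io


def find_anomalies(csv_text: str, known_keys: set, known_ips: set) -> list:
    """Return rows whose API key or source IP is not in the expected sets.

    Column names (key_id, ip_address) are assumptions about the export format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["key_id"] not in known_keys or row["ip_address"] not in known_ips
    ]


# Made-up export rows for illustration only.
export = """request_id,key_id,ip_address,cost
req_1,key_a,10.0.0.5,0.12
req_2,key_x,10.0.0.5,1.40
req_3,key_a,203.0.113.9,0.90
"""

for row in find_anomalies(export, known_keys={"key_a"}, known_ips={"10.0.0.5"}):
    print(row["request_id"], row["key_id"], row["ip_address"])
```

The flagged rows, together with the org ID and time window, give the platform team a concrete starting point.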