Runbook: Budget Exceeded / Cost Spike

Owner: Platform Team
Backup owner: On-call engineer
Last validated: Not yet validated
Validation method: Manual drill
Severity trigger: SEV3
Customer impact: Org’s LLM requests blocked until budget reset or limit raised
Required access: SSH (VPS), MongoDB, Redis
Related services: curateme-backend-gateway


This runbook covers diagnosing and resolving budget-related denials and unexpected cost spikes on the Curate-Me AI Gateway.


Symptoms

  • 403 responses carrying one of the error codes GW_COST_002, daily_budget, monthly_budget, or cost_per_request
  • Webhook alerts firing for budget_exceeded events
  • Dashboard cost charts showing an unexpected spike
  • Agents or applications suddenly unable to make LLM requests

Typical error response:

{
  "error": {
    "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
    "type": "permission_error",
    "code": "daily_budget"
  }
}
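The message in this error implies a simple admission check: the request is denied when spend so far plus the estimated cost of the new request exceeds the limit. The sketch below reproduces that arithmetic. The function name and signature are hypothetical and the gateway's real implementation may differ; only the comparison and the payload shape are taken from the error above.

```python
from decimal import Decimal
from typing import Optional


def check_daily_budget(spent: str, estimated: str, limit: str) -> Optional[dict]:
    """Return an error payload if the request would exceed the limit, else None.

    Hypothetical helper: mirrors the 'spent + estimated > limit' comparison
    implied by the error message, using Decimal to avoid float rounding.
    """
    spent_d, est_d, limit_d = Decimal(spent), Decimal(estimated), Decimal(limit)
    if spent_d + est_d > limit_d:
        return {
            "error": {
                "message": (
                    f"Daily budget exhausted: ${spent_d:.2f} spent + "
                    f"${est_d:.2f} estimated > ${limit_d:.2f} limit"
                ),
                "type": "permission_error",
                "code": "daily_budget",
            }
        }
    return None


# $24.50 + $0.85 = $25.35, which is over the $25.00 limit, so this is denied.
denied = check_daily_budget("24.50", "0.85", "25.00")
# $24.00 + $0.85 = $24.85, which still fits, so this is allowed.
allowed = check_daily_budget("24.00", "0.85", "25.00")
```

Note that a request can be denied while the org is still under its limit: what matters is whether the estimate fits in the remaining headroom.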

Step 1: Check current spend

Pull the daily cost breakdown for the affected organization:

curl https://api.curate-me.ai/api/v1/admin/gateway/costs/daily \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Expected response:

{
  "org_id": "org_abc123",
  "date": "2026-03-17",
  "daily_spend": 24.50,
  "daily_budget": 25.00,
  "monthly_spend": 187.30,
  "monthly_budget": 250.00,
  "top_models": [
    {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
    {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
    {"model": "gpt-4o", "cost": 1.50, "requests": 230}
  ]
}
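To read this response quickly during an incident, it helps to reduce it to remaining headroom, each model's share of daily spend, and average cost per request. The sketch below does that for the sample response. The helper name is hypothetical; the field names are the ones shown in the response.

```python
def summarize_costs(report: dict) -> dict:
    """Reduce a daily cost report to headroom, per-model share, and cost per request."""
    daily_spend = report["daily_spend"]
    models = {}
    for entry in report["top_models"]:
        models[entry["model"]] = {
            # Percentage of today's spend attributed to this model.
            "share_pct": round(100 * entry["cost"] / daily_spend, 1) if daily_spend else 0.0,
            # Average cost of one request to this model.
            "cost_per_request": round(entry["cost"] / entry["requests"], 4),
        }
    return {
        "daily_remaining": round(report["daily_budget"] - daily_spend, 2),
        "monthly_remaining": round(report["monthly_budget"] - report["monthly_spend"], 2),
        "models": models,
    }


sample = {
    "org_id": "org_abc123",
    "date": "2026-03-17",
    "daily_spend": 24.50,
    "daily_budget": 25.00,
    "monthly_spend": 187.30,
    "monthly_budget": 250.00,
    "top_models": [
        {"model": "gpt-5.1", "cost": 18.20, "requests": 45},
        {"model": "claude-opus-4", "cost": 4.80, "requests": 12},
        {"model": "gpt-4o", "cost": 1.50, "requests": 230},
    ],
}
summary = summarize_costs(sample)
```

For the sample data, gpt-5.1 accounts for roughly three quarters of the day's spend from only 45 requests, which is the kind of concentration Step 2 asks you to look for.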

Also check the dashboard for a visual breakdown: Dashboard > Gateway > Cost Tracking > Cost Breakdown.


Step 2: Identify the cause

Cause A: Runaway agent loop

A single agent making repeated LLM calls in a tight loop is the most common cause of budget spikes.

Diagnosis: One model or one API key accounts for a disproportionate share of daily spend.

# Check per-key cost attribution
curl "https://api.curate-me.ai/api/v1/admin/gateway/usage?limit=50&sort=cost_desc" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Look for patterns:

  • Many requests in a short time window from the same key_id
  • Same prompt content repeated across requests (retry storm)
  • Requests with large completion_tokens counts (model generating verbose output)
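The first two patterns above can be checked mechanically once the usage records are in hand. The sketch below is one way to do it, not the gateway's own detection logic. The record fields timestamp, prompt_hash, and cost are assumptions about the usage payload (only key_id is named in this runbook), and the thresholds are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def flag_suspect_keys(records, window=timedelta(minutes=1), burst=20, share=0.5):
    """Flag keys showing bursts, repeated prompts, or an outsized share of cost.

    records: list of dicts with hypothetical fields
             key_id, timestamp (ISO 8601), prompt_hash, cost.
    """
    by_key = defaultdict(list)
    for record in records:
        by_key[record["key_id"]].append(record)
    total_cost = sum(r["cost"] for r in records) or 1.0

    flagged = {}
    for key_id, rows in by_key.items():
        times = sorted(datetime.fromisoformat(r["timestamp"]) for r in rows)
        # Sliding window: largest number of requests inside any `window` span.
        peak, start = 0, 0
        for end, current in enumerate(times):
            while current - times[start] > window:
                start += 1
            peak = max(peak, end - start + 1)
        # Most frequent identical prompt (a sign of a retry storm).
        repeats = max(Counter(r["prompt_hash"] for r in rows).values())
        cost_share = sum(r["cost"] for r in rows) / total_cost

        reasons = []
        if peak >= burst:
            reasons.append("burst")
        if repeats >= burst:
            reasons.append("repeated_prompt")
        if cost_share >= share:
            reasons.append("cost_share")
        if reasons:
            flagged[key_id] = reasons
    return flagged


# Synthetic data: one key looping once per second, one key behaving normally.
base = datetime(2026, 3, 17, 10, 0, 0)
looping = [
    {"key_id": "key_loop", "timestamp": (base + timedelta(seconds=i)).isoformat(),
     "prompt_hash": "abc", "cost": 0.5}
    for i in range(30)
]
normal = [
    {"key_id": "key_ok", "timestamp": (base + timedelta(minutes=10 * i)).isoformat(),
     "prompt_hash": f"h{i}", "cost": 0.1}
    for i in range(5)
]
suspects = flag_suspect_keys(looping + normal)
```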

Fix (immediate): Revoke or pause the offending API key:

curl -X POST https://api.curate-me.ai/api/v1/admin/keys/$KEY_ID/disable \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Cause B: Model upgrade without budget increase

This happens when an organization switches from a cheaper model (e.g., gpt-4o-mini at $0.15/1M input tokens) to a more expensive one (e.g., gpt-5.1 at $2.50/1M input tokens) without raising the budget cap.

Diagnosis: The top_models field in the cost response shows a new expensive model that was not previously in use.
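To see why an unchanged budget stops fitting after such a switch, compare the daily input cost of the same workload at both prices. The sketch below uses the two example prices quoted above. The 40M tokens/day workload is a made-up figure for illustration, and output-token pricing is ignored.

```python
def daily_input_cost(tokens_per_day: int, price_per_million: float) -> float:
    """Input-token cost in dollars for one day at a given per-1M-token price."""
    return tokens_per_day / 1_000_000 * price_per_million


OLD_PRICE = 0.15  # $/1M input tokens (cheaper model, from the example above)
NEW_PRICE = 2.50  # $/1M input tokens (expensive model, from the example above)
TOKENS_PER_DAY = 40_000_000  # hypothetical workload

old_cost = daily_input_cost(TOKENS_PER_DAY, OLD_PRICE)
new_cost = daily_input_cost(TOKENS_PER_DAY, NEW_PRICE)
multiplier = NEW_PRICE / OLD_PRICE

print(f"old: ${old_cost:.2f}/day, new: ${new_cost:.2f}/day, x{multiplier:.1f}")
```

With these figures the same traffic goes from about $6 to about $100 per day, a roughly 16.7x increase, so a $25 daily budget that was comfortable before the switch is exhausted within a few hours after it.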

Fix: Increase the daily budget to accommodate the new model, or add a per-request cost cap:

curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "daily_budget": 100.00, "max_cost_per_request": 2.00 }'

Cause C: Fleet cost misconfiguration

For organizations running managed runner fleets (private beta), each runner session generates LLM costs. A fleet with many runners can burn through budget quickly if session-level cost caps are not set.

Diagnosis: Check runner session costs:

curl https://api.curate-me.ai/gateway/admin/runners/costs \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Look for runners with high per-session spend or many active sessions.

Fix: Set per-session cost limits for the fleet:

curl -X PATCH https://api.curate-me.ai/gateway/admin/runners/$RUNNER_ID/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"session_budget_limit": 5.00}'
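When choosing a session limit, it is worth checking the fleet's worst-case exposure against the daily budget. The sketch below assumes every active session can spend up to its full session_budget_limit. Both helper names are hypothetical, and the calculation is a sizing aid rather than anything the gateway enforces.

```python
from decimal import Decimal


def worst_case_fleet_spend(active_sessions: int, session_budget_limit: str) -> Decimal:
    """Upper bound on spend if every active session reaches its limit."""
    return Decimal(session_budget_limit) * active_sessions


def max_sessions_within_budget(daily_budget: str, session_budget_limit: str) -> int:
    """Number of full-limit sessions that fit inside the daily budget."""
    return int(Decimal(daily_budget) // Decimal(session_budget_limit))


# 30 sessions at a $5.00 limit can spend up to $150.00,
# which exceeds a $100.00 daily budget.
exposure = worst_case_fleet_spend(30, "5.00")
# Only 20 such sessions fit inside $100.00.
fits = max_sessions_within_budget("100.00", "5.00")
```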

Step 3: Unblock the organization (if needed)

If the organization is legitimately blocked and needs to resume operations before the budget resets at midnight UTC:

Option A: Increase the daily budget

curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 50.00}'

Option B: Reset the daily cost counter

Use this only in emergencies — it resets the Redis daily cost counter for the org:

curl -X POST https://api.curate-me.ai/gateway/admin/costs/reset-daily \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"org_id": "org_abc123"}'

The MongoDB audit trail is not affected — only the real-time Redis counter is reset.


Step 4: Set up prevention

Add per-request cost caps

Per-request caps prevent any single expensive request from consuming a large portion of the budget:

curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_cost_per_request": 1.00}'

Configure webhook alerts

Set up webhook notifications to fire when spend reaches a threshold (e.g., 80% of daily budget):

curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://your-app.com/webhooks/curate-me", "events": ["budget_exceeded", "budget_warning"], "budget_warning_threshold": 0.8 }'
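The sketch below shows how a threshold of 0.8 might map spend levels to the two event names. The exact firing rules are an assumption (in particular, whether budget_exceeded fires at the limit or only on the first denied request is not stated here), and the function name is hypothetical.

```python
from typing import Optional


def budget_event(spend: float, budget: float, warning_threshold: float = 0.8) -> Optional[str]:
    """Return the event name a given spend level would trigger, if any.

    Assumed semantics: 'budget_exceeded' once spend reaches the budget,
    'budget_warning' once spend reaches warning_threshold * budget.
    """
    if budget <= 0:
        return None
    ratio = spend / budget
    if ratio >= 1.0:
        return "budget_exceeded"
    if ratio >= warning_threshold:
        return "budget_warning"
    return None


# With a $25.00 daily budget, the warning level is $20.00.
quiet = budget_event(12.00, 25.00)      # under the threshold
warning = budget_event(24.50, 25.00)    # 98% of budget
exceeded = budget_event(25.00, 25.00)   # at the limit
```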

Set HITL thresholds for expensive requests

The Human-in-the-Loop gate can catch high-cost requests before they execute:

curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hitl_threshold": 5.00}'
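When a per-request cap and a HITL threshold are both set, their relative values matter. The sketch below assumes the cap is evaluated first and that requests at or above the HITL threshold are held for approval. That ordering is an assumption for illustration, not confirmed gateway behavior, and the function name and return values are hypothetical.

```python
def gate_request(estimated_cost: float, max_cost_per_request: float, hitl_threshold: float) -> str:
    """Classify a request as 'deny', 'hold_for_approval', or 'allow'.

    Assumed order of checks: hard per-request cap first, then HITL threshold.
    """
    if estimated_cost > max_cost_per_request:
        return "deny"
    if estimated_cost >= hitl_threshold:
        return "hold_for_approval"
    return "allow"


# Threshold below the cap: mid-priced requests are routed to a human.
cheap = gate_request(0.30, max_cost_per_request=2.00, hitl_threshold=1.00)
held = gate_request(1.50, max_cost_per_request=2.00, hitl_threshold=1.00)
blocked = gate_request(2.50, max_cost_per_request=2.00, hitl_threshold=1.00)

# Threshold above the cap: the cap denies the request before HITL can apply.
never_held = gate_request(3.00, max_cost_per_request=1.00, hitl_threshold=5.00)
```

Under this assumed ordering, a hitl_threshold higher than max_cost_per_request never triggers, so check both values together when tuning a policy.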

Budget limits by tier (defaults)

Tier        Per-Request Max   Daily Budget   Monthly Budget
Free        $0.25             $5             $50
Starter     $0.50             $25            $250
Growth      $2.00             $100           $2,000
Enterprise  $10.00            $2,000         $50,000

Daily budgets reset at midnight UTC. Monthly budgets reset on the 1st of each month.
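When deciding whether to raise a limit or wait, it helps to know how long until the next reset. The sketch below computes both reset times from the rules stated above (midnight UTC, 1st of the month, with the monthly reset assumed to also occur at 00:00 UTC). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next midnight UTC strictly after `now` (which must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_monthly_reset(now: datetime) -> datetime:
    """00:00 UTC on the 1st of the month following `now` (timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    if now_utc.month == 12:
        return datetime(now_utc.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now_utc.year, now_utc.month + 1, 1, tzinfo=timezone.utc)


incident_time = datetime(2026, 3, 17, 15, 30, tzinfo=timezone.utc)
daily_reset = next_daily_reset(incident_time)
monthly_reset = next_monthly_reset(incident_time)
wait = daily_reset - incident_time  # time left until the daily counter resets
```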


Escalation

If the cost spike cannot be explained by any of the above causes:

  1. Collect the X-CM-Request-ID headers from recent requests
  2. Pull cost breakdown and error context:
    ./scripts/analytics costs today
    ./scripts/errors by-source gateway | grep "budget\|cost"
  3. Export the full usage log for the time window: Dashboard > Gateway > Usage Log > Export CSV
  4. Check for anomalous patterns (requests from unexpected IP addresses, unknown API keys)
  5. Contact the platform team with the org ID, time window, and usage export
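The anomaly check in step 4 can be run against the exported CSV with a short script. The sketch below flags rows whose API key or source IP is not on a known list. The column names request_id, key_id, and source_ip are assumptions about the export format, so adjust them to match the actual file.

```python
import csv
import io


def find_anomalies(csv_text: str, known_keys: set, known_ips: set) -> list:
    """Return (request_id, reasons) for rows with an unknown key or unexpected IP.

    Column names are hypothetical; rename them to match the real export.
    """
    anomalies = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        reasons = []
        if row["key_id"] not in known_keys:
            reasons.append("unknown_key")
        if row["source_ip"] not in known_ips:
            reasons.append("unexpected_ip")
        if reasons:
            anomalies.append((row["request_id"], reasons))
    return anomalies


# Synthetic export with one unknown key and one unexpected IP.
sample_csv = (
    "request_id,key_id,source_ip\n"
    "req_1,key_a,10.0.0.5\n"
    "req_2,key_x,10.0.0.5\n"
    "req_3,key_a,203.0.113.9\n"
)
found = find_anomalies(sample_csv, known_keys={"key_a"}, known_ips={"10.0.0.5"})
```

Include the flagged request IDs alongside the usage export when contacting the platform team.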

Rollback

Revert any changes made in Steps 2 through 4 (budget limits, per-request caps, session limits, disabled API keys). If a configuration value was changed, restore the previous value from the MongoDB audit log or a Redis backup.

Verification

After applying the fix, verify:

  • The symptoms listed above are no longer present
  • No new errors in gateway logs: docker logs curateme-backend-gateway --tail=50
  • Health check passes: curl -s http://localhost:8002/health | jq .status