# Runbook: Rate Limit Exceeded
This runbook covers diagnosing and resolving rate limit errors from the Curate-Me AI Gateway.
## Symptoms

- `429 Too Many Requests` responses from the gateway
- Response header `X-RateLimit-Remaining: 0`
- Clients experiencing intermittent failures under load
- Dashboard showing rate limit denials in the governance audit log
Typical error response:

```json
{
  "error": {
    "message": "Rate limit exceeded: 61/60 requests per minute",
    "type": "rate_limit_error",
    "code": "429"
  }
}
```

Typical response headers:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710684120
```

## Step 1: Read the response headers
Every gateway response includes rate limit headers, even on successful requests. These headers tell you the current state before you hit the limit:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests per minute for your org |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
| `Retry-After` | Seconds to wait before retrying (only on 429 responses) |
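Because the headers arrive on every response, a client can compute its own pause before the window is exhausted. A minimal sketch (the helper name and the pause-at-zero policy are illustrative, not part of any gateway SDK):

```python
def backoff_seconds(headers: dict, now: float) -> float:
    """Seconds to pause before the next request, given the
    X-RateLimit-* headers above: wait until the window resets
    once no requests remain, otherwise don't wait at all."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    if remaining == 0:
        # Never return a negative duration if the window already reset
        return max(0.0, reset_at - now)
    return 0.0
```

Calling `time.sleep(backoff_seconds(response.headers, time.time()))` after each request keeps a well-behaved client from ever seeing a 429.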
```bash
# Check current rate limit state with a lightweight request
curl -v https://api.curate-me.ai/v1/health \
  -H "X-CM-API-Key: $API_KEY" \
  2>&1 | grep -i "x-ratelimit"
```

## Step 2: Check per-org rate limit status
For a full view of rate limit state across all keys in the organization:
```bash
curl https://api.curate-me.ai/gateway/admin/rate-limits/status \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

Expected response:
```json
{
  "org_id": "org_abc123",
  "rpm_limit": 60,
  "current_minute_count": 58,
  "keys": [
    {"key_id": "key_001", "requests_this_minute": 45},
    {"key_id": "key_002", "requests_this_minute": 13}
  ]
}
```

## Step 3: Identify the cause
### Cause A: Burst traffic from a single client
One application or agent is making rapid sequential requests that exceed the per-minute limit.
Diagnosis: A single `key_id` accounts for most of the request count in the current minute.
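This check can be automated against the Step 2 status payload. A sketch (hypothetical helper; the 70% traffic share used as the cutoff is an arbitrary example choice):

```python
def dominant_key(status: dict, threshold: float = 0.7):
    """Return the key_id responsible for most of this minute's traffic,
    or None if no single key exceeds the threshold share.
    `status` is the JSON body from /gateway/admin/rate-limits/status."""
    total = sum(k["requests_this_minute"] for k in status["keys"])
    if total == 0:
        return None
    top = max(status["keys"], key=lambda k: k["requests_this_minute"])
    if top["requests_this_minute"] / total >= threshold:
        return top["key_id"]
    return None
```

With the example payload above (45 of 58 requests from `key_001`), this would flag `key_001` as the burst source.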
Fix (immediate): Implement client-side exponential backoff using the `Retry-After` header:
```python
import time

import requests

MAX_RETRIES = 5

def call_gateway(payload):
    """POST to the gateway, retrying with exponential backoff on 429s."""
    delay = 1
    for _ in range(MAX_RETRIES):
        response = requests.post(
            "https://api.curate-me.ai/v1/openai/chat/completions",
            json=payload,
            headers={
                "X-CM-API-Key": API_KEY,
                "Authorization": f"Bearer {OPENAI_KEY}",
            },
        )
        if response.status_code == 429:
            # Honor Retry-After, but never wait less than the current backoff delay
            retry_after = int(response.headers.get("Retry-After", delay))
            time.sleep(max(retry_after, delay))
            delay *= 2
            continue
        return response
    raise RuntimeError(f"Still rate limited after {MAX_RETRIES} retries")
```

The same pattern in TypeScript:

```typescript
const MAX_RETRIES = 5;

async function callGateway(payload: object): Promise<Response> {
  let delay = 1;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const response = await fetch(
      "https://api.curate-me.ai/v1/openai/chat/completions",
      {
        method: "POST",
        headers: {
          "X-CM-API-Key": API_KEY,
          "Authorization": `Bearer ${OPENAI_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      }
    );
    if (response.status === 429) {
      // Honor Retry-After, but never wait less than the current backoff delay
      const retryAfter = parseInt(response.headers.get("Retry-After") ?? String(delay), 10);
      await new Promise((r) => setTimeout(r, Math.max(retryAfter, delay) * 1000));
      delay *= 2;
      continue;
    }
    return response;
  }
  throw new Error(`Still rate limited after ${MAX_RETRIES} retries`);
}
```

### Cause B: RPM limit too low for the workload
The organization’s plan tier has a default RPM limit that does not match the actual traffic pattern.
Default RPM limits by tier:
| Plan | RPM Limit |
|---|---|
| Free | 10 |
| Starter | 60 |
| Growth | 300 |
| Team | 1,000 |
| Enterprise | 5,000 |
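Given the table above, matching an observed request rate to a tier is a simple scan. An illustrative helper (tier limits hard-coded from the table; not a platform API):

```python
# Default RPM limits by plan tier, from the table above
TIERS = [("Free", 10), ("Starter", 60), ("Growth", 300),
         ("Team", 1000), ("Enterprise", 5000)]

def smallest_tier(required_rpm: int):
    """Smallest plan tier whose default RPM limit covers the required
    rate; None means only a custom Enterprise limit would work."""
    for name, limit in TIERS:
        if limit >= required_rpm:
            return name
    return None
```

For example, a workload peaking at 61 requests per minute already needs the Growth tier, not Starter.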
Fix: Increase the RPM limit in the org’s governance policy:

```bash
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_rpm": 300}'
```

Or upgrade the plan tier for a permanent increase:

```bash
# Direct the org to the billing page
# https://dashboard.curate-me.ai/settings/billing
```

### Cause C: Single API key shared by many agents
Multiple agents or applications are using the same API key, causing their combined traffic to hit the per-org rate limit.
Diagnosis: The rate limit status endpoint shows a single `key_id` with a high request count, but the traffic comes from multiple independent processes.
Fix: Create separate API keys for each agent or application:

```bash
# Create a new API key for each agent
curl -X POST https://api.curate-me.ai/api/v1/admin/keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "agent-vision", "org_id": "org_abc123"}'
```

While rate limiting is per-org (not per-key), separate keys provide per-key cost attribution and make it easier to identify which agent is generating the most traffic. When per-key rate limits are available, separate keys will also allow independent rate limit pools.
## Step 4: Verify resolution
After adjusting the rate limit or implementing backoff, confirm the issue is resolved:
```bash
# Check that the rate limit has been updated
curl https://api.curate-me.ai/gateway/admin/rate-limits/status \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Send a burst of test requests and verify they succeed
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    https://api.curate-me.ai/v1/openai/chat/completions \
    -H "X-CM-API-Key: $API_KEY" \
    -H "Authorization: Bearer $OPENAI_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "test"}], "max_tokens": 5}'
done
```

All responses should return 200. If any return 429, the RPM limit needs further adjustment.
## Prevention

### Use separate API keys per agent
Each agent or application should have its own API key. This provides clear cost attribution and simplifies debugging rate limit issues.
### Implement client-side backoff
Every client that calls the gateway should respect the `Retry-After` header and implement exponential backoff. Never retry immediately on a 429.
### Monitor rate limit utilization

Set up dashboard alerts to fire when rate limit utilization exceeds 80%:
Dashboard > Settings > Alerts > New Alert, with condition `rate_limit_utilization > 0.8`.
### Set appropriate tier limits
Review the organization’s traffic patterns weekly and adjust the RPM limit to provide at least 30% headroom above peak traffic.
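The headroom rule amounts to a one-line calculation. A hypothetical helper (not a platform API):

```python
def required_rpm(peak_rpm: int, headroom_pct: int = 30) -> int:
    """Smallest RPM limit that keeps at least headroom_pct% above peak
    traffic. Uses integer ceiling division so float rounding can never
    understate the limit."""
    return -(-(peak_rpm * (100 + headroom_pct)) // 100)
```

For a workload peaking at 200 requests per minute, this yields a minimum limit of 260 RPM, which in practice means moving from the Starter tier to Growth.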
## Escalation

If rate limit issues persist after increasing RPM:

- Collect the `X-CM-Request-ID` from a sample of 429 responses
- Check whether the issue is per-org or per-key: `GET /gateway/admin/rate-limits/status`
- Verify Redis is healthy (rate limit counters are stored in Redis): `redis-cli ping`
- Contact the platform team with the org ID and the rate limit status output