
Runbook: Rate Limit Exceeded

This runbook covers diagnosing and resolving rate limit errors from the Curate-Me AI Gateway.


Symptoms

  • 429 Too Many Requests responses from the gateway
  • Response header X-RateLimit-Remaining: 0
  • Clients experiencing intermittent failures under load
  • Dashboard showing rate limit denials in the governance audit log

Typical error response:

{
  "error": {
    "message": "Rate limit exceeded: 61/60 requests per minute",
    "type": "rate_limit_error",
    "code": "429"
  }
}

Typical response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710684120

Step 1: Read the response headers

Every gateway response includes rate limit headers, even on successful requests. These headers tell you the current state before you hit the limit:

Header                 Description
X-RateLimit-Limit      Maximum requests per minute for your org
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the window resets
Retry-After            Seconds to wait before retrying (only on 429 responses)
# Check current rate limit state with a lightweight request
curl -v https://api.curate-me.ai/v1/health \
  -H "X-CM-API-Key: $API_KEY" \
  2>&1 | grep -i "x-ratelimit"
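As a sketch of how a client might act on these headers before hitting the limit (header names as in the table above; the helper functions are illustrative, not part of the gateway SDK):

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds until the current window resets, per X-RateLimit-Reset."""
    now = time.time() if now is None else now
    reset = int(headers.get("X-RateLimit-Reset", 0))
    return max(0, int(reset - now))


def should_throttle(headers):
    """True when the window is exhausted and the client should pause."""
    return int(headers.get("X-RateLimit-Remaining", 1)) <= 0
```

A client that checks `should_throttle` on every response and sleeps for `seconds_until_reset` can avoid ever seeing a 429 in the first place.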

Step 2: Check per-org rate limit status

For a full view of rate limit state across all keys in the organization:

curl https://api.curate-me.ai/gateway/admin/rate-limits/status \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

Expected response:

{
  "org_id": "org_abc123",
  "rpm_limit": 60,
  "current_minute_count": 58,
  "keys": [
    {"key_id": "key_001", "requests_this_minute": 45},
    {"key_id": "key_002", "requests_this_minute": 13}
  ]
}

Step 3: Identify the cause

Cause A: Burst traffic from a single client

One application or agent is making rapid sequential requests that exceed the per-minute limit.

Diagnosis: A single key_id accounts for most of the request count in the current minute.
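One way to confirm this from the status endpoint's JSON (response shape as shown in Step 2; the `dominant_key` helper is illustrative):

```python
def dominant_key(status):
    """Return (key_id, share) for the key with the most requests this minute."""
    top = max(status["keys"], key=lambda k: k["requests_this_minute"])
    total = status["current_minute_count"] or 1
    return top["key_id"], top["requests_this_minute"] / total


status = {
    "org_id": "org_abc123",
    "rpm_limit": 60,
    "current_minute_count": 58,
    "keys": [
        {"key_id": "key_001", "requests_this_minute": 45},
        {"key_id": "key_002", "requests_this_minute": 13},
    ],
}
key_id, share = dominant_key(status)
# In this sample, key_001 accounts for roughly 78% of the minute's traffic.
```

A share well above 50% from one key is a strong signal of a single bursting client.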

Fix (immediate): Implement client-side exponential backoff using the Retry-After header:

import time

import requests


def call_gateway(payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(
            "https://api.curate-me.ai/v1/openai/chat/completions",
            json=payload,
            headers={
                "X-CM-API-Key": API_KEY,
                "Authorization": f"Bearer {OPENAI_KEY}",
            },
        )
        if response.status_code != 429:
            return response
        # Honor Retry-After when present; otherwise back off exponentially.
        retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(retry_after)
    raise RuntimeError(f"Rate limited after {max_retries} retries")
async function callGateway(payload: object, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(
      "https://api.curate-me.ai/v1/openai/chat/completions",
      {
        method: "POST",
        headers: {
          "X-CM-API-Key": API_KEY,
          "Authorization": `Bearer ${OPENAI_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      }
    );
    if (response.status !== 429) {
      return response;
    }
    // Honor Retry-After when present; otherwise back off exponentially.
    const retryAfter = parseInt(
      response.headers.get("Retry-After") ?? String(2 ** attempt),
      10
    );
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
  throw new Error(`Rate limited after ${maxRetries} retries`);
}

Cause B: RPM limit too low for the workload

The organization’s plan tier has a default RPM limit that does not match the actual traffic pattern.

Default RPM limits by tier:

Plan        RPM Limit
Free        10
Starter     60
Growth      300
Team        1,000
Enterprise  5,000

Fix: Increase the RPM limit in the org’s governance policy:

curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_rpm": 300}'

Or upgrade the plan tier for a permanent increase:

# Direct the org to the billing page
# https://dashboard.curate-me.ai/settings/billing

Cause C: Single API key shared by many agents

Multiple agents or applications are using the same API key, causing their combined traffic to hit the per-org rate limit.

Diagnosis: The rate limit status endpoint shows a single key_id with a high request count, but the traffic comes from multiple independent processes.

Fix: Create separate API keys for each agent or application:

# Create a new API key for each agent
curl -X POST https://api.curate-me.ai/api/v1/admin/keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "agent-vision", "org_id": "org_abc123"}'

While rate limiting is per-org (not per-key), separate keys provide per-key cost attribution and make it easier to identify which agent is generating the most traffic. When per-key rate limits are available, separate keys will also allow independent rate limit pools.


Step 4: Verify resolution

After adjusting the rate limit or implementing backoff, confirm the issue is resolved:

# Check that the rate limit has been updated
curl https://api.curate-me.ai/gateway/admin/rate-limits/status \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Send a burst of test requests and verify they succeed
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    https://api.curate-me.ai/v1/openai/chat/completions \
    -H "X-CM-API-Key: $API_KEY" \
    -H "Authorization: Bearer $OPENAI_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "test"}], "max_tokens": 5}'
done

All responses should return 200. If any return 429, the RPM limit needs further adjustment.


Prevention

Use separate API keys per agent

Each agent or application should have its own API key. This provides clear cost attribution and simplifies debugging rate limit issues.

Implement client-side backoff

Every client that calls the gateway should respect the Retry-After header and implement exponential backoff. Never retry immediately on a 429.
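One possible backoff schedule for when Retry-After is absent (the base delay and cap here are assumptions for illustration, not gateway-mandated values):

```python
def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Delay in seconds before the next retry.

    Honors Retry-After when the server provides it; otherwise doubles the
    delay on each attempt, capped so a long outage doesn't grow unbounded.
    """
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))
```

In practice, adding random jitter to the computed delay also helps prevent many clients from retrying in lockstep after the same 429.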

Monitor rate limit utilization

Set up dashboard alerts to fire when rate limit utilization exceeds 80%:

Dashboard > Settings > Alerts > New Alert with condition rate_limit_utilization > 0.8.
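Utilization can also be computed client-side from the rate limit headers in Step 1 (the 0.8 threshold mirrors the dashboard alert; the helper is illustrative):

```python
def rate_limit_utilization(headers):
    """Fraction of the per-minute window consumed, from X-RateLimit-* headers."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    return (limit - remaining) / limit


headers = {"X-RateLimit-Limit": "60", "X-RateLimit-Remaining": "9"}
if rate_limit_utilization(headers) > 0.8:
    print("warning: rate limit utilization above 80%")
```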

Set appropriate tier limits

Review the organization’s traffic patterns weekly and adjust the RPM limit to provide at least 30% headroom above peak traffic.
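The 30% headroom rule works out to the following (the peak value is a hypothetical example; integer math avoids float-rounding surprises near tier boundaries):

```python
def required_rpm(peak_rpm, headroom_pct=30):
    """Smallest RPM limit giving at least `headroom_pct` headroom over peak."""
    # Ceiling of peak * (1 + headroom_pct/100), computed with integers.
    return -(-peak_rpm * (100 + headroom_pct) // 100)


# e.g. a measured peak of 220 requests/minute needs a limit of at least 286,
# so a Growth org (300 RPM) is adequate but a Starter org (60 RPM) is not.
```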


Escalation

If rate limit issues persist after increasing RPM:

  1. Collect the X-CM-Request-ID from a sample of 429 responses
  2. Check whether the issue is per-org or per-key: GET /gateway/admin/rate-limits/status
  3. Verify Redis is healthy (rate limit counters are stored in Redis): redis-cli ping
  4. Contact the platform team with the org ID and the rate limit status output