# Runbook: Rate Limit Exceeded
This runbook covers diagnosing and resolving rate limit errors from the Curate-Me AI Gateway.
## Symptoms

- `429 Too Many Requests` responses from the gateway
- Response header `X-RateLimit-Remaining: 0`
- Clients experiencing intermittent failures under load
- Dashboard showing rate limit denials in the governance audit log
Typical error response:

```json
{
  "error": {
    "message": "Rate limit exceeded: 61/60 requests per minute",
    "type": "rate_limit_error",
    "code": "429"
  }
}
```

Typical response headers:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710684120
```

## Step 1: Read the response headers
Every gateway response includes rate limit headers, even on successful requests. These headers tell you the current state before you hit the limit:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests per minute for your org |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
| `Retry-After` | Seconds to wait before retrying (only on 429 responses) |
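Because the headers arrive on every response, a client can compute its own pause before the window is exhausted. A minimal sketch (the helper name and the pause-at-zero policy are illustrative, not part of any gateway SDK):

```python
def backoff_seconds(headers: dict, now: float) -> float:
    """Seconds to pause before the next request, given the
    X-RateLimit-* headers above: wait until the window resets
    once no requests remain, otherwise don't wait at all."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    if remaining == 0:
        # Never return a negative duration if the window already reset
        return max(0.0, reset_at - now)
    return 0.0
```

Calling `time.sleep(backoff_seconds(response.headers, time.time()))` after each request keeps a well-behaved client from ever seeing a 429.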
```bash
# Check current rate limit state with a lightweight request
curl -v https://api.curate-me.ai/v1/health \
  -H "X-CM-API-Key: $API_KEY" \
  2>&1 | grep -i "x-ratelimit"
```

## Step 2: Check per-org rate limit status
For a full view of rate limit state across all keys in the organization:
```bash
curl https://api.curate-me.ai/gateway/admin/rate-limits/status \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

Expected response:
```json
{
  "org_id": "org_abc123",
  "rpm_limit": 60,
  "current_minute_count": 58,
  "keys": [
    {"key_id": "key_001", "requests_this_minute": 45},
    {"key_id": "key_002", "requests_this_minute": 13}
  ]
}
```

## Step 3: Identify the cause
### Cause A: Burst traffic from a single client
One application or agent is making rapid sequential requests that exceed the per-minute limit.
Diagnosis: A single `key_id` accounts for most of the request count in the current minute.
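This check can be automated against the Step 2 status payload. A sketch (hypothetical helper; the 70% traffic share used as the cutoff is an arbitrary example choice):

```python
def dominant_key(status: dict, threshold: float = 0.7):
    """Return the key_id responsible for most of this minute's traffic,
    or None if no single key exceeds the threshold share.
    `status` is the JSON body from /gateway/admin/rate-limits/status."""
    total = sum(k["requests_this_minute"] for k in status["keys"])
    if total == 0:
        return None
    top = max(status["keys"], key=lambda k: k["requests_this_minute"])
    if top["requests_this_minute"] / total >= threshold:
        return top["key_id"]
    return None
```

With the example payload above (45 of 58 requests from `key_001`), this would flag `key_001` as the burst source.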
Fix (immediate): Implement client-side exponential backoff using the `Retry-After` header:
```python
import time

import requests

MAX_RETRIES = 5

def call_gateway(payload):
    """POST to the gateway, retrying with exponential backoff on 429s."""
    delay = 1
    for _ in range(MAX_RETRIES):
        response = requests.post(
            "https://api.curate-me.ai/v1/openai/chat/completions",
            json=payload,
            headers={
                "X-CM-API-Key": API_KEY,
                "Authorization": f"Bearer {OPENAI_KEY}",
            },
        )
        if response.status_code == 429:
            # Honor Retry-After, but never wait less than the current backoff delay
            retry_after = int(response.headers.get("Retry-After", delay))
            time.sleep(max(retry_after, delay))
            delay *= 2
            continue
        return response
    raise RuntimeError(f"Still rate limited after {MAX_RETRIES} retries")
```

The same pattern in TypeScript:

```typescript
const MAX_RETRIES = 5;

async function callGateway(payload: object): Promise<Response> {
  let delay = 1;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const response = await fetch(
      "https://api.curate-me.ai/v1/openai/chat/completions",
      {
        method: "POST",
        headers: {
          "X-CM-API-Key": API_KEY,
          "Authorization": `Bearer ${OPENAI_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      }
    );
    if (response.status === 429) {
      // Honor Retry-After, but never wait less than the current backoff delay
      const retryAfter = parseInt(response.headers.get("Retry-After") ?? String(delay), 10);
      await new Promise((r) => setTimeout(r, Math.max(retryAfter, delay) * 1000));
      delay *= 2;
      continue;
    }
    return response;
  }
  throw new Error(`Still rate limited after ${MAX_RETRIES} retries`);
}
```

### Cause B: RPM limit too low for the workload
The organization’s plan tier has a default RPM limit that does not match the actual traffic pattern.
Default RPM limits by tier:
| Plan | RPM Limit |
|---|---|
| Free | 10 |
| Starter | 60 |
| Growth | 300 |
| Team | 1,000 |
| Enterprise | 5,000 |
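Given the table above, matching an observed request rate to a tier is a simple scan. An illustrative helper (tier limits hard-coded from the table; not a platform API):

```python
# Default RPM limits by plan tier, from the table above
TIERS = [("Free", 10), ("Starter", 60), ("Growth", 300),
         ("Team", 1000), ("Enterprise", 5000)]

def smallest_tier(required_rpm: int):
    """Smallest plan tier whose default RPM limit covers the required
    rate; None means only a custom Enterprise limit would work."""
    for name, limit in TIERS:
        if limit >= required_rpm:
            return name
    return None
```

For example, a workload peaking at 61 requests per minute already needs the Growth tier, not Starter.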
Fix: Increase the RPM limit in the org’s governance policy:

```bash
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_rpm": 300}'
```

Or upgrade the plan tier for a permanent increase:

```bash
# Direct the org to the billing page
# https://dashboard.curate-me.ai/settings/billing
```

### Cause C: Single API key shared by many agents
Multiple agents or applications are using the same API key, causing their combined traffic to hit the per-org rate limit.
Diagnosis: The rate limit status endpoint shows a single `key_id` with a high request count, but the traffic comes from multiple independent processes.
Fix: Create separate API keys for each agent or application:

```bash
# Create a new API key for each agent
curl -X POST https://api.curate-me.ai/api/v1/admin/keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "agent-vision", "org_id": "org_abc123"}'
```

While rate limiting is per-org (not per-key), separate keys provide per-key cost attribution and make it easier to identify which agent is generating the most traffic. When per-key rate limits are available, separate keys will also allow independent rate limit pools.
## Step 4: Verify resolution
After adjusting the rate limit or implementing backoff, confirm the issue is resolved:
```bash
# Check that the rate limit has been updated
curl https://api.curate-me.ai/gateway/admin/rate-limits/status \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Send a burst of test requests and verify they succeed
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    https://api.curate-me.ai/v1/openai/chat/completions \
    -H "X-CM-API-Key: $API_KEY" \
    -H "Authorization: Bearer $OPENAI_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "test"}], "max_tokens": 5}'
done
```

All responses should return 200. If any return 429, the RPM limit needs further adjustment.
## Prevention

### Use separate API keys per agent
Each agent or application should have its own API key. This provides clear cost attribution and simplifies debugging rate limit issues.
### Implement client-side backoff
Every client that calls the gateway should respect the `Retry-After` header and implement exponential backoff. Never retry immediately on a 429.
### Monitor rate limit utilization

Set up dashboard alerts to fire when rate limit utilization exceeds 80%:
Dashboard > Settings > Alerts > New Alert, with condition `rate_limit_utilization > 0.8`.
### Set appropriate tier limits
Review the organization’s traffic patterns weekly and adjust the RPM limit to provide at least 30% headroom above peak traffic.
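The headroom rule amounts to a one-line calculation. A hypothetical helper (not a platform API):

```python
def required_rpm(peak_rpm: int, headroom_pct: int = 30) -> int:
    """Smallest RPM limit that keeps at least headroom_pct% above peak
    traffic. Uses integer ceiling division so float rounding can never
    understate the limit."""
    return -(-(peak_rpm * (100 + headroom_pct)) // 100)
```

For a workload peaking at 200 requests per minute, this yields a minimum limit of 260 RPM, which in practice means moving from the Starter tier to Growth.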
## Escalation

If rate limit issues persist after increasing RPM:

- Collect the `X-CM-Request-ID` from a sample of 429 responses
- Check whether the issue is per-org or per-key: `GET /gateway/admin/rate-limits/status`
- Verify Redis is healthy (rate limit counters are stored in Redis): `redis-cli ping`
- Contact the platform team with the org ID and the rate limit status output