429 Too Many Requests

Your request was blocked by the rate limiter before reaching the provider. No tokens were consumed.

Error body


{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded: 60 requests/min for key cm_sk_xxx. 46 requests remaining after retry window.",
    "request_id": "req_01hwz3kj4p5qm8n9v2t6ys",
    "governance_stage": "rate_limit",
    "retry_after": 14,
    "limit": 60,
    "remaining": 0,
    "reset_at": "2026-05-25T14:32:14Z"
  }
}
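
As a rough illustration of how a client might consume this body, the sketch below parses the fields shown above into a small dataclass and computes how long to wait. The names `RateLimitInfo`, `parse_rate_limit_error`, and `seconds_until_reset` are hypothetical helpers, not part of any SDK, and the sketch assumes the body always carries the fields in the example.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RateLimitInfo:
    code: str
    retry_after: int      # seconds to wait before retrying
    limit: int            # requests allowed per window
    remaining: int        # requests left in the current window
    reset_at: datetime    # when the window resets (UTC)


def parse_rate_limit_error(body: str) -> RateLimitInfo:
    """Extract the rate limit fields from a 429 error body (hypothetical helper)."""
    err = json.loads(body)["error"]
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    reset_at = datetime.fromisoformat(err["reset_at"].replace("Z", "+00:00"))
    return RateLimitInfo(
        code=err["code"],
        retry_after=int(err["retry_after"]),
        limit=int(err["limit"]),
        remaining=int(err["remaining"]),
        reset_at=reset_at,
    )


def seconds_until_reset(info: RateLimitInfo, now: Optional[datetime] = None) -> float:
    """Seconds from `now` (default: current UTC time) until reset, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (info.reset_at - now).total_seconds())
```

Using `reset_at` rather than `retry_after` can help when the error was logged or queued before being handled, since the absolute timestamp does not go stale.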

Response headers

Every response (success or 429) includes rate limit headers that follow the IETF draft for `RateLimit` header fields:

Header	Value	Meaning
`RateLimit-Limit`	`60`	Max requests per minute for this key
`RateLimit-Remaining`	`0`	Requests remaining in current window
`RateLimit-Reset`	`14`	Seconds until the window resets
`Retry-After`	`14`	Same as `RateLimit-Reset` for 429 responses
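
Because these headers arrive on successful responses too, a client can slow down before it ever receives a 429. The sketch below is one possible way to read them from any header mapping; `parse_rate_limit_headers` and `pause_before_next_request` are hypothetical names, and the sketch assumes the values are plain integers as in the table above.

```python
from typing import Mapping, Optional


def parse_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Read the rate limit headers into integers (None when a header is absent)."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is not None and value.strip().isdigit():
            return int(value)
        return None

    return {
        "limit": as_int("ratelimit-limit"),
        "remaining": as_int("ratelimit-remaining"),
        "reset": as_int("ratelimit-reset"),
        "retry_after": as_int("retry-after"),
    }


def pause_before_next_request(headers: Mapping[str, str]) -> int:
    """Seconds to wait before sending another request; 0 if budget remains."""
    info = parse_rate_limit_headers(headers)
    if info["remaining"] is not None and info["remaining"] > 0:
        return 0
    # Prefer Retry-After (sent on 429s), then fall back to RateLimit-Reset.
    for key in ("retry_after", "reset"):
        if info[key] is not None:
            return info[key]
    return 0
```

With the header values from the table, `pause_before_next_request` returns 14; with any positive `RateLimit-Remaining` it returns 0.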

Handling 429s in code

Python


import time
from openai import OpenAI, RateLimitError
 
client = OpenAI(
    base_url="https://api.curate-me.ai/v1/openai",
    api_key="cm_sk_your_key",
    max_retries=3,       # OpenAI SDK retries 429s with backoff automatically
    timeout=30.0,
)
 
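# Manual retry with Retry-After header.
# If you use this loop, set max_retries=0 on the client so the SDK's own
# retries do not multiply the number of attempts.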
def call_with_retry(messages, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise
            retry_after = int(e.response.headers.get("Retry-After", 10))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)

Alternatively, use the Curate-Me Python SDK, which has built-in retries configured through `RetryPolicy`:


from curate_me.gateway import CurateGateway, RetryPolicy
 
gw = CurateGateway(
    api_key="cm_sk_your_key",
    retry_policy=RetryPolicy(
        max_retries=3,
        initial_delay=1.0,
        backoff_factor=2.0,
        retryable_status_codes={429, 500, 502, 503, 504},
    ),
)
client = gw.openai()
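
The `initial_delay` and `backoff_factor` fields suggest exponential backoff. The SDK's exact formula is not stated here, so the sketch below assumes the common form `initial_delay * backoff_factor ** attempt` to show what schedule those settings would produce. `backoff_delays`, `max_delay`, `jitter`, and `rng` are hypothetical and not part of `RetryPolicy`.

```python
import random


def backoff_delays(max_retries=3, initial_delay=1.0, backoff_factor=2.0,
                   max_delay=60.0, jitter=0.0, rng=random.random):
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    for attempt in range(max_retries):
        # Exponential growth, capped so a long outage does not stall the caller.
        delay = min(max_delay, initial_delay * backoff_factor ** attempt)
        # Optional jitter spreads retries from many clients: scale the delay
        # by a random factor in [1 - jitter, 1 + jitter].
        if jitter:
            delay *= 1 + jitter * (2 * rng() - 1)
        delays.append(delay)
    return delays
```

Under that assumption, the policy above would wait 1, 2, and then 4 seconds, about 7 seconds in total before giving up. If the 429 carries a `Retry-After` of 14, a schedule this short may exhaust its retries before the window resets, so honouring the header is usually the safer choice for rate limits.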

TypeScript


import OpenAI from 'openai';
 
const client = new OpenAI({
  baseURL: 'https://api.curate-me.ai/v1/openai',
  apiKey: process.env.CM_API_KEY,
  maxRetries: 3,  // SDK retries 429 automatically
});
 
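// Manual retry with Retry-After header.
// If you use this loop, set maxRetries: 0 on the client so the SDK's own
// retries do not multiply the number of attempts.
async function callWithRetry(messages: OpenAI.Chat.ChatCompletionMessageParam[], maxAttempts = 3) {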
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'gpt-4o',
        messages,
      });
    } catch (err) {
      if (err instanceof OpenAI.RateLimitError && attempt < maxAttempts - 1) {
        const retryAfter = parseInt(err.headers?.['retry-after'] ?? '10', 10);
        console.log(`Rate limited. Retrying in ${retryAfter}s...`);
        await new Promise(r => setTimeout(r, retryAfter * 1000));
      } else {
        throw err;
      }
    }
  }
}

curl (with retry loop)


#!/usr/bin/env bash
# Simple retry loop honouring Retry-After
MAX_ATTEMPTS=3
ATTEMPT=0
 
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
  RESPONSE=$(curl -si https://api.curate-me.ai/v1/openai/chat/completions \
    -H "X-CM-API-Key: $CM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}')
 
  STATUS=$(echo "$RESPONSE" | head -1 | awk '{print $2}')
 
  if [ "$STATUS" = "429" ]; then
    RETRY_AFTER=$(echo "$RESPONSE" | grep -i "retry-after:" | awk '{print $2}' | tr -d '\r')
    RETRY_AFTER=${RETRY_AFTER:-10}
    echo "Rate limited. Retrying in ${RETRY_AFTER}s..."
    sleep "$RETRY_AFTER"
    ATTEMPT=$((ATTEMPT + 1))
  else
    echo "$RESPONSE" | tail -1
    break
  fi
done

Raising the rate limit

Default rate limits by plan:

Plan	Requests per minute
Free	10 RPM
Starter	60 RPM
Pro	300 RPM
Enterprise	Custom

To raise your limit:


# Check current limit
curl https://api.curate-me.ai/v1/admin/rate-limits \
  -H "X-CM-API-Key: cm_sk_your_key"
 
# Request a higher limit (Starter+ only)
# Contact support@curate-me.ai or upgrade your plan

You can also set a per-key rate limit below the org limit, which is useful for restricting individual integrations:


curl -X PATCH https://api.curate-me.ai/v1/admin/api-keys/key_xxx \
  -H "X-CM-API-Key: cm_sk_admin_key" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_rpm": 30}'
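
For scripts that manage several keys, the same call can be built in Python. This sketch uses only the standard library and mirrors the URL, header, and body of the curl command above; `build_key_limit_request` is a hypothetical helper, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_key_limit_request(key_id, rpm, admin_key,
                            base_url="https://api.curate-me.ai/v1"):
    """Build (but do not send) the PATCH request that sets a per-key limit."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    body = json.dumps({"rate_limit_rpm": rpm}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/admin/api-keys/{key_id}",
        data=body,
        headers={
            "X-CM-API-Key": admin_key,
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=30)` and check the returned status code.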

Rate limit scope

Rate limits apply at two levels simultaneously:

Per-key limit — requests from this specific cm_sk_... key (configurable per key)
Per-org limit — total requests across all keys for the organization (plan-level)

A request is blocked with a 429 if it exceeds either limit. The `message` field of the error body states which limit was hit.
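
To make the two-level behaviour concrete, here is a toy model using fixed one-minute windows. It is an illustration only and not the gateway's implementation: `TwoLevelLimiter` is hypothetical, the window algorithm the gateway actually uses is not specified here, and whether blocked requests count toward the window is an assumption (this model does not count them).

```python
from collections import defaultdict


class TwoLevelLimiter:
    """Fixed-window counters for a per-key limit and a shared per-org limit."""

    def __init__(self, org_limit, key_limits, window=60):
        self.org_limit = org_limit
        self.key_limits = key_limits       # {key_id: requests per window}
        self.window = window               # window length in seconds
        self._counts = defaultdict(int)    # (scope, window index) -> count

    def check(self, key_id, now):
        """Return (allowed, scope); scope names the limit that blocked the call."""
        index = int(now // self.window)
        key_slot = (f"key:{key_id}", index)
        org_slot = ("org", index)
        # A key without its own limit is only bounded by the org limit.
        key_limit = self.key_limits.get(key_id, self.org_limit)
        if self._counts[key_slot] >= key_limit:
            return False, "key"
        if self._counts[org_slot] >= self.org_limit:
            return False, "org"
        # Only count the request once both levels have admitted it.
        self._counts[key_slot] += 1
        self._counts[org_slot] += 1
        return True, None
```

With an org limit of 3 and two keys limited to 2 each, key `a` is blocked at the key level on its third request, while key `b` is blocked at the org level on its second, even though `b` is still under its own limit.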

The OpenAI and Anthropic SDKs both retry 429 errors automatically with exponential backoff. If you use either SDK directly (rather than the CM Python SDK), you get this retry logic without writing any extra code.