429 Too Many Requests
Your request was blocked by the rate limiter before reaching the provider. No tokens were consumed.
Error body
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded: 60 requests/min for key cm_sk_xxx. 46 requests remaining after retry window.",
    "request_id": "req_01hwz3kj4p5qm8n9v2t6ys",
    "governance_stage": "rate_limit",
    "retry_after": 14,
    "limit": 60,
    "remaining": 0,
    "reset_at": "2026-05-25T14:32:14Z"
  }
}
Response headers
Every response (success or 429) includes rate limit headers modeled on the IETF RateLimit header fields draft:
| Header | Value | Meaning |
|---|---|---|
| RateLimit-Limit | 60 | Max requests per minute for this key |
| RateLimit-Remaining | 0 | Requests remaining in current window |
| RateLimit-Reset | 14 | Seconds until the window resets |
| Retry-After | 14 | Same as RateLimit-Reset for 429 responses |
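As a minimal sketch of how a client might turn these values into a wait time, the helper below prefers the Retry-After header, falls back to RateLimit-Reset, and finally to the retry_after field of the error body. The function name `wait_seconds` and the 10-second default are illustrative assumptions, not part of the gateway or any SDK.

```python
from typing import Mapping, Optional


def wait_seconds(
    headers: Mapping[str, str],
    body: Optional[dict] = None,
    default: float = 10.0,  # assumed fallback, not a documented value
) -> float:
    """Return how many seconds to wait before retrying a 429 response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Prefer Retry-After, then RateLimit-Reset (both given in seconds above).
    for name in ("retry-after", "ratelimit-reset"):
        value = lowered.get(name)
        if value is not None:
            try:
                return max(0.0, float(value))
            except ValueError:
                # Not a number (for example an HTTP date); try the next source.
                continue

    # Fall back to the retry_after field of the JSON error body.
    if body is not None:
        retry_after = body.get("error", {}).get("retry_after")
        if isinstance(retry_after, (int, float)):
            return max(0.0, float(retry_after))

    return default
```

For the sample response above, `wait_seconds({"Retry-After": "14"})` returns `14.0`.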
Handling 429s in code
Python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.curate-me.ai/v1/openai",
    api_key="cm_sk_your_key",
    max_retries=3,  # OpenAI SDK retries 429s with backoff automatically
    timeout=30.0,
)

# Manual retry with Retry-After header
def call_with_retry(messages, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError as e:
            # Out of attempts: surface the error to the caller
            if attempt == max_attempts - 1:
                raise
            # Fall back to 10 seconds if the header is missing
            retry_after = int(e.response.headers.get("Retry-After", 10))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
Or use the Curate-Me Python SDK, which has built-in retry via RetryPolicy:
from curate_me.gateway import CurateGateway, RetryPolicy

gw = CurateGateway(
    api_key="cm_sk_your_key",
    retry_policy=RetryPolicy(
        max_retries=3,
        initial_delay=1.0,
        backoff_factor=2.0,
        retryable_status_codes={429, 500, 502, 503, 504},
    ),
)
client = gw.openai()
Raising the rate limit
Default rate limits by plan:
| Plan | Requests per minute |
|---|---|
| Free | 10 RPM |
| Starter | 60 RPM |
| Pro | 300 RPM |
| Enterprise | Custom |
To raise your limit:
# Check current limit
curl https://api.curate-me.ai/v1/admin/rate-limits \
  -H "X-CM-API-Key: cm_sk_your_key"
# Request a higher limit (Starter+ only)
# Contact support@curate-me.ai or upgrade your plan
Or configure a per-key rate limit lower than the org limit (useful for restricting individual integrations):
curl -X PATCH https://api.curate-me.ai/v1/admin/api-keys/key_xxx \
  -H "X-CM-API-Key: cm_sk_admin_key" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_rpm": 30}'
Rate limit scope
Rate limits apply at two levels simultaneously:
- Per-key limit: requests from this specific cm_sk_... key (configurable per key)
- Per-org limit: total requests across all keys for the organization (plan-level)
A 429 on either level blocks the request. The error body specifies which limit was hit via the message field.
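To make the two-level rule concrete, here is a small illustrative model in which a request passes only if both the per-key and the per-org counters have capacity. It assumes fixed one-minute windows and a hypothetical `TwoLevelLimiter` class; the gateway's actual algorithm is not documented here and may differ (for example, a sliding window).

```python
import time
from collections import defaultdict
from typing import Optional


class TwoLevelLimiter:
    """Illustrative model: fixed one-minute windows at key and org level."""

    def __init__(self, key_rpm: int, org_rpm: int) -> None:
        self.key_rpm = key_rpm
        self.org_rpm = org_rpm
        # Counters keyed by (identifier, window index). Old windows are never
        # pruned here, which is acceptable for a sketch but not for production.
        self._key_counts = defaultdict(int)
        self._org_counts = defaultdict(int)

    def check(self, org: str, key: str, now: Optional[float] = None) -> Optional[str]:
        """Return None if the request is allowed, else the limit that was hit."""
        now = time.time() if now is None else now
        window = int(now // 60)
        if self._key_counts[(key, window)] >= self.key_rpm:
            return "per-key"
        if self._org_counts[(org, window)] >= self.org_rpm:
            return "per-org"
        # Count the request against both levels only when it is allowed.
        self._key_counts[(key, window)] += 1
        self._org_counts[(org, window)] += 1
        return None
```

With `TwoLevelLimiter(key_rpm=2, org_rpm=3)`, a third request on the same key in one window is rejected as "per-key", while a second key in the same org can still be rejected as "per-org" once the org total reaches 3.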
The OpenAI and Anthropic SDKs both retry 429 errors automatically with exponential backoff. If you call either provider SDK directly (rather than through the CM Python SDK), you get that retry logic without writing any extra code.
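For readers unfamiliar with exponential backoff, the sketch below shows one common way to compute the delay before each retry. The parameter names `initial_delay` and `backoff_factor` mirror the RetryPolicy example above, while `max_delay`, `retry_after`, and `jitter` are assumptions added for illustration. This is not necessarily the exact formula any of the SDKs use.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    max_delay: float = 60.0,  # assumed cap, not a documented value
    retry_after: Optional[float] = None,
    jitter: bool = True,
) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    # A server-provided Retry-After value takes priority over the computed delay.
    if retry_after is not None:
        return max(0.0, retry_after)
    # Grow the delay geometrically, but never beyond the cap.
    delay = min(max_delay, initial_delay * backoff_factor ** attempt)
    if jitter:
        # "Full jitter": pick uniformly in [0, delay] so clients do not retry in lockstep.
        delay = random.uniform(0.0, delay)
    return delay
```

With jitter disabled and the defaults above, the first three delays are 1.0, 2.0, and 4.0 seconds. Enabling jitter trades predictability for a lower chance that many clients hit the limiter again at the same instant.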