Error Handling & Retry Strategy
The Curate-Me AI Gateway sits between your application and upstream LLM providers. Errors can originate from two sources: the governance chain (policy denials, budget limits, rate limits) or the upstream provider (timeouts, outages, model errors). Understanding which errors are retryable and which are permanent policy decisions is critical for building reliable integrations.
Error Categories
| HTTP Status | Source | Retryable? | Meaning |
|---|---|---|---|
| 400 | Gateway | No | Malformed request (missing fields, bad JSON) |
| 402 | Governance | No | Budget exceeded (daily, monthly, or per-request cap) |
| 403 | Governance | No | Model not in allowlist or security violation detected |
| 413 | Governance | No | Request body exceeds size limit |
| 422 | Governance | No | Reasoning token cap exceeded or PII detected in content |
| 429 | Governance | Yes (with backoff) | Rate limit hit for org/key |
| 502 | Upstream | Yes | Provider returned an error |
| 503 | Upstream | Yes | Provider temporarily unavailable |
| 504 | Upstream | Yes | Provider request timed out |
Retry Strategy
Which errors to retry
- Retry: 429, 502, 503, 504
- Never retry: 400, 402, 403, 413, 422 — these are policy decisions or malformed requests. Retrying will produce the same result.
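The retry rule above can be captured in a small predicate. This is a minimal sketch: the function and constant names are illustrative, and treating status codes that are not listed in the table as non-retryable is a conservative assumption, not documented gateway behavior.

```python
# Status codes copied from the Error Categories table.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})
PERMANENT_STATUSES = frozenset({400, 402, 403, 413, 422})


def should_retry(status_code: int) -> bool:
    """Return True only for statuses the table marks as retryable."""
    if status_code in PERMANENT_STATUSES:
        return False
    # Assumption: anything not listed in the table is not retried.
    return status_code in RETRYABLE_STATUSES
```

Keeping the decision in one function makes it easy to log or alert on permanent denials separately from transient failures.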
Exponential backoff with jitter
Use the formula:
    delay = min(base * 2^attempt + random_jitter, max_delay)

Recommended defaults:
| Parameter | Value |
|---|---|
| base | 1 second |
| max_delay | 30 seconds |
| max_retries | 3 |
| jitter | 0 to 1 second (uniform random) |
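As a worked instance of the formula with the recommended defaults, the following sketch computes the delay for a given attempt. The function name and the optional `rng` parameter (added so the jitter can be seeded in tests) are illustrative assumptions.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    max_delay: float = 30.0,
    max_jitter: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return min(base * 2**attempt + jitter, max_delay), in seconds."""
    generator = rng or random.Random()
    # Uniform jitter spreads out clients that failed at the same moment.
    jitter = generator.uniform(0.0, max_jitter)
    return min(base * 2 ** attempt + jitter, max_delay)
```

With the defaults, attempts 0, 1, and 2 wait roughly 1 to 2, 2 to 3, and 4 to 5 seconds, and any attempt from 5 onward is capped at 30 seconds.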
Respect rate limit headers
On 429 responses, the gateway includes:
- Retry-After: seconds to wait before the next request
- RateLimit-Reset: Unix timestamp when the rate limit window resets
- RateLimit-Remaining: requests remaining in the current window (will be 0)
Always prefer the Retry-After value over your own backoff calculation when it is present.
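One way to apply this preference is sketched below. The helper name is hypothetical, and falling back to RateLimit-Reset when Retry-After is missing or unparseable is an assumption of this sketch rather than documented guidance. It takes any mapping of header names to values; a plain dict is case-sensitive, whereas HTTP client header objects usually are not.

```python
import time
from typing import Mapping, Optional


def rate_limit_delay(
    headers: Mapping[str, str],
    fallback: float,
    now: Optional[float] = None,
) -> float:
    """Pick a wait time for a 429, preferring Retry-After when present."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a number of seconds; try the next source

    reset = headers.get("RateLimit-Reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            # Assumption: wait until the window's reset timestamp.
            return max(0.0, float(reset) - current)
        except ValueError:
            pass

    # Neither header was usable: use the caller's own backoff value.
    return fallback
```

The `fallback` argument is where the caller's exponential backoff value goes, so the header always wins when it is usable.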
Python Example
Using httpx with retry logic:
    import random
    import time

    import httpx

    RETRYABLE = {429, 502, 503, 504}
    NON_RETRYABLE = {400, 402, 403, 413, 422}


    def call_gateway(payload: dict, max_retries: int = 3) -> httpx.Response:
        """Call the Curate-Me gateway with automatic retry for transient errors."""
        url = "https://api.curate-me.ai/v1/openai/chat/completions"
        headers = {"X-CM-API-Key": "cm_sk_xxx"}

        for attempt in range(max_retries + 1):
            response = httpx.post(url, headers=headers, json=payload, timeout=60)

            if response.status_code < 400:
                return response

            if response.status_code in NON_RETRYABLE:
                # Policy denial or bad request -- do not retry
                raise Exception(
                    f"Non-retryable error {response.status_code}: {response.text}"
                )

            if response.status_code == 429:
                # Prefer Retry-After; fall back to exponential backoff
                retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
                delay = retry_after + random.uniform(0, 1)
            elif response.status_code >= 500:
                # Exponential backoff with jitter, capped at 30 seconds
                delay = min(2 ** attempt + random.uniform(0, 1), 30)
            else:
                # Unlisted status -- hand the response back to the caller
                return response

            # Skip the sleep after the final attempt; the loop ends and we raise
            if attempt < max_retries:
                time.sleep(delay)

        raise Exception(f"Max retries ({max_retries}) exceeded")

TypeScript Example
Using fetch with async retry:
    const RETRYABLE = new Set([429, 502, 503, 504]);
    const NON_RETRYABLE = new Set([400, 402, 403, 413, 422]);

    async function callGateway(
      payload: Record<string, unknown>,
      maxRetries = 3,
    ): Promise<Response> {
      const url = "https://api.curate-me.ai/v1/openai/chat/completions";
      const headers = {
        "Content-Type": "application/json",
        "X-CM-API-Key": "cm_sk_xxx",
      };

      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await fetch(url, {
          method: "POST",
          headers,
          body: JSON.stringify(payload),
        });

        if (response.ok) return response;

        if (NON_RETRYABLE.has(response.status)) {
          // Policy denial or bad request -- do not retry
          const body = await response.text();
          throw new Error(`Non-retryable error ${response.status}: ${body}`);
        }

        let delaySeconds: number;
        if (response.status === 429) {
          // Prefer Retry-After; fall back to exponential backoff
          const retryAfter = parseInt(
            response.headers.get("Retry-After") ?? String(2 ** attempt),
            10,
          );
          delaySeconds = retryAfter + Math.random();
        } else if (response.status >= 500) {
          // Exponential backoff with jitter, capped at 30 seconds
          delaySeconds = Math.min(2 ** attempt + Math.random(), 30);
        } else {
          // Unlisted status -- hand the response back to the caller
          return response;
        }

        // Skip the sleep after the final attempt; the loop ends and we throw
        if (attempt < maxRetries) await sleep(delaySeconds * 1000);
      }

      throw new Error(`Max retries (${maxRetries}) exceeded`);
    }

    function sleep(ms: number): Promise<void> {
      return new Promise((resolve) => setTimeout(resolve, ms));
    }

SSE Stream Handling
When using streaming responses (stream: true), keep the following in mind:
- SSE timeout: The gateway enforces a 300-second idle timeout on SSE streams. If the upstream provider takes longer than 300s between chunks, the stream will close.
- Mid-stream disconnects: If a stream disconnects unexpectedly, do not retry from where it left off. Start a new request instead — LLM providers do not support resumable streams.
- Track requests by ID: Every response includes the X-CM-Request-Id header. Save this value before consuming the stream so you can reference it in support requests or status queries.
- Distributed tracing: Use X-CM-Trace-Id (W3C format) to correlate gateway logs with your application traces when debugging stream issues.
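Because streams cannot be resumed, the caller needs to know whether a stream finished cleanly or was cut short. The sketch below is one way to make that explicit: it collects `data: ` payloads and reports whether the `[DONE]` terminator was seen. The helper name is hypothetical, and it assumes a dropped connection shows up as the line iterator ending early; real HTTP clients may instead raise a transport exception, which would need its own handling.

```python
from typing import Iterable, List, Tuple


def collect_sse_data(lines: Iterable[str]) -> Tuple[List[str], bool]:
    """Collect `data: ` payloads and report whether `[DONE]` was seen."""
    chunks: List[str] = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # Skip blank separator lines and other SSE fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return chunks, True
        chunks.append(payload)
    # The iterator ended without the terminator: treat it as a disconnect.
    return chunks, False
```

If the second return value is `False`, discard the partial output and start a new request instead of trying to continue the old one.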
    # Streaming with error handling
    # Assumes url, headers, and payload are set up as in call_gateway above.
    import httpx

    with httpx.stream(
        "POST", url, headers=headers, json={**payload, "stream": True}, timeout=300
    ) as response:
        # Save the IDs before consuming the stream
        request_id = response.headers.get("X-CM-Request-Id")
        trace_id = response.headers.get("X-CM-Trace-Id")
        for line in response.iter_lines():
            if line.startswith("data: "):
                chunk = line[6:]
                if chunk == "[DONE]":
                    break
                # Process chunk

Governance Denial Handling
When the governance chain denies a request, the response includes headers that tell you exactly what happened and what to do about it.
402 — Budget Exceeded
Your org has hit its daily or monthly spend cap. Check the X-CM-Daily-Cost header to
see current spend. To resolve:
- Wait for the budget window to reset (daily resets at midnight UTC)
- Contact your org admin to raise the budget via the dashboard
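If a client chooses to pause until the daily window resets, the wait can be computed from the current UTC time. This is a minimal sketch with a hypothetical function name. It covers only the daily reset at midnight UTC stated above, since the monthly reset time is not specified here, and it expects a timezone-aware `now`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_daily_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` until the next midnight UTC."""
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Move to tomorrow's date, then truncate to 00:00:00 UTC.
    next_midnight = (current + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - current).total_seconds()
```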
403 — Model or Security Violation
The request was blocked by the model allowlist or the security scanner. Check the
X-CM-Governance-Denied-Step header to see which governance stage blocked the request
(e.g., model_allowlist or security_scanner).
- Model allowlist: The requested model is not approved for your org. Update the allowlist in the dashboard under Settings > Governance > Model Allowlist.
- Security violation: The security scanner detected a prompt injection or jailbreak pattern. Review your prompt content.
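A client can turn the denied step into an actionable message for logs or error reports. The sketch below uses only the two step names given as examples above; other step values may exist, so unknown values fall through to a generic message. The function name and message wording are illustrative.

```python
from typing import Mapping

# Only the two step names mentioned above; other values may exist.
REMEDIATION = {
    "model_allowlist": (
        "Model not approved for this org. Update the allowlist under "
        "Settings > Governance > Model Allowlist."
    ),
    "security_scanner": (
        "Prompt flagged as injection or jailbreak. Review the prompt content."
    ),
}


def explain_denial(headers: Mapping[str, str]) -> str:
    """Map X-CM-Governance-Denied-Step to a human-readable next step."""
    step = headers.get("X-CM-Governance-Denied-Step")
    if step is None:
        return "No governance step reported."
    return REMEDIATION.get(step, f"Denied at governance step '{step}'.")
```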
422 — PII Detected or Token Cap
- PII detected: The governance chain found sensitive data (SSNs, credit cards, API keys) in the request body. Remove the PII or add patterns to pii_allowlist in your org settings if it is a false positive.
- Reasoning token cap: The estimated reasoning token usage exceeds your org's configured limit. Reduce max_tokens or raise the cap.
429 — Rate Limited
Use exponential backoff as described above. If you consistently hit rate limits, check your current tier limits in the dashboard under Settings > Governance > Rate Limits and consider upgrading.
Response Headers Reference
Every gateway response includes diagnostic headers. Log these for debugging and support.
| Header | Description |
|---|---|
| X-CM-Request-Id | Unique request ID. Include in support tickets. |
| X-CM-Trace-Id | W3C trace ID for distributed tracing. |
| X-CM-Governance-Time-Ms | Milliseconds spent in the governance chain. |
| X-CM-Governance-Denied-Step | Which governance stage denied the request (only present on denials). |
| X-CM-Daily-Cost | Current daily spend for the org (USD). |
| RateLimit-Limit | Maximum requests allowed in the current window. |
| RateLimit-Remaining | Requests remaining in the current window. |
| RateLimit-Reset | Unix timestamp when the rate limit window resets. |
| Retry-After | Seconds to wait before retrying (present on 429 responses). |
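To log these headers consistently, a client might pull out just the diagnostic fields that are present on a response. The helper below is a sketch with a hypothetical name. It accepts any mapping of header names to values and matches names exactly, so pass a case-insensitive headers object if your HTTP client provides one.

```python
from typing import Dict, Mapping

# Header names from the reference table above.
DIAGNOSTIC_HEADERS = (
    "X-CM-Request-Id",
    "X-CM-Trace-Id",
    "X-CM-Governance-Time-Ms",
    "X-CM-Governance-Denied-Step",
    "X-CM-Daily-Cost",
    "RateLimit-Limit",
    "RateLimit-Remaining",
    "RateLimit-Reset",
    "Retry-After",
)


def diagnostic_fields(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the diagnostic headers that are present."""
    return {name: headers[name] for name in DIAGNOSTIC_HEADERS if name in headers}
```

The resulting dict can be attached to a structured log record so the request ID and trace ID travel with every error report.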
SDK Built-in Retry
If you use the official Curate-Me SDKs, retry logic is handled automatically:
    # Python SDK -- retries are built in
    from curate_me import CurateMe

    client = CurateMe(api_key="cm_sk_xxx")

    # Default: 3 retries with exponential backoff for 429/5xx
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )

    // TypeScript SDK -- retries are built in
    import { CurateMe } from "@curate-me/sdk";

    const client = new CurateMe({ apiKey: "cm_sk_xxx" });

    // Default: 3 retries with exponential backoff for 429/5xx
    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello" }],
    });