Error Handling & Retry Strategy

The Curate-Me AI Gateway sits between your application and upstream LLM providers. Errors can originate from two sources: the governance chain (policy denials, budget limits, rate limits) or the upstream provider (timeouts, outages, model errors). Understanding which errors are retryable and which are permanent policy decisions is critical for building reliable integrations.

Error Categories

HTTP Status	Source	Retryable?	Meaning
400	Gateway	No	Malformed request (missing fields, bad JSON)
402	Governance	No	Budget exceeded (daily, monthly, or per-request cap)
403	Governance	No	Model not in allowlist or security violation detected
413	Governance	No	Request body exceeds size limit
422	Governance	No	Reasoning token cap exceeded or PII detected in content
429	Governance	Yes (with backoff)	Rate limit hit for org/key
502	Upstream	Yes	Provider returned an error
503	Upstream	Yes	Provider temporarily unavailable
504	Upstream	Yes	Provider request timed out

Retry Strategy

Which errors to retry

Retry: 429, 502, 503, 504
Never retry: 400, 402, 403, 413, 422. These are policy decisions or malformed requests, so resending the same request will produce the same result until the request or the policy changes.
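
The split above can be captured in a small lookup helper. This is a minimal sketch: `classify_status` and its four labels are illustrative names, not part of the gateway API, and any status outside the table is reported as "unknown" rather than guessed.

```python
# Status codes copied from the Error Categories table.
RETRYABLE = frozenset({429, 502, 503, 504})
NON_RETRYABLE = frozenset({400, 402, 403, 413, 422})


def classify_status(status_code: int) -> str:
    """Return "success", "retry", "fail", or "unknown" for an HTTP status."""
    if status_code < 400:
        return "success"
    if status_code in RETRYABLE:
        return "retry"
    if status_code in NON_RETRYABLE:
        return "fail"
    # Not listed in the table: leave the decision to the caller.
    return "unknown"
```

Keeping the two sets as the single source of truth means a new status code only has to be added in one place.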

Exponential backoff with jitter

Use the formula:


delay = min(base * 2^attempt + random_jitter, max_delay)

Recommended defaults:

Parameter	Value
`base`	1 second
`max_delay`	30 seconds
`max_retries`	3
`jitter`	0 to 1 second (uniform random)
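
The formula and defaults above translate directly into a helper. This is a sketch, not the gateway's or the SDK's implementation; the function name `backoff_delay` and the injectable `rng` parameter (added only so the jitter can be fixed in tests) are assumptions.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    max_delay: float = 30.0,
    jitter: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """delay = min(base * 2^attempt + random_jitter, max_delay), in seconds."""
    # rng() returns a float in [0, 1); scaling by jitter gives uniform [0, jitter).
    return min(base * (2 ** attempt) + rng() * jitter, max_delay)


if __name__ == "__main__":
    # attempt is zero-based: 0 is the wait before the first retry.
    for attempt in range(3):
        print(f"retry {attempt + 1}: wait {backoff_delay(attempt):.2f}s")
```

With these defaults the waits before the three retries fall in the ranges 1 to 2, 2 to 3, and 4 to 5 seconds, so the 30-second cap only takes effect if you raise `max_retries` to 6 or more.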

Respect rate limit headers

On 429 responses, the gateway includes:

Retry-After — seconds to wait before the next request
RateLimit-Reset — Unix timestamp when the rate limit window resets
RateLimit-Remaining — requests remaining in the current window (will be 0)

Always prefer the Retry-After value over your own backoff calculation when it is present.
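
One way to turn these headers into a wait time is sketched below. `wait_from_headers` is a hypothetical helper, and falling back to `RateLimit-Reset` when `Retry-After` is missing or unparseable is an assumption of this sketch rather than documented gateway guidance.

```python
import time
from typing import Mapping, Optional


def wait_from_headers(
    headers: Mapping[str, str],
    fallback: float,
    now: Optional[float] = None,
) -> float:
    """Seconds to wait after a 429: Retry-After, then RateLimit-Reset, then fallback."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a plain number of seconds; try the next source.

    reset = headers.get("RateLimit-Reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            # RateLimit-Reset is a Unix timestamp, so subtract the current time.
            return max(0.0, float(reset) - current)
        except ValueError:
            pass

    # Neither header was usable: use the caller's own backoff value.
    return fallback
```

A plain `dict` is case-sensitive, so pass the response's own header object (for example `response.headers` from httpx, which matches names case-insensitively) rather than a copy with altered casing.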

Python Example

Using httpx with retry logic:


import httpx
import time
import random

RETRYABLE = {429, 502, 503, 504}
NON_RETRYABLE = {400, 402, 403, 413, 422}

def call_gateway(payload: dict, max_retries: int = 3) -> httpx.Response:
    """Call the Curate-Me gateway with automatic retry for transient errors."""
    url = "https://api.curate-me.ai/v1/openai/chat/completions"
    headers = {"X-CM-API-Key": "cm_sk_xxx"}

    for attempt in range(max_retries + 1):
        response = httpx.post(url, headers=headers, json=payload, timeout=60)

        if response.status_code < 400:
            return response

        if response.status_code in NON_RETRYABLE:
            # Policy denial or bad request -- do not retry
            raise Exception(
                f"Non-retryable error {response.status_code}: {response.text}"
            )

        if response.status_code not in RETRYABLE and response.status_code < 500:
            # 4xx status not listed in the table -- let the caller decide
            return response

        if attempt == max_retries:
            break  # Out of retries -- do not sleep before giving up

        # Exponential backoff with jitter, capped at 30 seconds
        delay = min(2 ** attempt + random.uniform(0, 1), 30)
        if response.status_code == 429:
            try:
                # Prefer the gateway's Retry-After (seconds) when it parses
                delay = float(response.headers["Retry-After"]) + random.uniform(0, 1)
            except (KeyError, ValueError):
                pass  # Header missing or not numeric -- keep the computed backoff
        time.sleep(delay)

    raise Exception(f"Max retries ({max_retries}) exceeded")

TypeScript Example

Using fetch with async retry:


const RETRYABLE = new Set([429, 502, 503, 504]);
const NON_RETRYABLE = new Set([400, 402, 403, 413, 422]);

async function callGateway(
  payload: Record<string, unknown>,
  maxRetries = 3,
): Promise<Response> {
  const url = "https://api.curate-me.ai/v1/openai/chat/completions";
  const headers = {
    "Content-Type": "application/json",
    "X-CM-API-Key": "cm_sk_xxx",
  };

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });

    if (response.ok) return response;

    if (NON_RETRYABLE.has(response.status)) {
      const body = await response.text();
      throw new Error(`Non-retryable error ${response.status}: ${body}`);
    }

    // 4xx status not listed in the table: let the caller decide
    if (!RETRYABLE.has(response.status) && response.status < 500) {
      return response;
    }

    // Out of retries: do not sleep before giving up
    if (attempt === maxRetries) break;

    // Exponential backoff with jitter, capped at 30 seconds
    let delay = Math.min(2 ** attempt + Math.random(), 30);
    if (response.status === 429) {
      // Prefer the gateway's Retry-After (seconds) when it parses as a number
      const retryAfter = Number(response.headers.get("Retry-After") ?? NaN);
      if (Number.isFinite(retryAfter)) delay = retryAfter + Math.random();
    }
    await sleep(delay * 1000);
  }

  throw new Error(`Max retries (${maxRetries}) exceeded`);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

SSE Stream Handling

When using streaming responses (stream: true), keep the following in mind:

SSE timeout: The gateway enforces a 300-second idle timeout on SSE streams. If the upstream provider takes longer than 300s between chunks, the stream will close.
Mid-stream disconnects: If a stream disconnects unexpectedly, do not retry from where it left off. Start a new request instead — LLM providers do not support resumable streams.
Track requests by ID: Every response includes the X-CM-Request-Id header. Save this value before consuming the stream so you can reference it in support requests or status queries.
Distributed tracing: Use X-CM-Trace-Id (W3C format) to correlate gateway logs with your application traces when debugging stream issues.


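# Streaming with error handling
import httpx

url = "https://api.curate-me.ai/v1/openai/chat/completions"
headers = {"X-CM-API-Key": "cm_sk_xxx"}
payload = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}

with httpx.stream(
    "POST", url, headers=headers, json={**payload, "stream": True}, timeout=300
) as response:
    # Save the IDs before consuming the stream
    request_id = response.headers.get("X-CM-Request-Id")
    trace_id = response.headers.get("X-CM-Trace-Id")

    if response.status_code >= 400:
        response.read()  # A streamed body must be read before .text is available
        raise Exception(f"Gateway error {response.status_code}: {response.text}")

    for line in response.iter_lines():
        if line.startswith("data: "):
            chunk = line[6:]
            if chunk == "[DONE]":
                break
            # Process chunk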
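
The advice to start a new request after a mid-stream disconnect can be sketched as a wrapper that discards partial output and opens the stream again. `collect_stream` and its `open_stream` callable are hypothetical names. The sketch catches the built-in `ConnectionError`; with httpx you would catch `httpx.TransportError` instead, or translate it inside `open_stream`.

```python
from typing import Callable, Iterable, List


def collect_stream(
    open_stream: Callable[[], Iterable[str]],
    max_restarts: int = 2,
) -> List[str]:
    """Return all data chunks of one complete stream, restarting on disconnect."""
    for _ in range(max_restarts + 1):
        chunks: List[str] = []  # Partial output from a failed attempt is discarded.
        try:
            for line in open_stream():  # Each call issues a brand-new request.
                if not line.startswith("data: "):
                    continue
                chunk = line[6:]
                if chunk == "[DONE]":
                    return chunks  # The stream finished cleanly.
                chunks.append(chunk)
        except ConnectionError:
            pass  # Treated the same as a stream that ended early.
        # Reaching this point means the stream ended without [DONE]: start over.
    raise ConnectionError(
        f"stream did not complete after {max_restarts + 1} attempts"
    )
```

Because each restart is a full new request, only hand chunks to downstream consumers incrementally if they can tolerate seeing the beginning of the response twice.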

Governance Denial Handling

When the governance chain denies a request, the response includes headers that tell you exactly what happened and what to do about it.

402 — Budget Exceeded

Your org has hit its daily or monthly spend cap, or the request exceeds the per-request cap. Check the X-CM-Daily-Cost header to see current spend. To resolve a daily or monthly cap:

Wait for the budget window to reset (daily resets at midnight UTC)
Contact your org admin to raise the budget via the dashboard

403 — Model or Security Violation

The request was blocked by the model allowlist or the security scanner. Check the X-CM-Governance-Denied-Step header to see which governance stage blocked the request (e.g., model_allowlist or security_scanner).

Model allowlist: The requested model is not approved for your org. Update the allowlist in the dashboard under Settings > Governance > Model Allowlist.
Security violation: The security scanner detected a prompt injection or jailbreak pattern. Review your prompt content.

422 — PII Detected or Token Cap

PII detected: The governance chain found sensitive data (SSNs, credit cards, API keys) in the request body. Remove the PII or add patterns to pii_allowlist in your org settings if it is a false positive.
Reasoning token cap: The estimated reasoning token usage exceeds your org’s configured limit. Reduce max_tokens or raise the cap.

429 — Rate Limited

Wait for the Retry-After interval when the header is present; otherwise use exponential backoff as described above. If you consistently hit rate limits, check your current tier limits in the dashboard under Settings > Governance > Rate Limits and consider upgrading.
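
The denial cases in this section can be folded into one function that turns a status code and its headers into an actionable message for logs or error reports. This is a sketch: `explain_denial` and the message wording are illustrative, and the step names `model_allowlist` and `security_scanner` are taken from the examples above, so other step values fall through to a generic message.

```python
from typing import Mapping


def explain_denial(status_code: int, headers: Mapping[str, str]) -> str:
    """Map a governance denial to a short, actionable message."""
    step = headers.get("X-CM-Governance-Denied-Step", "unknown")

    if status_code == 402:
        cost = headers.get("X-CM-Daily-Cost", "unknown")
        return (
            f"Budget exceeded (daily spend: {cost} USD). "
            "Wait for the window to reset or ask an org admin to raise the budget."
        )
    if status_code == 403:
        if step == "model_allowlist":
            return "Model not in allowlist. Update Settings > Governance > Model Allowlist."
        if step == "security_scanner":
            return "Security scanner blocked the prompt. Review the prompt content."
        return f"Request blocked by governance step: {step}."
    if status_code == 422:
        return (
            f"PII detected or reasoning token cap exceeded (step: {step}). "
            "Remove the PII, reduce max_tokens, or adjust org settings."
        )
    if status_code == 429:
        return "Rate limited. Wait for Retry-After, then retry with backoff."
    return f"HTTP {status_code} is not a governance denial covered here."
```

Usage: call `explain_denial(response.status_code, response.headers)` before raising, and include the result alongside the `X-CM-Request-Id` value in the error you surface.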

Response Headers Reference

Every gateway response includes diagnostic headers. Log these for debugging and support.

Header	Description
`X-CM-Request-Id`	Unique request ID. Include in support tickets.
`X-CM-Trace-Id`	W3C trace ID for distributed tracing.
`X-CM-Governance-Time-Ms`	Milliseconds spent in the governance chain.
`X-CM-Governance-Denied-Step`	Which governance stage denied the request (only present on denials).
`X-CM-Daily-Cost`	Current daily spend for the org (USD).
`RateLimit-Limit`	Maximum requests allowed in the current window.
`RateLimit-Remaining`	Requests remaining in the current window.
`RateLimit-Reset`	Unix timestamp when the rate limit window resets.
`Retry-After`	Seconds to wait before retrying (present on 429 responses).
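
To log these headers consistently, you can pull out just the ones listed above. A minimal sketch follows; `diagnostic_fields` and the logger name `gateway` are illustrative, and headers that are absent from a response are simply omitted.

```python
import logging
from typing import Dict, Mapping

# Header names copied from the reference table above.
DIAGNOSTIC_HEADERS = (
    "X-CM-Request-Id",
    "X-CM-Trace-Id",
    "X-CM-Governance-Time-Ms",
    "X-CM-Governance-Denied-Step",
    "X-CM-Daily-Cost",
    "RateLimit-Limit",
    "RateLimit-Remaining",
    "RateLimit-Reset",
    "Retry-After",
)

logger = logging.getLogger("gateway")


def diagnostic_fields(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the diagnostic headers that are present on the response."""
    return {name: headers[name] for name in DIAGNOSTIC_HEADERS if name in headers}


def log_response(status_code: int, headers: Mapping[str, str]) -> None:
    """Emit one log line per response with the status and diagnostic headers."""
    logger.info("gateway response %s %s", status_code, diagnostic_fields(headers))
```

Passing `response.headers` from httpx works directly because that object matches header names case-insensitively; a plain `dict` must use the exact casing shown in the table.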

SDK Built-in Retry

If you use the official Curate-Me SDKs, retry logic is handled automatically:


# Python SDK -- retries are built in
from curate_me import CurateMe
 
client = CurateMe(api_key="cm_sk_xxx")
# Default: 3 retries with exponential backoff for 429/5xx
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)


// TypeScript SDK -- retries are built in
import { CurateMe } from "@curate-me/sdk";
 
const client = new CurateMe({ apiKey: "cm_sk_xxx" });
// Default: 3 retries with exponential backoff for 429/5xx
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});