Error Handling & Retry Strategy

The Curate-Me AI Gateway sits between your application and upstream LLM providers. Errors can originate from two sources: the governance chain (policy denials, budget limits, rate limits) or the upstream provider (timeouts, outages, model errors). Understanding which errors are retryable and which are permanent policy decisions is critical for building reliable integrations.

Error Categories

HTTP Status | Source     | Retryable?         | Meaning
----------- | ---------- | ------------------ | -------
400         | Gateway    | No                 | Malformed request (missing fields, bad JSON)
402         | Governance | No                 | Budget exceeded (daily, monthly, or per-request cap)
403         | Governance | No                 | Model not in allowlist or security violation detected
413         | Governance | No                 | Request body exceeds size limit
422         | Governance | No                 | Reasoning token cap exceeded or PII detected in content
429         | Governance | Yes (with backoff) | Rate limit hit for org/key
502         | Upstream   | Yes                | Provider returned an error
503         | Upstream   | Yes                | Provider temporarily unavailable
504         | Upstream   | Yes                | Provider request timed out

Retry Strategy

Which errors to retry

  • Retry: 429, 502, 503, 504
  • Never retry: 400, 402, 403, 413, 422 — these are policy decisions or malformed requests. Retrying will produce the same result.

Exponential backoff with jitter

Use the formula:

delay = min(base * 2^attempt + random_jitter, max_delay)

Recommended defaults:

Parameter   | Value
----------- | -----
base        | 1 second
max_delay   | 30 seconds
max_retries | 3
jitter      | 0 to 1 second (uniform random)
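
The formula and defaults above can be sketched as a small helper. The function name backoff_delay and its parameter names are illustrative choices, not part of the gateway or SDK API.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, max_delay: float = 30.0,
                  max_jitter: float = 1.0) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based)."""
    jitter = random.uniform(0, max_jitter)  # uniform random jitter
    return min(base * 2 ** attempt + jitter, max_delay)


# With the recommended defaults, three retries wait roughly 1-2s, 2-3s, and 4-5s.
for attempt in range(3):
    print(f"retry {attempt + 1}: sleep {backoff_delay(attempt):.2f}s")
```

The jitter spreads out clients that failed at the same moment, so they do not all retry in the same instant.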

Respect rate limit headers

On 429 responses, the gateway includes:

  • Retry-After — seconds to wait before the next request
  • RateLimit-Reset — Unix timestamp when the rate limit window resets
  • RateLimit-Remaining — requests remaining in the current window (will be 0)

Always prefer the Retry-After value over your own backoff calculation when it is present.
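
One way to apply this preference is sketched below. The helper names are hypothetical. The sketch assumes Retry-After normally carries whole seconds, as described above, but also tolerates the HTTP-date form that the HTTP specification permits, and it falls back to RateLimit-Reset before using local backoff.

```python
import random
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[float] = None) -> Optional[float]:
    """Return the server-requested wait in seconds, or None if no usable header exists."""
    now = time.time() if now is None else now
    value = headers.get("Retry-After")
    if value is not None:
        value = value.strip()
        if value.isdigit():
            return float(value)
        try:
            # HTTP also allows Retry-After to be a date rather than a number
            return max(0.0, parsedate_to_datetime(value).timestamp() - now)
        except (TypeError, ValueError):
            pass  # unparseable value: fall through to the next header
    reset = headers.get("RateLimit-Reset")
    if reset is not None and reset.strip().isdigit():
        return max(0.0, float(reset) - now)
    return None


def delay_for_429(headers: Mapping[str, str], attempt: int) -> float:
    """Prefer the server's hint; otherwise use exponential backoff with jitter."""
    wait = retry_after_seconds(headers)
    if wait is None:
        wait = min(2 ** attempt + random.uniform(0, 1), 30)
    return wait
```

A plain dict is case-sensitive, so pass the header object from your HTTP client (for example httpx's response.headers, which is case-insensitive) rather than a hand-built dict.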

Python Example

Using httpx with retry logic:

    import random
    import time

    import httpx

    # Transient errors worth retrying; see the Error Categories table.
    RETRYABLE = {429, 502, 503, 504}
    NON_RETRYABLE = {400, 402, 403, 413, 422}
    MAX_DELAY = 30  # seconds


    def call_gateway(payload: dict, max_retries: int = 3) -> httpx.Response:
        """Call the Curate-Me gateway with automatic retry for transient errors."""
        url = "https://api.curate-me.ai/v1/openai/chat/completions"
        headers = {"X-CM-API-Key": "cm_sk_xxx"}

        for attempt in range(max_retries + 1):
            response = httpx.post(url, headers=headers, json=payload, timeout=60)

            if response.status_code < 400:
                return response

            if response.status_code in NON_RETRYABLE:
                # Policy denial or bad request -- do not retry
                raise Exception(
                    f"Non-retryable error {response.status_code}: {response.text}"
                )

            if response.status_code not in RETRYABLE:
                # Status not covered by the table: hand it back to the caller
                return response

            if attempt == max_retries:
                break  # out of retries; do not sleep before giving up

            # Exponential backoff with jitter, capped at MAX_DELAY
            delay = min(2 ** attempt + random.uniform(0, 1), MAX_DELAY)
            if response.status_code == 429:
                # Prefer Retry-After when it is a whole number of seconds
                retry_after = response.headers.get("Retry-After", "")
                if retry_after.isdigit():
                    delay = int(retry_after) + random.uniform(0, 1)
            time.sleep(delay)

        raise Exception(f"Max retries ({max_retries}) exceeded")

TypeScript Example

Using fetch with async retry:

    // Transient errors worth retrying; see the Error Categories table.
    const RETRYABLE = new Set([429, 502, 503, 504]);
    const NON_RETRYABLE = new Set([400, 402, 403, 413, 422]);
    const MAX_DELAY_SECONDS = 30;

    async function callGateway(
      payload: Record<string, unknown>,
      maxRetries = 3,
    ): Promise<Response> {
      const url = "https://api.curate-me.ai/v1/openai/chat/completions";
      const headers = {
        "Content-Type": "application/json",
        "X-CM-API-Key": "cm_sk_xxx",
      };

      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await fetch(url, {
          method: "POST",
          headers,
          body: JSON.stringify(payload),
        });

        if (response.ok) return response;

        if (NON_RETRYABLE.has(response.status)) {
          // Policy denial or bad request: do not retry
          const body = await response.text();
          throw new Error(`Non-retryable error ${response.status}: ${body}`);
        }

        // Status not covered by the table: hand it back to the caller
        if (!RETRYABLE.has(response.status)) return response;

        // Out of retries; do not sleep before giving up
        if (attempt === maxRetries) break;

        // Exponential backoff with jitter, capped at MAX_DELAY_SECONDS
        let delaySeconds = Math.min(2 ** attempt + Math.random(), MAX_DELAY_SECONDS);
        if (response.status === 429) {
          // Prefer Retry-After when it parses as a number of seconds
          const retryAfter = Number.parseInt(
            response.headers.get("Retry-After") ?? "",
            10,
          );
          if (!Number.isNaN(retryAfter)) {
            delaySeconds = retryAfter + Math.random();
          }
        }
        await sleep(delaySeconds * 1000);
      }

      throw new Error(`Max retries (${maxRetries}) exceeded`);
    }

    function sleep(ms: number): Promise<void> {
      return new Promise((resolve) => setTimeout(resolve, ms));
    }

SSE Stream Handling

When using streaming responses (stream: true), keep the following in mind:

  • SSE timeout: The gateway enforces a 300-second idle timeout on SSE streams. If the upstream provider takes longer than 300s between chunks, the stream will close.
  • Mid-stream disconnects: If a stream disconnects unexpectedly, do not retry from where it left off. Start a new request instead — LLM providers do not support resumable streams.
  • Track requests by ID: Every response includes the X-CM-Request-Id header. Save this value before consuming the stream so you can reference it in support requests or status queries.
  • Distributed tracing: Use X-CM-Trace-Id (W3C format) to correlate gateway logs with your application traces when debugging stream issues.
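
    import httpx

    url = "https://api.curate-me.ai/v1/openai/chat/completions"
    headers = {"X-CM-API-Key": "cm_sk_xxx"}
    payload = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
    request_id = None

    # Streaming with error handling
    try:
        with httpx.stream(
            "POST", url, headers=headers, json={**payload, "stream": True}, timeout=300
        ) as response:
            # Save the IDs before consuming the stream
            request_id = response.headers.get("X-CM-Request-Id")
            trace_id = response.headers.get("X-CM-Trace-Id")
            response.raise_for_status()
            for line in response.iter_lines():
                if line.startswith("data: "):
                    chunk = line[6:]
                    if chunk == "[DONE]":
                        break
                    print(chunk)  # Process chunk
    except httpx.TransportError as exc:
        # Mid-stream disconnect: start a new request instead of resuming
        print(f"Stream {request_id} dropped: {exc}")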
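
The "start a new request" rule for mid-stream disconnects can be sketched as a wrapper that throws away partial output and reopens the stream. Here open_stream is a hypothetical callable that issues a fresh request and yields raw SSE lines, and ConnectionError stands in for whatever your HTTP client raises on a dropped connection (httpx.TransportError when using httpx).

```python
from typing import Callable, Iterable, List


def collect_stream(open_stream: Callable[[], Iterable[str]],
                   max_restarts: int = 2) -> List[str]:
    """Consume an SSE stream to completion, restarting from scratch on disconnect."""
    last_error: Exception = ConnectionError("stream never started")
    for _ in range(max_restarts + 1):
        chunks: List[str] = []  # discard partial output from earlier attempts
        try:
            for line in open_stream():
                if not line.startswith("data: "):
                    continue  # skip blank keep-alive lines and comments
                data = line[len("data: "):]
                if data == "[DONE]":
                    return chunks
                chunks.append(data)
            # The stream ended without [DONE]: treat it as a disconnect
            last_error = ConnectionError("stream ended before [DONE]")
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("stream did not complete") from last_error
```

Buffering until [DONE] keeps the example simple. An application that renders chunks as they arrive would instead need to clear what it has already shown before replaying the new stream.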

Governance Denial Handling

When the governance chain denies a request, the response includes headers that tell you exactly what happened and what to do about it.

402 — Budget Exceeded

Your org has hit its daily or monthly spend cap, or the request exceeded the per-request cap. Check the X-CM-Daily-Cost header to see current daily spend. To resolve:

  1. Wait for the budget window to reset (daily resets at midnight UTC)
  2. Contact your org admin to raise the budget via the dashboard
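
For the daily cap, a client can work out how long to pause instead of polling. The sketch below relies only on the midnight UTC reset stated above; the function name is hypothetical, and the monthly window is not covered because its reset time is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_daily_reset(now: Optional[datetime] = None) -> float:
    """Seconds until the next midnight UTC. `now` must be timezone-aware."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


print(f"Daily budget resets in {seconds_until_daily_reset() / 3600:.1f} hours")
```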

403 — Model or Security Violation

The request was blocked by the model allowlist or the security scanner. Check the X-CM-Governance-Denied-Step header to see which governance stage blocked the request (e.g., model_allowlist or security_scanner).

  • Model allowlist: The requested model is not approved for your org. Update the allowlist in the dashboard under Settings > Governance > Model Allowlist.
  • Security violation: The security scanner detected a prompt injection or jailbreak pattern. Review your prompt content.

422 — PII Detected or Token Cap

  • PII detected: The governance chain found sensitive data (SSNs, credit cards, API keys) in the request body. Remove the PII or add patterns to pii_allowlist in your org settings if it is a false positive.
  • Reasoning token cap: The estimated reasoning token usage exceeds your org’s configured limit. Reduce max_tokens or raise the cap.

429 — Rate Limited

Use exponential backoff as described above. If you consistently hit rate limits, check your current tier limits in the dashboard under Settings > Governance > Rate Limits and consider upgrading.
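
The guidance in this section can be folded into one helper that turns a denial into an actionable message. This is a minimal sketch: explain_denial is a hypothetical name, and the only step values it recognizes are the two examples given above (model_allowlist and security_scanner). Any other value is reported as-is.

```python
from typing import Mapping


def explain_denial(status: int, headers: Mapping[str, str]) -> str:
    """Map a governance denial to a short, actionable message."""
    step = headers.get("X-CM-Governance-Denied-Step", "unknown")
    if status == 402:
        cost = headers.get("X-CM-Daily-Cost", "unknown")
        return (f"Budget exceeded (daily spend: {cost} USD). "
                "Wait for the window to reset or ask an org admin to raise the budget.")
    if status == 403:
        if step == "model_allowlist":
            return "Model not approved for this org. Update the model allowlist."
        if step == "security_scanner":
            return "Security scanner flagged the prompt. Review the prompt content."
        return f"Request blocked by governance step '{step}'."
    if status == 422:
        return (f"Blocked by governance step '{step}': remove PII from the request "
                "or reduce max_tokens.")
    if status == 429:
        wait = headers.get("Retry-After", "unknown")
        return f"Rate limited. Retry after {wait} seconds."
    return f"Status {status} is not a governance denial covered here."


print(explain_denial(403, {"X-CM-Governance-Denied-Step": "model_allowlist"}))
```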

Response Headers Reference

Every gateway response includes diagnostic headers. Log these for debugging and support.

Header                      | Description
--------------------------- | -----------
X-CM-Request-Id             | Unique request ID. Include in support tickets.
X-CM-Trace-Id               | W3C trace ID for distributed tracing.
X-CM-Governance-Time-Ms     | Milliseconds spent in the governance chain.
X-CM-Governance-Denied-Step | Which governance stage denied the request (only present on denials).
X-CM-Daily-Cost             | Current daily spend for the org (USD).
RateLimit-Limit             | Maximum requests allowed in the current window.
RateLimit-Remaining         | Requests remaining in the current window.
RateLimit-Reset             | Unix timestamp when the rate limit window resets.
Retry-After                 | Seconds to wait before retrying (present on 429 responses).
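
To make these headers easy to log, one option is to pull them into a dictionary and attach it to each log record. The helper name and the sample header values below are made up for illustration.

```python
from typing import Dict, Mapping

DIAGNOSTIC_HEADERS = (
    "X-CM-Request-Id",
    "X-CM-Trace-Id",
    "X-CM-Governance-Time-Ms",
    "X-CM-Governance-Denied-Step",
    "X-CM-Daily-Cost",
    "RateLimit-Limit",
    "RateLimit-Remaining",
    "RateLimit-Reset",
    "Retry-After",
)


def diagnostic_fields(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the diagnostic headers that are present on a response."""
    return {name: headers[name] for name in DIAGNOSTIC_HEADERS if name in headers}


# Sample values are invented; real IDs will look different.
sample = {"X-CM-Request-Id": "req_example", "Content-Type": "application/json"}
print(diagnostic_fields(sample))
```

The membership check is case-sensitive for a plain dict, so pass the case-insensitive header object from your HTTP client in real use.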

SDK Built-in Retry

If you use the official Curate-Me SDKs, retry logic is handled automatically:

    # Python SDK -- retries are built in
    from curate_me import CurateMe

    client = CurateMe(api_key="cm_sk_xxx")

    # Default: 3 retries with exponential backoff for 429/5xx
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )

    // TypeScript SDK -- retries are built in
    import { CurateMe } from "@curate-me/sdk";

    const client = new CurateMe({ apiKey: "cm_sk_xxx" });

    // Default: 3 retries with exponential backoff for 429/5xx
    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello" }],
    });