
Runbook: Gateway High Latency

This runbook covers diagnosing and resolving high latency through the Curate-Me AI Gateway. Follow the steps in order — most issues resolve in the first two sections.


Symptoms

  • P95 gateway latency exceeds 500ms (excluding upstream provider time)
  • Clients receiving timeout errors or SSE streams stalling before first token
  • Dashboard latency charts show a sudden spike or sustained elevation
  • `X-Gateway-Retry-Delay-Ms` response header shows high retry backoff times

Step 1: Check governance step timings

The gateway exposes per-step latency for the governance chain. This tells you exactly which governance check is slow.

```bash
curl https://api.curate-me.ai/gateway/admin/latency \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

Expected response:

```json
{
  "governance_steps": {
    "plan_enforcement_ms": 2,
    "rate_limit_ms": 3,
    "cost_estimation_ms": 8,
    "pii_scan_ms": 45,
    "model_allowlist_ms": 1,
    "hitl_gate_ms": 2
  },
  "proxy_connect_ms": 15,
  "proxy_first_byte_ms": 320,
  "total_ms": 396
}
```

What to look for:

| Field | Healthy | Investigate if |
|---|---|---|
| `rate_limit_ms` | < 5ms | > 50ms (Redis connectivity issue) |
| `cost_estimation_ms` | < 15ms | > 100ms (tiktoken encoding on large payload) |
| `pii_scan_ms` | < 50ms | > 200ms (large payload or Presidio NER enabled) |
| `proxy_connect_ms` | < 30ms | > 200ms (DNS or network issue to provider) |
| `proxy_first_byte_ms` | Varies by model | 2x or more above baseline for the same model |
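When the timings response is long, ranking the steps is quicker than eyeballing it. A minimal Python sketch; the sample payload copies the example response above, and the ranking logic is just a sort:

```python
import json

# Sample /gateway/admin/latency response (mirrors the example above)
response = json.loads("""
{
  "governance_steps": {
    "plan_enforcement_ms": 2,
    "rate_limit_ms": 3,
    "cost_estimation_ms": 8,
    "pii_scan_ms": 45,
    "model_allowlist_ms": 1,
    "hitl_gate_ms": 2
  },
  "proxy_connect_ms": 15,
  "proxy_first_byte_ms": 320,
  "total_ms": 396
}
""")

steps = response["governance_steps"]

# Rank governance steps from slowest to fastest
ranked = sorted(steps.items(), key=lambda kv: kv[1], reverse=True)
for name, ms in ranked:
    print(f"{name}: {ms}ms")

slowest_name, slowest_ms = ranked[0]
print(f"dominant step: {slowest_name} ({slowest_ms}ms)")
```

Pipe the curl output through a script like this when you are triaging several orgs at once.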

Step 2: Check the connection pool

The gateway maintains an httpx connection pool to upstream providers. Pool exhaustion causes requests to queue.

```bash
curl https://api.curate-me.ai/gateway/admin/pool \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

Expected response:

```json
{
  "pool_status": {
    "active_connections": 12,
    "idle_connections": 38,
    "max_connections": 100,
    "pending_requests": 0
  }
}
```

What to look for:

| Condition | Meaning | Action |
|---|---|---|
| `active_connections` near `max_connections` | Pool is saturated | Scale gateway or increase pool size |
| `pending_requests` > 0 | Requests queuing for a connection | Immediate concern — check provider health |
| `idle_connections` = 0 and active high | All connections in use | Burst traffic or provider slowdown |
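The pool response can be reduced to a one-line verdict for alerting. A minimal Python sketch; the 80% utilization threshold is illustrative, not a gateway default:

```python
# Sample pool_status object (mirrors the example response above)
pool = {
    "active_connections": 12,
    "idle_connections": 38,
    "max_connections": 100,
    "pending_requests": 0,
}

utilization = pool["active_connections"] / pool["max_connections"]

if pool["pending_requests"] > 0:
    verdict = "queuing"          # requests are waiting for a connection: act now
elif utilization > 0.8:          # 80% is an illustrative alerting threshold
    verdict = "near-saturated"   # scale before queuing starts
else:
    verdict = "healthy"

print(f"utilization={utilization:.0%} verdict={verdict}")
```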

Step 3: Identify the root cause

Cause A: PII scanning on large payloads

PII scanning uses regex pattern matching on the full request body. For payloads above 100KB (common with base64 images or large tool definitions), scan time can exceed 200ms.
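The 100KB threshold is easy to cross with images because base64 inflates binary data by a factor of 4/3. A quick check in Python:

```python
import base64

# Base64 encodes every 3 bytes as 4 characters, so a 75 KB image
# becomes a 100 KB string: exactly the size where scan time climbs.
image_bytes = b"\x00" * 75_000          # stand-in for a 75 KB image
encoded = base64.b64encode(image_bytes)

print(len(image_bytes), "->", len(encoded))

SCAN_THRESHOLD = 100_000  # 100 KB, per the note above
print("likely slow to scan:", len(encoded) >= SCAN_THRESHOLD)
```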

Diagnosis: pii_scan_ms is the dominant cost in the governance step timings.

Fix (immediate): Disable PII scanning for the affected org while you investigate:

```bash
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"pii_scan_enabled": false}'
```

Fix (long-term): If the org sends large payloads regularly, set pii_action to ALLOW (log-only mode) instead of disabling scanning entirely:

```bash
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"pii_action": "ALLOW"}'
```

Cause B: Upstream provider slowdown

If governance step timings are healthy but proxy_first_byte_ms is elevated, the upstream provider is the bottleneck.

Diagnosis: Check provider health and circuit breaker status:

```bash
curl https://api.curate-me.ai/v1/providers/health \
  -H "X-CM-API-Key: $API_KEY"
```

Look for providers in half_open or open state. Cross-reference with the provider’s status page:

| Provider | Status Page |
|---|---|
| OpenAI | status.openai.com |
| Anthropic | status.anthropic.com |
| Google AI | status.cloud.google.com |
| DeepSeek | status.deepseek.com |
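The open and half_open states in the health response follow the conventional circuit-breaker pattern: trip open after repeated failures, let a single probe through (half_open) after a cool-down, and close again on success. A minimal sketch; the failure threshold here is illustrative, not the gateway's actual setting:

```python
class CircuitBreaker:
    """Conventional circuit breaker: closed -> open after repeated
    failures, half_open after a cool-down, closed again on success.
    The threshold is illustrative, not a gateway default."""

    def __init__(self, failure_threshold=5):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "open"        # stop routing traffic to this provider

    def record_success(self):
        self.failures = 0
        self.state = "closed"          # provider is healthy again

    def cool_down_elapsed(self):
        if self.state == "open":
            self.state = "half_open"   # allow one probe request through


cb = CircuitBreaker()
for _ in range(5):
    cb.record_failure()
print(cb.state)        # provider is now skipped
cb.cool_down_elapsed()
print(cb.state)        # a single probe request is allowed
```

A provider stuck in open state means probes keep failing; that is when fallback routing (below) earns its keep.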

Fix: If the provider is degraded, enable fallback routing to an alternative provider:

```bash
curl -X POST https://api.curate-me.ai/gateway/admin/providers/fallback \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "primary_provider": "openai",
    "fallback_provider": "anthropic",
    "trigger": "circuit_breaker_open"
  }'
```

Cause C: Redis unavailable or slow

Redis is used for rate limiting, cost accumulation, and caching. If Redis is slow or unreachable, every governance check that touches Redis will add latency.

Diagnosis: Both rate_limit_ms and cost_estimation_ms are elevated. Check Redis connectivity:

```bash
# On the gateway host
redis-cli -u $REDIS_URL ping
# Expected: PONG

redis-cli -u $REDIS_URL info stats | grep instantaneous_ops_per_sec
# Check if ops/sec is abnormally high
```

Fix:

  1. Verify Redis is running and reachable from the gateway container
  2. Check Redis memory usage — if near maxmemory, keys may be evicted causing cache misses
  3. Restart the Redis container if it is unresponsive:
    docker restart curateme-redis

Cause D: Connection pool exhaustion

When all connections are in use, new requests queue until a connection becomes available.

Diagnosis: pending_requests > 0 in the pool status response.

Fix (immediate): Increase the connection pool size:

```bash
# In the gateway environment configuration
GATEWAY_MAX_CONNECTIONS=200
GATEWAY_MAX_KEEPALIVE_CONNECTIONS=50
```

Fix (long-term): If pool exhaustion is recurring, the gateway needs horizontal scaling (additional instances behind a load balancer).
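To size the pool rather than guess, Little's law gives a rough lower bound: concurrent connections ≈ request rate × mean request duration. A sketch with illustrative numbers; substitute your own dashboard figures:

```python
# Little's law: concurrency = arrival rate x mean time in system.
# All numbers below are illustrative.
requests_per_second = 50
mean_request_seconds = 3.0   # streaming completions hold a connection for seconds

needed_connections = requests_per_second * mean_request_seconds
headroom = 1.3               # ~30% burst headroom (illustrative)

pool_size = int(needed_connections * headroom)
print(f"needed={needed_connections:.0f} recommended_pool={pool_size}")
```

If the recommended size exceeds what one instance can comfortably hold, that is the signal to scale horizontally instead of raising the limit further.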


Step 4: Verify resolution

After applying a fix, confirm that latency has returned to normal:

```bash
# Check governance step timings again
curl https://api.curate-me.ai/gateway/admin/latency \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Run a test request and check the total round-trip time
time curl -s -o /dev/null -w "%{time_total}" \
  https://api.curate-me.ai/v1/openai/chat/completions \
  -H "X-CM-API-Key: $API_KEY" \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
```
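A single timed request can mislead, since one fast response says nothing about the tail. If you collect several round-trip samples (for example, from repeated runs of the test request above), a nearest-rank p95 is enough to compare against the 500ms threshold. The samples below are made up for illustration:

```python
import math

# Round-trip times in ms, e.g. collected from repeated test requests.
# These values are made up for illustration.
samples_ms = [210, 195, 240, 230, 980, 205, 220, 215, 250, 225,
              200, 235, 245, 190, 260, 210, 225, 240, 230, 215]

# Nearest-rank p95: the value below which 95% of samples fall
rank = math.ceil(0.95 * len(samples_ms))    # 19th of 20 samples
p95 = sorted(samples_ms)[rank - 1]

print(f"p95={p95}ms breach={p95 > 500}")
```

Note that the single 980ms outlier does not push p95 over the threshold; that is exactly why p95 is the symptom metric rather than the maximum.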

Escalation

If none of the above steps resolve the issue:

  1. Collect the `X-CM-Request-ID` header from affected requests
  2. Pull recent gateway error logs: `./scripts/errors by-source gateway`
  3. Check system-level metrics: `./scripts/analytics performance`
  4. Contact the platform team with the request IDs and the output of the latency and pool endpoints