Runbook: Gateway High Latency
This runbook covers diagnosing and resolving high latency through the Curate-Me AI Gateway. Follow the steps in order — most issues resolve in the first two sections.
Symptoms
- P95 gateway latency exceeds 500ms (excluding upstream provider time)
- Clients receiving timeout errors or SSE streams stalling before first token
- Dashboard latency charts show a sudden spike or sustained elevation
- X-Gateway-Retry-Delay-Ms response header shows high retry backoff times
Step 1: Check governance step timings
The gateway exposes per-step latency for the governance chain. This tells you exactly which governance check is slow.
```shell
curl https://api.curate-me.ai/gateway/admin/latency \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

Expected response:

```json
{
  "governance_steps": {
    "plan_enforcement_ms": 2,
    "rate_limit_ms": 3,
    "cost_estimation_ms": 8,
    "pii_scan_ms": 45,
    "model_allowlist_ms": 1,
    "hitl_gate_ms": 2
  },
  "proxy_connect_ms": 15,
  "proxy_first_byte_ms": 320,
  "total_ms": 396
}
```

What to look for:
| Field | Healthy | Investigate if |
|---|---|---|
| rate_limit_ms | < 5ms | > 50ms (Redis connectivity issue) |
| cost_estimation_ms | < 15ms | > 100ms (tiktoken encoding on large payload) |
| pii_scan_ms | < 50ms | > 200ms (large payload or Presidio NER enabled) |
| proxy_connect_ms | < 30ms | > 200ms (DNS or network issue to provider) |
| proxy_first_byte_ms | Varies by model | 2x or more above baseline for the same model |
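The thresholds above can be turned into a quick triage check. This is a sketch, not gateway code: the threshold values come straight from the table, and the response shape matches the example payload in this step.

```python
# "Investigate if" thresholds from the table above, in milliseconds.
INVESTIGATE_MS = {
    "rate_limit_ms": 50,
    "cost_estimation_ms": 100,
    "pii_scan_ms": 200,
}

def slow_steps(latency_response: dict) -> list[str]:
    """Return the steps whose timing crosses its investigate threshold."""
    steps = latency_response.get("governance_steps", {})
    flagged = [
        name for name, limit in INVESTIGATE_MS.items()
        if steps.get(name, 0) > limit
    ]
    # proxy_connect_ms sits outside the governance chain but has its own threshold
    if latency_response.get("proxy_connect_ms", 0) > 200:
        flagged.append("proxy_connect_ms")
    return flagged

sample = {
    "governance_steps": {"rate_limit_ms": 3, "pii_scan_ms": 245},
    "proxy_connect_ms": 15,
}
print(slow_steps(sample))  # ['pii_scan_ms']
```

Anything this flags points you at the matching cause in Step 3.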
Step 2: Check the connection pool
The gateway maintains an httpx connection pool to upstream providers. Pool exhaustion causes requests to queue.
```shell
curl https://api.curate-me.ai/gateway/admin/pool \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

Expected response:

```json
{
  "pool_status": {
    "active_connections": 12,
    "idle_connections": 38,
    "max_connections": 100,
    "pending_requests": 0
  }
}
```

What to look for:
| Condition | Meaning | Action |
|---|---|---|
| active_connections near max_connections | Pool is saturated | Scale gateway or increase pool size |
| pending_requests > 0 | Requests queuing for a connection | Immediate concern; check provider health |
| idle_connections = 0 and active high | All connections in use | Burst traffic or provider slowdown |
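The same response can be checked programmatically. A minimal sketch assuming the pool_status shape shown above; the 90% saturation cutoff is an illustrative choice, not a gateway setting.

```python
def pool_alerts(pool: dict) -> list[str]:
    """Map pool_status fields to the actions in the table above."""
    alerts = []
    if pool["pending_requests"] > 0:
        alerts.append("requests queuing for a connection: check provider health")
    if pool["active_connections"] >= 0.9 * pool["max_connections"]:
        alerts.append("pool near saturation: scale gateway or increase pool size")
    if pool["idle_connections"] == 0 and pool["active_connections"] > 0:
        alerts.append("all connections in use: burst traffic or provider slowdown")
    return alerts

healthy = {"active_connections": 12, "idle_connections": 38,
           "max_connections": 100, "pending_requests": 0}
print(pool_alerts(healthy))  # []
```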
Step 3: Identify the root cause
Cause A: PII scanning on large payloads
PII scanning uses regex pattern matching on the full request body. For payloads above 100KB (common with base64 images or large tool definitions), scan time can exceed 200ms.
Diagnosis: pii_scan_ms is the dominant cost in the governance step timings.
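To see why payload size dominates, time a single regex pass over bodies of different sizes. The SSN pattern here is a hypothetical stand-in for the gateway's rule set, which is larger (and Presidio NER, when enabled, is far more expensive than any regex).

```python
import re
import time

# Hypothetical stand-in for one PII rule (US SSN); the real rule set is larger.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_ms(body: str) -> float:
    """Time one full-body scan in milliseconds."""
    start = time.perf_counter()
    SSN.findall(body)
    return (time.perf_counter() - start) * 1000

small = "x" * 1_000      # ~1KB, a typical chat request
large = "y" * 500_000    # ~500KB, e.g. a base64 image in the payload
print(f"1KB: {scan_ms(small):.3f}ms  500KB: {scan_ms(large):.3f}ms")
```

Scan time grows linearly with body size per pattern, so a full rule set against a 500KB body adds up quickly.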
Fix (immediate): Disable PII scanning for the affected org while you investigate:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"pii_scan_enabled": false}'Fix (long-term): If the org sends large payloads regularly, set pii_action to ALLOW (log-only mode) instead of fully disabling:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"pii_action": "ALLOW"}'Cause B: Upstream provider slowdown
If governance step timings are healthy but proxy_first_byte_ms is elevated, the upstream provider is the bottleneck.
Diagnosis: Check provider health and circuit breaker status:

```shell
curl https://api.curate-me.ai/v1/providers/health \
  -H "X-CM-API-Key: $API_KEY"
```

Look for providers in half_open or open state. Cross-reference with the provider's status page:
| Provider | Status Page |
|---|---|
| OpenAI | status.openai.com |
| Anthropic | status.anthropic.com |
| Google AI | status.cloud.google.com |
| DeepSeek | status.deepseek.com |
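The fallback decision amounts to rerouting whenever a provider's breaker is not closed. A sketch of that rule; the state names match the health endpoint above, while the fallback map and function are illustrative, not the gateway's actual router.

```python
# Hypothetical fallback map; the real mapping is configured per deployment.
FALLBACKS = {"openai": "anthropic", "google": "deepseek"}

def route(provider: str, breaker_state: str) -> str:
    """Reroute when the breaker is open (failing) or half_open (probing recovery)."""
    if breaker_state in ("open", "half_open"):
        return FALLBACKS.get(provider, provider)
    return provider

print(route("openai", "closed"))  # openai
print(route("openai", "open"))    # anthropic
```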
Fix: If the provider is degraded, enable fallback routing to an alternative provider:

```shell
curl -X POST https://api.curate-me.ai/gateway/admin/providers/fallback \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "primary_provider": "openai",
    "fallback_provider": "anthropic",
    "trigger": "circuit_breaker_open"
  }'
```

Cause C: Redis unavailable or slow
Redis is used for rate limiting, cost accumulation, and caching. If Redis is slow or unreachable, every governance check that touches Redis will add latency.
Diagnosis: Both rate_limit_ms and cost_estimation_ms are elevated. Check Redis connectivity:
```shell
# On the gateway host
redis-cli -u $REDIS_URL ping
# Expected: PONG

redis-cli -u $REDIS_URL info stats | grep instantaneous_ops_per_sec
# Check if ops/sec is abnormally high
```

Fix:
- Verify Redis is running and reachable from the gateway container
- Check Redis memory usage: if it is near maxmemory, keys may be evicted, causing cache misses
- Restart the Redis container if it is unresponsive:

```shell
docker restart curateme-redis
```
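Beyond a bare PING, it helps to average several round trips. The helper below is stdlib-only; the commented redis-py usage is one way to drive it and assumes `pip install redis` and a reachable REDIS_URL.

```python
import time

def timed_ms(op, samples: int = 5) -> float:
    """Average wall-clock time of op() across samples, in milliseconds."""
    start = time.perf_counter()
    for _ in range(samples):
        op()
    return (time.perf_counter() - start) / samples * 1000

# With redis-py installed (pip install redis), probe the live server:
#   import os, redis
#   client = redis.Redis.from_url(os.environ["REDIS_URL"])
#   print(f"avg PING: {timed_ms(client.ping):.2f}ms")
# A healthy in-network Redis answers PING in well under a millisecond;
# anything slower feeds directly into rate_limit_ms and cost_estimation_ms.
print(f"no-op baseline: {timed_ms(lambda: None):.4f}ms")
```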
Cause D: Connection pool exhaustion
When all connections are in use, new requests queue until a connection becomes available.
Diagnosis: pending_requests > 0 in the pool status response.
Fix (immediate): Increase the connection pool size:

```shell
# In the gateway environment configuration
GATEWAY_MAX_CONNECTIONS=200
GATEWAY_MAX_KEEPALIVE_CONNECTIONS=50
```

Fix (long-term): If pool exhaustion is recurring, the gateway needs horizontal scaling (additional instances behind a load balancer).
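To size the pool rather than guess, apply Little's law: concurrent connections scale with request rate times mean request duration. A back-of-envelope sketch; the traffic numbers and headroom factor are illustrative.

```python
import math

def required_connections(requests_per_sec: float, mean_duration_sec: float,
                         headroom: float = 1.5) -> int:
    """Little's law estimate of max_connections needed to avoid queuing."""
    return math.ceil(requests_per_sec * mean_duration_sec * headroom)

# 40 req/s with 2s average LLM round trips needs ~120 connections,
# so a default pool of 100 would queue under this load.
print(required_connections(40, 2.0))  # 120
```

LLM round trips are long, so even modest request rates hold many connections open at once.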
Step 4: Verify resolution
After applying a fix, confirm that latency has returned to normal:
```shell
# Check governance step timings again
curl https://api.curate-me.ai/gateway/admin/latency \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Run a test request and check the total round-trip time
time curl -s -o /dev/null -w "%{time_total}" \
  https://api.curate-me.ai/v1/openai/chat/completions \
  -H "X-CM-API-Key: $API_KEY" \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
```

Escalation
If none of the above steps resolve the issue:
- Collect the X-CM-Request-ID from affected requests
- Pull recent gateway error logs: ./scripts/errors by-source gateway
- Check system-level metrics: ./scripts/analytics performance
- Contact the platform team with the request IDs and the output of the latency and pool endpoints