Runbook: Gateway High Latency
This runbook covers diagnosing and resolving high latency through the Curate-Me AI Gateway. Follow the steps in order — most issues resolve in the first two sections.
Symptoms
- P95 gateway latency exceeds 500ms (excluding upstream provider time)
- Clients receiving timeout errors or SSE streams stalling before first token
- Dashboard latency charts show a sudden spike or sustained elevation
- X-Gateway-Retry-Delay-Ms response header shows high retry backoff times
Step 1: Check governance step timings
The gateway exposes per-step latency for the governance chain. This tells you exactly which governance check is slow.
```shell
curl https://api.curate-me.ai/gateway/admin/latency \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"
```

Expected response:

```json
{
  "governance_steps": {
    "plan_enforcement_ms": 2,
    "rate_limit_ms": 3,
    "cost_estimation_ms": 8,
    "pii_scan_ms": 45,
    "model_allowlist_ms": 1,
    "hitl_gate_ms": 2
  },
  "proxy_connect_ms": 15,
  "proxy_first_byte_ms": 320,
  "total_ms": 396
}
```

What to look for:
| Field | Healthy | Investigate if |
|---|---|---|
| rate_limit_ms | < 5ms | > 50ms (Redis connectivity issue) |
| cost_estimation_ms | < 15ms | > 100ms (tiktoken encoding on large payload) |
| pii_scan_ms | < 50ms | > 200ms (large payload or Presidio NER enabled) |
| proxy_connect_ms | < 30ms | > 200ms (DNS or network issue to provider) |
| proxy_first_byte_ms | Varies by model | 2x or more above baseline for the same model |
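The thresholds above can be turned into a quick triage check. This is a sketch, not gateway code: the threshold values come straight from the table, and the response shape matches the example payload in this step.

```python
# "Investigate if" thresholds from the table above, in milliseconds.
INVESTIGATE_MS = {
    "rate_limit_ms": 50,
    "cost_estimation_ms": 100,
    "pii_scan_ms": 200,
}

def slow_steps(latency_response: dict) -> list[str]:
    """Return the steps whose timing crosses its investigate threshold."""
    steps = latency_response.get("governance_steps", {})
    flagged = [
        name for name, limit in INVESTIGATE_MS.items()
        if steps.get(name, 0) > limit
    ]
    # proxy_connect_ms sits outside the governance chain but has its own threshold
    if latency_response.get("proxy_connect_ms", 0) > 200:
        flagged.append("proxy_connect_ms")
    return flagged

sample = {
    "governance_steps": {"rate_limit_ms": 3, "pii_scan_ms": 245},
    "proxy_connect_ms": 15,
}
print(slow_steps(sample))  # ['pii_scan_ms']
```

Anything this flags points you at the matching cause in Step 3.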
Step 2: Check the connection pool
The gateway maintains an httpx connection pool to upstream providers. Pool exhaustion causes requests to queue.
```shell
curl https://api.curate-me.ai/gateway/admin/pool \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

Expected response:

```json
{
  "pool_status": {
    "active_connections": 12,
    "idle_connections": 38,
    "max_connections": 100,
    "pending_requests": 0
  }
}
```

What to look for:
| Condition | Meaning | Action |
|---|---|---|
| active_connections near max_connections | Pool is saturated | Scale gateway or increase pool size |
| pending_requests > 0 | Requests queuing for a connection | Immediate concern; check provider health |
| idle_connections = 0 and active high | All connections in use | Burst traffic or provider slowdown |
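The same response can be checked programmatically. A minimal sketch assuming the pool_status shape shown above; the 90% saturation cutoff is an illustrative choice, not a gateway setting.

```python
def pool_alerts(pool: dict) -> list[str]:
    """Map pool_status fields to the actions in the table above."""
    alerts = []
    if pool["pending_requests"] > 0:
        alerts.append("requests queuing for a connection: check provider health")
    if pool["active_connections"] >= 0.9 * pool["max_connections"]:
        alerts.append("pool near saturation: scale gateway or increase pool size")
    if pool["idle_connections"] == 0 and pool["active_connections"] > 0:
        alerts.append("all connections in use: burst traffic or provider slowdown")
    return alerts

healthy = {"active_connections": 12, "idle_connections": 38,
           "max_connections": 100, "pending_requests": 0}
print(pool_alerts(healthy))  # []
```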
Step 3: Identify the root cause
Cause A: PII scanning on large payloads
PII scanning uses regex pattern matching on the full request body. For payloads above 100KB (common with base64 images or large tool definitions), scan time can exceed 200ms.
Diagnosis: pii_scan_ms is the dominant cost in the governance step timings.
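To see why payload size dominates, time a single regex pass over bodies of different sizes. The SSN pattern here is a hypothetical stand-in for the gateway's rule set, which is larger (and Presidio NER, when enabled, is far more expensive than any regex).

```python
import re
import time

# Hypothetical stand-in for one PII rule (US SSN); the real rule set is larger.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_ms(body: str) -> float:
    """Time one full-body scan in milliseconds."""
    start = time.perf_counter()
    SSN.findall(body)
    return (time.perf_counter() - start) * 1000

small = "x" * 1_000      # ~1KB, a typical chat request
large = "y" * 500_000    # ~500KB, e.g. a base64 image in the payload
print(f"1KB: {scan_ms(small):.3f}ms  500KB: {scan_ms(large):.3f}ms")
```

Scan time grows linearly with body size per pattern, so a full rule set against a 500KB body adds up quickly.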
Fix (immediate): Disable PII scanning for the affected org while you investigate:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"pii_scan_enabled": false}'Fix (long-term): If the org sends large payloads regularly, set pii_action to ALLOW (log-only mode) instead of fully disabling:
curl -X PATCH https://api.curate-me.ai/api/v1/admin/gateway/policy/$ORG_ID \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"pii_action": "ALLOW"}'Cause B: Upstream provider slowdown
If governance step timings are healthy but proxy_first_byte_ms is elevated, the upstream provider is the bottleneck.
Diagnosis: Check provider health and circuit breaker status:

```shell
curl https://api.curate-me.ai/v1/providers/health \
  -H "X-CM-API-Key: $API_KEY"
```

Look for providers in half_open or open state. Cross-reference with the provider's status page:
| Provider | Status Page |
|---|---|
| OpenAI | status.openai.com |
| Anthropic | status.anthropic.com |
| Google AI | status.cloud.google.com |
| DeepSeek | status.deepseek.com |
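The fallback decision amounts to rerouting whenever a provider's breaker is not closed. A sketch of that rule; the state names match the health endpoint above, while the fallback map and function are illustrative, not the gateway's actual router.

```python
# Hypothetical fallback map; the real mapping is configured per deployment.
FALLBACKS = {"openai": "anthropic", "google": "deepseek"}

def route(provider: str, breaker_state: str) -> str:
    """Reroute when the breaker is open (failing) or half_open (probing recovery)."""
    if breaker_state in ("open", "half_open"):
        return FALLBACKS.get(provider, provider)
    return provider

print(route("openai", "closed"))  # openai
print(route("openai", "open"))    # anthropic
```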
Fix: If the provider is degraded, enable fallback routing to an alternative provider:

```shell
curl -X POST https://api.curate-me.ai/gateway/admin/providers/fallback \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "primary_provider": "openai",
    "fallback_provider": "anthropic",
    "trigger": "circuit_breaker_open"
  }'
```

Cause C: Redis unavailable or slow
Redis is used for rate limiting, cost accumulation, and caching. If Redis is slow or unreachable, every governance check that touches Redis will add latency.
Diagnosis: Both rate_limit_ms and cost_estimation_ms are elevated. Check Redis connectivity:
```shell
# On the gateway host
redis-cli -u $REDIS_URL ping
# Expected: PONG

redis-cli -u $REDIS_URL info stats | grep instantaneous_ops_per_sec
# Check if ops/sec is abnormally high
```

Fix:
- Verify Redis is running and reachable from the gateway container
- Check Redis memory usage: if it is near maxmemory, keys may be evicted, causing cache misses
- Restart the Redis container if it is unresponsive:

```shell
docker restart curateme-redis
```
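Beyond a bare PING, it helps to average several round trips. The helper below is stdlib-only; the commented redis-py usage is one way to drive it and assumes `pip install redis` and a reachable REDIS_URL.

```python
import time

def timed_ms(op, samples: int = 5) -> float:
    """Average wall-clock time of op() across samples, in milliseconds."""
    start = time.perf_counter()
    for _ in range(samples):
        op()
    return (time.perf_counter() - start) / samples * 1000

# With redis-py installed (pip install redis), probe the live server:
#   import os, redis
#   client = redis.Redis.from_url(os.environ["REDIS_URL"])
#   print(f"avg PING: {timed_ms(client.ping):.2f}ms")
# A healthy in-network Redis answers PING in well under a millisecond;
# anything slower feeds directly into rate_limit_ms and cost_estimation_ms.
print(f"no-op baseline: {timed_ms(lambda: None):.4f}ms")
```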
Cause D: Connection pool exhaustion
When all connections are in use, new requests queue until a connection becomes available.
Diagnosis: pending_requests > 0 in the pool status response.
Fix (immediate): Increase the connection pool size:

```shell
# In the gateway environment configuration
GATEWAY_MAX_CONNECTIONS=200
GATEWAY_MAX_KEEPALIVE_CONNECTIONS=50
```

Fix (long-term): If pool exhaustion is recurring, the gateway needs horizontal scaling (additional instances behind a load balancer).
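To size the pool rather than guess, apply Little's law: concurrent connections scale with request rate times mean request duration. A back-of-envelope sketch; the traffic numbers and headroom factor are illustrative.

```python
import math

def required_connections(requests_per_sec: float, mean_duration_sec: float,
                         headroom: float = 1.5) -> int:
    """Little's law estimate of max_connections needed to avoid queuing."""
    return math.ceil(requests_per_sec * mean_duration_sec * headroom)

# 40 req/s with 2s average LLM round trips needs ~120 connections,
# so a default pool of 100 would queue under this load.
print(required_connections(40, 2.0))  # 120
```

LLM round trips are long, so even modest request rates hold many connections open at once.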
Step 4: Verify resolution
After applying a fix, confirm that latency has returned to normal:
```shell
# Check governance step timings again
curl https://api.curate-me.ai/gateway/admin/latency \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "X-Org-ID: $ORG_ID"

# Run a test request and check the total round-trip time
time curl -s -o /dev/null -w "%{time_total}" \
  https://api.curate-me.ai/v1/openai/chat/completions \
  -H "X-CM-API-Key: $API_KEY" \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
```

Escalation
If none of the above steps resolve the issue:
- Collect the X-CM-Request-ID from affected requests
- Pull recent gateway error logs: ./scripts/errors by-source gateway
- Check system-level metrics: ./scripts/analytics performance
- Contact the platform team with the request IDs and the output of the latency and pool endpoints