Cost Accumulation Falls Behind

Symptoms

Dashboard shows lower daily spend than expected
Budget warnings trigger late or not at all
Metered billing reports don’t match gateway logs

Likely Causes

Redis connection issues: cost accumulation relies on the Redis INCRBYFLOAT command, so the atomic path is unavailable whenever Redis is unreachable
Non-atomic fallback active: if Redis INCRBYFLOAT is unavailable, the fallback read-then-write path can lose increments when requests update the same counter concurrently
MongoDB write failures: usage records that fail to persist are sent to the dead-letter queue
High request volume: cost recording is asynchronous, but it can still back up under extreme load
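To see why the read-then-write fallback undercounts, the sketch below replays one bad interleaving by hand. It is an illustration only: it does not touch Redis, the `add` helper and variable names are hypothetical, and the starting total and increments are made-up numbers.

```shell
#!/bin/sh
# Illustration only: a deterministic replay of the lost-update race.
# No Redis is involved; "counter" stands in for the stored daily total.

# Hypothetical helper: add two decimal amounts, keeping two decimal places.
add() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a + b }'; }

# Read-then-write: both workers read before either one writes.
counter="1.00"
a_seen=$counter                  # worker A reads 1.00
b_seen=$counter                  # worker B reads 1.00
counter=$(add "$a_seen" 0.25)    # A writes 1.25
counter=$(add "$b_seen" 0.50)    # B writes 1.50 and overwrites A's increment
non_atomic_total=$counter

# Atomic increment: each addition is applied to the current value in one
# step, which is the guarantee INCRBYFLOAT gives on the server side.
counter="1.00"
counter=$(add "$counter" 0.25)
counter=$(add "$counter" 0.50)
atomic_total=$counter

echo "non-atomic: $non_atomic_total  atomic: $atomic_total"
```

The non-atomic total ends at 1.50 while the atomic total ends at 1.75. The missing 0.25 is the kind of silent undercount that shows up as lower-than-expected daily spend.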

Triage Steps

1. Check Redis connectivity


./scripts/analytics health
# Look for: redis_connected: true, redis_latency_ms < 10

2. Check for non-atomic fallback warnings


# Search backend logs for the fallback warning
./scripts/errors by-source gateway | grep "non_atomic_fallback"

3. Check dead-letter queue


# Check if failed usage records are accumulating
redis-cli LLEN gateway:dlq:usage_records
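A single LLEN reading does not show whether the queue is growing or draining. The sketch below samples the length twice and reports the trend. The function names `dlq_len` and `dlq_trend` are hypothetical, and it assumes `redis-cli` can reach the right instance without extra connection flags.

```shell
#!/bin/sh
# Hypothetical helper: report whether the dead-letter queue is growing.
DLQ_KEY="gateway:dlq:usage_records"   # key name taken from the step above

# Print the current list length. Adjust this function if your Redis
# needs host, port, or auth flags.
dlq_len() {
  redis-cli LLEN "$DLQ_KEY"
}

# dlq_trend INTERVAL_SECONDS
# Sample the queue length twice and print growing, draining, or steady.
dlq_trend() {
  interval="${1:-30}"
  first=$(dlq_len) || return 1
  sleep "$interval"
  second=$(dlq_len) || return 1
  if [ "$second" -gt "$first" ]; then
    echo "growing ($first -> $second)"
  elif [ "$second" -lt "$first" ]; then
    echo "draining ($first -> $second)"
  else
    echo "steady ($first)"
  fi
}
```

For example, `dlq_trend 60` waits one minute between samples. A growing queue suggests MongoDB writes are still failing, while a steady non-zero queue suggests the failures have stopped but the backlog has not been replayed.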

4. Compare Redis vs MongoDB totals


./scripts/analytics costs today
# Compare redis_daily_total vs mongodb_daily_total

Resolution

Redis connection restored

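Once Redis is reachable again, new requests increment the counters correctly, so no action is needed for spend recorded from that point on. Increments missed during the outage are not recovered by this. For the gap period, recalculate from MongoDB: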


./scripts/analytics costs reconcile

Dead-letter queue processing

Failed records can be replayed:


./scripts/analytics costs replay-dlq

Escalation

If cost drift exceeds 10% of daily spend, escalate to on-call.
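The 10% check can be scripted once you have the two totals from the comparison step. The sketch below is one possible way to do it, not the project's own tooling. The function names are hypothetical, and it assumes "daily spend" means the larger of the two totals, on the reasoning that both failure modes described above lose data rather than add it.

```shell
#!/bin/sh
# drift_pct REDIS_TOTAL MONGODB_TOTAL
# Print the absolute gap as a percentage of the larger total.
# Assumption: the larger total is the better estimate of daily spend.
drift_pct() {
  awk -v r="$1" -v m="$2" 'BEGIN {
    ref = (r > m) ? r : m
    if (ref <= 0) { print "0.00"; exit }
    d = r - m
    if (d < 0) d = -d
    printf "%.2f\n", d / ref * 100
  }'
}

# should_escalate REDIS_TOTAL MONGODB_TOTAL [THRESHOLD_PERCENT]
# Exit 0 when drift is strictly above the threshold (default 10).
should_escalate() {
  awk -v p="$(drift_pct "$1" "$2")" -v t="${3:-10}" 'BEGIN { exit !(p > t) }'
}
```

For example, `should_escalate 85 100 && echo "page on-call"` prints the message because the drift is 15%, while totals of 90 and 100 give exactly 10% and do not trip the check.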

Runbook: Budget Exceeded / Cost Spike
Runbook: Gateway High Latency