Cost Accumulation Falls Behind

Symptoms

  • Dashboard shows lower daily spend than expected
  • Budget warnings trigger late or not at all
  • Metered billing reports don’t match gateway logs

Likely Causes

  1. Redis connection issues: cost accumulation relies on the Redis INCRBYFLOAT command, so it needs a healthy Redis connection
  2. Non-atomic fallback active: if Redis INCRBYFLOAT is unavailable, the fallback read-then-write path can lose increments under concurrency
  3. MongoDB write failures: usage records that fail to persist are diverted to the dead-letter queue
  4. High request volume: cost recording is asynchronous, but it can back up under extreme load
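
To see why cause 2 produces undercounting, the following is a minimal sketch of the race, not the gateway's actual code. It is a pure simulation in integer micro-dollars and never touches Redis; the function names `simulate_read_then_write` and `simulate_atomic_increment` are hypothetical.

```shell
#!/bin/sh
# Pure simulation: arguments are <start> <increment_a> <increment_b>.
# Nothing here talks to Redis; names are illustrative only.

# Non-atomic path: both writers read the counter before either writes back.
simulate_read_then_write() {
  counter=$1
  read_a=$counter            # writer A reads
  read_b=$counter            # writer B reads the same stale value
  counter=$((read_a + $2))   # writer A writes its result
  counter=$((read_b + $3))   # writer B overwrites it; A's increment is lost
  echo "$counter"
}

# Atomic path: each increment applies to the current value,
# which is what a server-side INCRBYFLOAT guarantees.
simulate_atomic_increment() {
  counter=$1
  counter=$((counter + $2))
  counter=$((counter + $3))
  echo "$counter"
}

simulate_read_then_write 100 5 7    # prints 107
simulate_atomic_increment 100 5 7   # prints 112
```

The gap between the two results (5 micro-dollars here) is the kind of silent loss that shows up as a dashboard total lower than the gateway logs.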

Triage Steps

1. Check Redis connectivity

./scripts/analytics health
# Look for: redis_connected: true, redis_latency_ms < 10

2. Check for non-atomic fallback warnings

# Search backend logs for the fallback warning
./scripts/errors by-source gateway | grep "non_atomic_fallback"

3. Check dead-letter queue

# Check if failed usage records are accumulating
redis-cli LLEN gateway:dlq:usage_records
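
A single LLEN reading gives the queue size, not whether it is growing. One way to tell, sketched here under assumptions, is to take two readings some interval apart and compare them. The helper name `dlq_trend` and the 30-second interval are hypothetical; only the `redis-cli LLEN gateway:dlq:usage_records` command comes from this runbook.

```shell
#!/bin/sh
# Hypothetical helper: classify two LLEN samples taken some time apart.
# Arguments: <first_length> <second_length>, both non-negative integers.
dlq_trend() {
  if [ "$2" -gt "$1" ]; then
    echo "growing"
  elif [ "$2" -lt "$1" ]; then
    echo "draining"
  else
    echo "stable"
  fi
}

# Usage against a reachable Redis (the interval is an arbitrary choice):
#   first=$(redis-cli LLEN gateway:dlq:usage_records)
#   sleep 30
#   second=$(redis-cli LLEN gateway:dlq:usage_records)
#   dlq_trend "$first" "$second"
```

A "growing" result suggests MongoDB writes are still failing, while a non-zero but "stable" queue points to a past incident whose records are waiting to be replayed.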

4. Compare Redis vs MongoDB totals

./scripts/analytics costs today
# Compare redis_daily_total vs mongodb_daily_total

Resolution

Redis connection restored

The counters will self-heal as new requests increment correctly. For the gap period, recalculate from MongoDB:

./scripts/analytics costs reconcile

Dead-letter queue processing

Failed records can be replayed:

./scripts/analytics costs replay-dlq

Escalation

If cost drift exceeds 10% of daily spend, escalate to on-call.
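
The threshold check can be done by hand from the two totals reported by `./scripts/analytics costs today`. The sketch below assumes drift is measured as the absolute difference between the Redis and MongoDB totals, taken as a percentage of the MongoDB total; that definition is an assumption, as are the helper names `drift_pct` and `drift_exceeds_threshold`, which are not part of `./scripts/analytics`.

```shell
#!/bin/sh
# Hypothetical helpers. Arguments: <redis_daily_total> <mongodb_daily_total>.
# Assumption: the MongoDB total is the reference for "daily spend".

# Print the drift as a percentage with two decimals.
drift_pct() {
  awk -v r="$1" -v m="$2" 'BEGIN {
    if (m + 0 == 0) { print "undefined"; exit }
    d = r - m; if (d < 0) d = -d
    printf "%.2f\n", d * 100 / m
  }'
}

# Exit 0 when drift is strictly greater than 10% of the MongoDB total.
drift_exceeds_threshold() {
  awk -v r="$1" -v m="$2" 'BEGIN {
    if (m + 0 <= 0) exit 1
    d = r - m; if (d < 0) d = -d
    exit !(d * 10 > m)   # d/m > 0.10, rearranged to avoid division
  }'
}

# Example with made-up totals:
drift_pct 85 100   # prints 15.00
if drift_exceeds_threshold 85 100; then
  echo "drift above 10%: escalate to on-call"
fi
```

With these made-up totals the drift is 15%, so the escalation message is printed; totals of 95 and 100 would give 5% and print only the percentage.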