Governance Chain

The governance chain is the policy pipeline that evaluates every request before it reaches the upstream LLM provider. Checks run in order and short-circuit on the first denial.

Pipeline overview

Request --> [0. Plan Enforcement] --> [1. Rate Limit] --> [2. Cost + Budget] --> [3. Runner Session Budget] --> [4. PII + Content Safety] --> [5. Model Access] --> [6. HITL Gate] --> Proxy

Each step returns one of three actions:

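| Action | Meaning |
| --- | --- |
| ALLOW | Request passes this check and proceeds to the next step |
| BLOCK | Request is denied immediately with an error response |
| NEEDS_APPROVAL | Request is held for human review (HITL gate, or PII findings when policy converts them into an approval requirement) |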

What can happen

| Outcome | Typical status |
| --- | --- |
| Rate-limit denial | 429 |
| Plan-tier denial | 429 |
| Cost, budget, PII, content safety, runner session, or model denial | 403 |
| Human approval required | 202 |

Step 0: Plan enforcement

Plan enforcement runs before the rest of governance. It checks whether the org’s subscription is active, whether the daily request quota for the plan has been reached, and whether the requested model is available on that plan.

This is separate from org-level gateway policy. Plan enforcement answers whether an org can use a class of service at all. Governance policy answers what custom controls apply to that org’s traffic.
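The three plan checks can be sketched as a single function. This is a minimal illustration, not the gateway's code: `PlanState`, its fields, and the denial reason strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PlanState:
    # Hypothetical snapshot of an org's plan; field names are assumptions.
    subscription_active: bool
    daily_request_quota: int
    requests_today: int
    plan_models: FrozenSet[str]


def check_plan(plan: PlanState, model: str) -> Optional[str]:
    """Return a denial reason, or None when the plan allows the request."""
    if not plan.subscription_active:
        return "subscription_inactive"
    if plan.requests_today >= plan.daily_request_quota:
        return "daily_quota_reached"
    if model not in plan.plan_models:
        return "model_not_in_plan"
    return None
```

A `None` result means the request moves on to the org-level governance steps; any other value would map to a plan-tier denial.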

Step 1: Rate limiting

Rate limiting enforces a maximum number of requests per minute per organization.

  • Each request increments a per-org, per-minute counter in Redis
  • If the counter exceeds the org’s RPM limit, the request is blocked with 429
  • The counter key has a 120-second TTL to cover minute boundaries
  • Rate-limit metadata is included in response headers whether the request is allowed or blocked

Example blocked response:

    HTTP/1.1 429 Too Many Requests
    X-RateLimit-Limit: 60
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1708642860
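The counter behavior described above can be sketched as a fixed-window limiter. To stay self-contained, the example uses an in-memory dict as a stand-in for Redis; the key format and class name are assumptions, and the reset value is assumed to be the start of the next minute, which is consistent with the example header above.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Per-org, per-minute counter. A dict stands in for Redis here."""

    def __init__(self, ttl_seconds: int = 120):
        self.ttl_seconds = ttl_seconds
        # key -> (count, expires_at)
        self._counters: Dict[str, Tuple[int, float]] = {}

    def check(self, org_id: str, rpm_limit: int,
              now: Optional[float] = None) -> Tuple[bool, Dict[str, str]]:
        now = time.time() if now is None else now
        # Drop expired keys, as Redis would do on its own via the TTL.
        self._counters = {k: v for k, v in self._counters.items() if v[1] > now}

        window = int(now // 60)
        key = f"ratelimit:{org_id}:{window}"  # hypothetical key format
        count, expires_at = self._counters.get(key, (0, now + self.ttl_seconds))
        count += 1  # with Redis: INCR, plus EXPIRE when the key is new
        self._counters[key] = (count, expires_at)

        allowed = count <= rpm_limit
        headers = {
            "X-RateLimit-Limit": str(rpm_limit),
            "X-RateLimit-Remaining": str(max(rpm_limit - count, 0)),
            "X-RateLimit-Reset": str((window + 1) * 60),
        }
        return allowed, headers
```

The headers are built on every call, so an allowed request carries the same rate-limit context as a blocked one.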

Step 2: Cost and budget checks

Before proxying, the gateway estimates request cost and checks it against:

  • Per-request cost ceilings
  • Daily budget
  • Monthly budget

It also returns spend metadata in X-CM-Cost, X-CM-Daily-Cost, and X-CM-Daily-Budget.

Example blocked response:

    {
      "error": {
        "code": "daily_budget",
        "message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit"
      }
    }
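The comparison behind a budget denial can be sketched as follows. This is an illustration under stated assumptions: the check order follows the list above, the function and parameter names are hypothetical, and only the `daily_budget` code and its message format come from the example response; the other codes and messages are made up for symmetry.

```python
from decimal import Decimal
from typing import Optional


def _error(code: str, message: str) -> dict:
    return {"error": {"code": code, "message": message}}


def check_budget(
    estimated: Decimal,
    spent_today: Decimal,
    spent_this_month: Decimal,
    max_cost_per_request: Decimal,
    daily_budget: Decimal,
    monthly_budget: Decimal,
) -> Optional[dict]:
    """Return an error body for the first violated limit, or None if all pass."""
    if estimated > max_cost_per_request:
        # Hypothetical code and wording.
        return _error(
            "max_cost_per_request",
            f"Estimated cost ${estimated:.2f} exceeds per-request limit "
            f"${max_cost_per_request:.2f}",
        )
    if spent_today + estimated > daily_budget:
        return _error(
            "daily_budget",
            f"Daily budget exhausted: ${spent_today:.2f} spent + "
            f"${estimated:.2f} estimated > ${daily_budget:.2f} limit",
        )
    if spent_this_month + estimated > monthly_budget:
        # Hypothetical code and wording.
        return _error(
            "monthly_budget",
            f"Monthly budget exhausted: ${spent_this_month:.2f} spent + "
            f"${estimated:.2f} estimated > ${monthly_budget:.2f} limit",
        )
    return None
```

`Decimal` is used instead of `float` so that sums such as 24.50 + 0.52 compare exactly against the limit.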

Step 3: Runner session budget

When a request is tied to a managed runner session, the gateway can apply an additional per-session cost check. This keeps a long-running session from silently burning through budget even though each of its requests passes general org policy.

Step 4: PII and content safety

The gateway scans request text before it leaves your infrastructure.

PII scanning focuses on secrets, credentials, and regulated data. Content safety focuses on prompt injection, jailbreak patterns, and exfiltration signals when the relevant guardrails are enabled.

Typical findings include:

CategoryTypes
SecretsAPI keys, bearer tokens, passwords, cloud credentials
Personal dataEmail addresses, SSNs, passport-like identifiers
Financial dataCredit cards, IBAN-style values
Safety signalsPrompt injection, jailbreak, exfiltration attempts

PII actions are policy-driven. An org can block findings outright or convert them into an approval requirement.
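The scan-then-decide flow can be illustrated with a few regular expressions. These patterns are deliberately simple stand-ins and are not the gateway's detectors; the function names and the `on_finding` parameter are assumptions made for the example.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real detection needs far more care.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{16,}"),
}


def scan(text: str) -> List[str]:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def decide(findings: List[str], on_finding: str = "BLOCK") -> str:
    """Map findings to an action; on_finding is the org's configured response."""
    if on_finding not in ("BLOCK", "NEEDS_APPROVAL"):
        raise ValueError(f"unsupported action: {on_finding}")
    return on_finding if findings else "ALLOW"
```

Separating `scan` from `decide` mirrors the policy-driven behavior: detection stays the same while each org chooses whether a finding blocks the request or routes it to approval.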

Step 5: Model access

Model access combines two controls:

  • Plan-level model entitlement
  • Org-level allowlists in gateway policy

Model checks happen after alias resolution.

Example allowlist:

    {
      "allowed_models": [
        "gpt-4o-mini",
        "claude-haiku-3-5-20241022",
        "gemini-2.5-flash"
      ]
    }
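A sketch of the two controls applied after alias resolution. The function name, the reason strings, and the treatment of a missing allowlist as "no org-level restriction" are assumptions for illustration.

```python
from typing import Mapping, Optional, Set, Tuple


def check_model_access(
    requested: str,
    aliases: Mapping[str, str],
    plan_models: Set[str],
    allowed_models: Optional[Set[str]],
) -> Tuple[str, Optional[str]]:
    """Resolve the alias first, then apply both controls to the resolved name."""
    model = aliases.get(requested, requested)
    if model not in plan_models:
        return model, "model_not_in_plan"
    # Assumption: None means the org has not configured an allowlist.
    if allowed_models is not None and model not in allowed_models:
        return model, "model_not_allowed"
    return model, None
```

Because the checks run on the resolved name, an alias cannot be used to reach a model that the allowlist excludes.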

Step 6: HITL gate

The human-in-the-loop gate turns an otherwise valid request into a pending approval when the estimated cost exceeds the configured threshold.

    HTTP/1.1 202 Accepted
    X-CM-Approval-ID: apr_abc123def456
    Retry-After: 30

    {
      "status": "pending_approval",
      "approval_id": "apr_abc123def456",
      "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
      "retry_after_seconds": 30,
      "estimated_cost": 12.50
    }

Pending approvals appear in the dashboard approval queue and can be approved or rejected there before the caller retries the request.
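On the caller side, a 202 can be handled with a small retry loop. The sketch below assumes that retrying means re-sending the same request, and that the delay comes from `Retry-After` with `retry_after_seconds` as a fallback; the `send` callable and the function name are hypothetical, and the transport is left abstract so the loop works with any HTTP client.

```python
import time
from typing import Callable, Dict, Tuple

# (status code, response headers, parsed JSON body)
Response = Tuple[int, Dict[str, str], dict]


def send_with_approval_wait(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Re-send while the gateway answers 202, waiting as instructed in between."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, body = response
        if status != 202:
            return response
        delay = float(headers.get("Retry-After", body.get("retry_after_seconds", 30)))
        sleep(delay)
        response = send()
    # Either a final answer or still pending after max_attempts tries.
    return response
```

If the last attempt still returns 202, the caller gets that response back and can decide whether to keep waiting or surface the pending approval to a user.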

Default governance tiers

The gateway falls back to built-in defaults when an org has no custom policy:

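| Tier | RPM | Daily Budget | Monthly Budget | Max Cost/Request | HITL Threshold |
| --- | --- | --- | --- | --- | --- |
| Free | 10 | $5 | $50 | $0.25 | $1 |
| Starter | 60 | $25 | $250 | $0.50 | $3 |
| Growth | 300 | $100 | $2,000 | $2.00 | $10 |
| Enterprise | 5,000 | $2,000 | $50,000 | $10.00 | $50 |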

Backward-compatibility aliases still exist for older tier names such as pro, professional, and team.
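The defaults table can be expressed as a lookup. In this sketch the values are taken from the table, while the class and function names are hypothetical. The alias mapping is left as a caller-supplied parameter because this page does not state which current tier each older name maps to.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping


@dataclass(frozen=True)
class TierDefaults:
    rpm: int
    daily_budget: Decimal
    monthly_budget: Decimal
    max_cost_per_request: Decimal
    hitl_threshold: Decimal


def _tier(rpm: int, daily: str, monthly: str, per_request: str, hitl: str) -> TierDefaults:
    return TierDefaults(rpm, Decimal(daily), Decimal(monthly),
                        Decimal(per_request), Decimal(hitl))


# Values copied from the defaults table; all amounts are in dollars.
DEFAULT_TIERS = {
    "free": _tier(10, "5", "50", "0.25", "1"),
    "starter": _tier(60, "25", "250", "0.50", "3"),
    "growth": _tier(300, "100", "2000", "2.00", "10"),
    "enterprise": _tier(5000, "2000", "50000", "10.00", "50"),
}


def defaults_for(tier_name: str, aliases: Mapping[str, str]) -> TierDefaults:
    """Look up built-in defaults, translating an older tier name if one is given."""
    name = tier_name.strip().lower()
    # Raises KeyError for names that are neither a tier nor a known alias.
    return DEFAULT_TIERS[aliases.get(name, name)]
```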

Short-circuit behavior

The governance chain is designed to fail fast. The first denial terminates evaluation.

  • If rate limiting blocks a request, cost estimation is never performed
  • If cost governance blocks a request, the provider never receives the call
  • Headers from completed checks are still carried into blocked and approval responses so clients get rate-limit and spend context
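The fail-fast behavior, including header carry-over, can be sketched as a loop over ordered checks. `StepResult` and `run_chain` are illustrative stand-ins and not the signatures of the actual engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class StepResult:
    action: str  # "ALLOW", "BLOCK", or "NEEDS_APPROVAL"
    headers: Dict[str, str] = field(default_factory=dict)
    reason: str = ""


Check = Callable[[dict], StepResult]


def run_chain(request: dict,
              checks: List[Tuple[str, Check]]) -> Tuple[StepResult, List[str]]:
    """Run checks in order and stop at the first result that is not ALLOW."""
    headers: Dict[str, str] = {}
    executed: List[str] = []
    for name, check in checks:
        result = check(request)
        executed.append(name)
        headers.update(result.headers)  # keep context from completed checks
        if result.action != "ALLOW":
            return StepResult(result.action, headers, result.reason), executed
    return StepResult("ALLOW", headers), executed
```

The returned list of executed step names makes the short-circuit visible: any check after the first denial never appears in it.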

Backend implementation

The governance engine is implemented in services/backend/src/gateway/governance.py as the GovernanceEngine class. It exposes a single evaluate() method that accepts a GatewayRequest and a GovernancePolicy and returns a PolicyDecision.

Key source files (paths relative to services/backend/):

| File | Purpose |
| --- | --- |
| src/gateway/governance.py | Governance engine with all check implementations |
| src/gateway/models.py | GatewayRequest, GovernancePolicy, PolicyDecision models |
| src/gateway/model_pricing.py | Token cost estimation and model pricing tables |
| src/gateway/content_safety.py | Content safety scanner (prompt injection, jailbreak detection) |
| src/services/pii_detection_service.py | Presidio NER integration for enhanced PII detection |