
Governance Chain

The governance chain is the policy pipeline that evaluates every request before it reaches the upstream LLM provider. Checks run in order and short-circuit on the first denial.

Pipeline overview

The chain has 13 steps across 7 stages. Sub-steps (0.5, 1.5, etc.) were added as the governance engine matured.

Request
  --> [0. Plan Enforcement] subscription status, quota, model access
  --> [0.5 Body Size Limit] request body size vs. tier limit
  --> [1. Rate Limit] requests per minute (sliding window)
  --> [1.5 Plan Entitlement] legacy daily request/budget check
  --> [1.7 Reasoning Token Cap] cap max reasoning/thinking tokens
  --> [2. Cost Estimate] estimated cost vs. per-request + daily budget
  --> [2.5 Hierarchical Budget] Org -> Team -> Key budget hierarchy
  --> [3. Runner Session Budget] per-session cost limit for managed runners
  --> [4. PII Scan] Presidio NER + regex for secrets and PII
  --> [4.5 Content Safety] prompt injection / jailbreak detection
  --> [4.6 Security Scanner] advanced injection, exfiltration, encoded payloads
  --> [5. Model Allowlist] model allowlist enforcement
  --> [6. HITL Gate] flag high-cost requests for human approval
  --> Proxy

Each step returns one of three actions:

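| Action | Meaning |
|---|---|
| ALLOW | Request passes this check and proceeds to the next step |
| BLOCK | Request is denied immediately with an error response |
| NEEDS_APPROVAL | Request is held for human review (HITL gate, or a PII finding when policy converts it into an approval requirement) |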

What can happen

| Outcome | Typical status |
|---|---|
| Rate-limit denial | 429 |
| Plan-tier denial | 429 |
| Cost, budget, PII, content safety, runner session, or model denial | 403 |
| Human approval required | 202 |
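Read together, the two tables above map a step's action to the HTTP status the caller sees. The sketch below is illustrative only: the reason strings (rate_limit, plan_tier) are hypothetical, not the gateway's actual identifiers.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    NEEDS_APPROVAL = "NEEDS_APPROVAL"


# Hypothetical reason names; only rate-limit and plan-tier denials map to 429.
STATUS_FOR_BLOCK_REASON = {
    "rate_limit": 429,
    "plan_tier": 429,
}


def http_status(action: Action, reason: Optional[str] = None) -> Optional[int]:
    """Return the status the caller sees, or None when the request proceeds."""
    if action is Action.ALLOW:
        return None
    if action is Action.NEEDS_APPROVAL:
        return 202
    # Cost, budget, PII, content safety, runner session, and model denials.
    return STATUS_FOR_BLOCK_REASON.get(reason, 403)
```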

Step 0: Plan enforcement

Plan enforcement runs before the rest of governance. It checks whether the org’s subscription is active, whether the daily request quota for the plan has been reached, and whether the requested model is available on that plan.

This is separate from org-level gateway policy. Plan enforcement answers whether an org can use a class of service at all. Governance policy answers what custom controls apply to that org’s traffic.
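Here is a minimal sketch of the three plan checks described above, run in the order the text lists them. The Plan and Subscription types and the reason strings are hypothetical stand-ins. The real logic lives in src/gateway/plan_enforcement.py and may differ.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Plan:
    daily_request_quota: int
    models: Set[str] = field(default_factory=set)


@dataclass
class Subscription:
    active: bool
    requests_today: int  # requests already counted today, before this one


def check_plan(sub: Subscription, plan: Plan, model: str) -> Tuple[bool, str]:
    """Return (allowed, reason); reason strings are hypothetical."""
    if not sub.active:
        return False, "subscription_inactive"
    if sub.requests_today >= plan.daily_request_quota:
        return False, "daily_quota_reached"
    if model not in plan.models:
        return False, "model_not_on_plan"
    return True, "ok"
```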

Step 0.5: Body size limit

Rejects requests whose body exceeds a tier-specific or per-org byte limit before any expensive processing occurs. Free tier allows 1 MB, Starter 10 MB, Growth 50 MB, Enterprise 100 MB.
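The body size check compares the body length against the limit for the org's tier, and a per-org override takes precedence when one is set. This sketch makes two assumptions: "MB" means mebibytes (1,048,576 bytes), and an override is a plain byte count.

```python
from typing import Optional

MIB = 1024 * 1024  # assumption: "MB" in the tier limits means mebibytes
BODY_LIMITS = {
    "free": 1 * MIB,
    "starter": 10 * MIB,
    "growth": 50 * MIB,
    "enterprise": 100 * MIB,
}


def body_within_limit(body: bytes, tier: str, org_limit: Optional[int] = None) -> bool:
    """Cheap length check, done before any parsing or scanning of the body."""
    limit = org_limit if org_limit is not None else BODY_LIMITS[tier]
    return len(body) <= limit
```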

Step 1: Rate limiting

Rate limiting enforces a maximum number of requests per minute per organization.

  • Each request increments a per-org, per-minute counter in Redis
  • If the counter exceeds the org’s RPM limit, the request is blocked with 429
  • The counter key has a 120-second TTL to cover minute boundaries
  • Rate-limit metadata is included in response headers whether the request is allowed or blocked

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
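The counter scheme described above can be sketched as a fixed-window counter keyed by org and minute. This version keeps counters in a dict instead of Redis. Against Redis, the same idea is typically an INCR followed by an EXPIRE of 120 seconds. The class name and key format are hypothetical. This sketch follows the bullet description, a per-minute counter, rather than a true sliding window.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowRateLimiter:
    """In-memory stand-in for the per-org, per-minute Redis counter."""

    def __init__(self, ttl_seconds: int = 120) -> None:
        self.ttl = ttl_seconds
        self._counters: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def check(self, org_id: str, rpm_limit: int,
              now: Optional[float] = None) -> Tuple[bool, Dict[str, str]]:
        now = time.time() if now is None else now
        window = int(now // 60)
        key = f"ratelimit:{org_id}:{window}"  # hypothetical key format
        # Drop expired keys, mimicking Redis TTL expiry.
        self._counters = {k: v for k, v in self._counters.items() if v[1] > now}
        count, expires_at = self._counters.get(key, (0, now + self.ttl))
        count += 1
        self._counters[key] = (count, expires_at)
        headers = {  # returned whether the request is allowed or blocked
            "X-RateLimit-Limit": str(rpm_limit),
            "X-RateLimit-Remaining": str(max(rpm_limit - count, 0)),
            "X-RateLimit-Reset": str((window + 1) * 60),
        }
        return count <= rpm_limit, headers
```

With a 60 RPM limit and a window starting at Unix time 1708642800, the 61st request in that minute is blocked, and its headers match the 429 example above.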

Step 1.5: Plan entitlement

A legacy daily request and budget check driven by the billing_prices configuration. This step exists for backward compatibility with older billing integrations and may be removed in a future release.

Step 1.7: Reasoning token cap

Enforces a maximum number of reasoning/thinking tokens per request. This prevents models with extended thinking (e.g., Claude with adaptive thinking, OpenAI o1/o3) from consuming excessive tokens. Limits are tier-based: Free 4,096, Starter 16,384, Growth 65,536, Enterprise unlimited.
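The tier caps above reduce to a lookup and a clamp. The text does not say whether the gateway clamps an oversized reasoning budget or rejects the request. This sketch assumes it clamps, and the function name is hypothetical.

```python
from typing import Dict, Optional

# Per-tier caps from the text; None means unlimited (Enterprise).
REASONING_TOKEN_CAPS: Dict[str, Optional[int]] = {
    "free": 4_096,
    "starter": 16_384,
    "growth": 65_536,
    "enterprise": None,
}


def cap_reasoning_tokens(requested: int, tier: str) -> int:
    """Clamp the requested reasoning/thinking budget to the tier cap (assumed behavior)."""
    cap = REASONING_TOKEN_CAPS[tier]
    return requested if cap is None else min(requested, cap)
```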

Step 2: Cost and budget checks

Before proxying, the gateway estimates request cost and checks it against:

  • Per-request cost ceilings
  • Daily budget
  • Monthly budget

This step also returns spend metadata in the X-CM-Cost, X-CM-Daily-Cost, and X-CM-Daily-Budget response headers.

Example blocked response:

{
  "error": {
    "code": "daily_budget",
    "message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit"
  }
}
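The three budget comparisons can be sketched with Decimal arithmetic so that dollar amounts compare exactly. Only the daily_budget code and message format come from the example above. The check order, the Budget type, and the other error codes are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Budget:
    max_cost_per_request: Decimal
    daily_limit: Decimal
    monthly_limit: Decimal


def _error(code: str, message: str) -> dict:
    return {"error": {"code": code, "message": message}}


def check_cost(estimated: Decimal, spent_today: Decimal, spent_month: Decimal,
               budget: Budget) -> Optional[dict]:
    """Return an error payload for the first failing check, or None to allow."""
    if estimated > budget.max_cost_per_request:
        return _error("max_cost_per_request",  # hypothetical code
                      f"Estimated cost ${estimated:.2f} > "
                      f"${budget.max_cost_per_request:.2f} per-request limit")
    if spent_today + estimated > budget.daily_limit:
        return _error("daily_budget",
                      f"Daily budget exhausted: ${spent_today:.2f} spent + "
                      f"${estimated:.2f} estimated > ${budget.daily_limit:.2f} limit")
    if spent_month + estimated > budget.monthly_limit:
        return _error("monthly_budget",  # hypothetical code
                      f"Monthly budget exhausted: ${spent_month:.2f} spent + "
                      f"${estimated:.2f} estimated > ${budget.monthly_limit:.2f} limit")
    return None
```

For example, take a $1.00 per-request ceiling, a $25.00 daily limit, and $24.50 already spent. A $0.52 estimate then produces the daily_budget error shown above, because $24.50 + $0.52 = $25.02 exceeds $25.00.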

Step 2.5: Hierarchical budget

Enforces a three-level budget hierarchy: Organization -> Team -> API Key. Each level can have its own daily/monthly budget. A request is blocked if any level in the hierarchy is over budget. This allows teams within an organization to have independent spending limits.
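One way to express the Org -> Team -> Key rule is to walk the levels in order and report the first one the request would push over its limit. This sketch rests on three assumptions. It covers only daily limits (monthly limits would follow the same pattern), treats a missing limit as "no budget at this level", and includes the request's estimated cost in the comparison.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class BudgetLevel:
    name: str                       # e.g. "org", "team", "key"
    daily_limit: Optional[Decimal]  # None: no budget set at this level
    spent_today: Decimal


def first_level_over_budget(levels: List[BudgetLevel], estimated: Decimal) -> Optional[str]:
    """Return the name of the first level that would exceed its limit, else None."""
    for level in levels:  # expected order: Org -> Team -> Key
        if level.daily_limit is None:
            continue
        if level.spent_today + estimated > level.daily_limit:
            return level.name
    return None
```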

Step 3: Runner session budget

When a request is tied to a managed runner session, the gateway can apply an additional per-session cost check. This prevents long-running sessions from silently burning through budget after the request already passed general org policy.
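The runner session check applies only when a request carries a session. For any other request the step does nothing. Here is a minimal sketch, assuming hypothetical parameter names and a session limit expressed in dollars:

```python
from decimal import Decimal
from typing import Optional


def runner_session_ok(session_id: Optional[str], session_spent: Decimal,
                      estimated: Decimal, session_limit: Decimal) -> bool:
    """Allow unless a managed-runner session would exceed its own cost limit."""
    if session_id is None:
        return True  # not tied to a runner session: the step does not apply
    return session_spent + estimated <= session_limit
```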

Step 4: PII scan

The gateway scans request text for secrets, credentials, and regulated data before it leaves your infrastructure.

| Category | Types |
|---|---|
| Secrets | API keys, bearer tokens, passwords, cloud credentials |
| Personal data | Email addresses, SSNs, passport-like identifiers |
| Financial data | Credit cards, IBAN-style values |

PII actions are policy-driven. An org can block findings outright or convert them into an approval requirement.
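The sketch below covers the regex half of the PII scan for a few of the categories in the table. It also shows the policy switch between blocking and requiring approval. It does not use Presidio's NER. The patterns, the Luhn filter on card numbers, and the mode names are illustrative assumptions, not the gateway's rule set.

```python
import re
from typing import List

# Illustrative patterns only; the gateway also uses Presidio NER.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}"),
}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to drop digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_pii(text: str) -> List[str]:
    """Return the category names found in the text (one entry per match)."""
    findings = []
    for name, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            if name == "credit_card" and not luhn_valid(re.sub(r"\D", "", match.group())):
                continue
            findings.append(name)
    return findings


def pii_action(findings: List[str], mode: str) -> str:
    """Map findings to an action; `mode` is a hypothetical policy setting."""
    if not findings:
        return "ALLOW"
    return "NEEDS_APPROVAL" if mode == "require_approval" else "BLOCK"
```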

Step 4.5: Content safety

Detects basic prompt injection and jailbreak patterns using lightweight regex checks. This is the first line of defense against adversarial inputs.
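Lightweight regex checks of this kind can be as simple as a list of case-insensitive patterns. The patterns below are hypothetical examples, not the gateway's actual list.

```python
import re

# Hypothetical examples of "basic" injection and jailbreak phrasings.
INJECTION_PATTERNS = [
    re.compile(r"\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b", re.IGNORECASE),
    re.compile(r"\bdisregard\s+(?:your|the)\s+(?:rules|guidelines|system\s+prompt)\b", re.IGNORECASE),
    re.compile(r"\byou\s+are\s+now\s+(?:DAN|in\s+developer\s+mode)\b", re.IGNORECASE),
]


def content_safety_ok(text: str) -> bool:
    """Return True when no pattern matches, i.e. the step would ALLOW."""
    return not any(p.search(text) for p in INJECTION_PATTERNS)
```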

Step 4.6: Security scanner

Advanced security analysis with multi-signal risk scoring. Goes beyond basic content safety to detect:

  • Instruction override attacks (“ignore previous instructions”)
  • Role hijacking and delimiter injection
  • Data exfiltration signals (system prompt extraction, config leaks)
  • Encoded payloads (base64, hex obfuscation)

Risk levels escalate when multiple signals appear in a single request. See the Security Scanner reference for full details.
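The escalation rule (more signals, higher risk) can be sketched by counting distinct signals. Several parts of this sketch are assumptions: the patterns, the risk-level names, and the two-signal threshold for "high". It detects base64 payloads only and omits hex obfuscation.

```python
import base64
import binascii
import re
from typing import List, Tuple

# Hypothetical signal patterns, one per category listed above.
SIGNALS = {
    "instruction_override": re.compile(r"\bignore\s+(?:all\s+)?previous\s+instructions\b", re.I),
    "role_hijack": re.compile(r"^\s*(?:system|assistant)\s*:", re.I | re.M),
    "delimiter_injection": re.compile(r"<\|(?:im_start|im_end|endoftext)\|>", re.I),
    "exfiltration": re.compile(r"\b(?:reveal|print|show)\s+(?:your\s+)?(?:system\s+prompt|config)", re.I),
}
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def has_base64_payload(text: str) -> bool:
    """True if some long base64-looking run decodes to printable ASCII."""
    for match in BASE64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(match.group(), validate=True)
        except binascii.Error:
            continue  # not valid base64 (e.g. wrong length or padding)
        if decoded and all(32 <= b < 127 for b in decoded):
            return True
    return False


def risk_level(text: str) -> Tuple[str, List[str]]:
    signals = [name for name, pattern in SIGNALS.items() if pattern.search(text)]
    if has_base64_payload(text):
        signals.append("encoded_payload")
    if len(signals) >= 2:
        level = "high"  # multiple signals in one request escalate the risk
    elif signals:
        level = "medium"
    else:
        level = "low"
    return level, signals
```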

Step 5: Model access

Model access combines two controls:

  • Plan-level model entitlement
  • Org-level allowlists in gateway policy

Model checks run after alias resolution, so both controls see the resolved model name rather than the alias the client sent.

Example allowlist:

{
  "allowed_models": [
    "gpt-4o-mini",
    "claude-haiku-3-5-20241022",
    "gemini-2.5-flash"
  ]
}
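Here is a sketch of the two model controls, applied after alias resolution. The ALIASES table is a hypothetical placeholder. Treating a missing allowlist (None) as "no org restriction" is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical alias table; real aliases are configured in the gateway.
ALIASES: Dict[str, str] = {"fast": "gpt-4o-mini"}


def check_model(requested: str, plan_models: Iterable[str],
                allowed_models: Optional[Iterable[str]]) -> Tuple[bool, str]:
    """Return (allowed, resolved model name or denial reason)."""
    model = ALIASES.get(requested, requested)  # resolve aliases before any check
    if model not in set(plan_models):
        return False, f"model '{model}' is not available on this plan"
    if allowed_models is not None and model not in set(allowed_models):
        return False, f"model '{model}' is not in the org allowlist"
    return True, model
```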

Step 6: HITL gate

The human-in-the-loop gate turns an otherwise valid request into a pending approval when the estimated cost exceeds the configured threshold.

HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30

{
  "status": "pending_approval",
  "approval_id": "apr_abc123def456",
  "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
  "retry_after_seconds": 30,
  "estimated_cost": 12.50
}

Pending approvals appear in the dashboard approval queue and can be approved or rejected there before the caller retries the request.
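The pending-approval response above can be assembled from the estimate and the threshold. This sketch assumes the gate triggers only when the estimate strictly exceeds the threshold, and that approval IDs are random hex. The function name and the ID format are assumptions modeled on the example.

```python
import secrets
from decimal import Decimal
from typing import Dict, Optional, Tuple


def hitl_gate(estimated: Decimal, threshold: Decimal,
              retry_after: int = 30) -> Optional[Tuple[int, Dict[str, str], dict]]:
    """Return (status, headers, body) for a pending approval, or None to pass."""
    if estimated <= threshold:
        return None  # at or below the threshold: continue to the proxy
    approval_id = "apr_" + secrets.token_hex(6)  # hypothetical ID format
    headers = {"X-CM-Approval-ID": approval_id, "Retry-After": str(retry_after)}
    body = {
        "status": "pending_approval",
        "approval_id": approval_id,
        "message": (
            "Request requires human approval: "
            f"Estimated cost ${estimated:.2f} exceeds HITL threshold ${threshold:.2f}"
        ),
        "retry_after_seconds": retry_after,
        "estimated_cost": float(estimated),
    }
    return 202, headers, body
```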

Default governance tiers

The gateway falls back to built-in defaults when an org has no custom policy:

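| Tier | RPM | Daily budget | Monthly budget | Max cost/request | HITL threshold | Max reasoning tokens | Max body size |
|---|---|---|---|---|---|---|---|
| Free | 10 | $10 | $50 | $0.25 | $1 | 4,096 | 1 MB |
| Starter | 60 | $25 | $250 | $0.50 | $3 | 16,384 | 10 MB |
| Growth | 300 | $100 | $2,000 | $2.00 | $10 | 65,536 | 50 MB |
| Enterprise | 5,000 | $2,000 | $50,000 | $10.00 | $50 | Unlimited | 100 MB |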

Backward-compatibility aliases still exist for older tier names such as pro, professional, and team.
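The fallback to built-in defaults can be sketched as a lookup table keyed by tier. The text does not say whether a custom policy replaces the defaults wholesale or overrides them field by field; this sketch assumes wholesale replacement. It omits the legacy aliases (pro, professional, team) because their target tiers are not documented here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass(frozen=True)
class TierDefaults:
    rpm: int
    daily_budget: Decimal
    monthly_budget: Decimal
    max_cost_per_request: Decimal
    hitl_threshold: Decimal
    max_reasoning_tokens: Optional[int]  # None means unlimited
    max_body_mb: int


D = Decimal
DEFAULT_TIERS: Dict[str, TierDefaults] = {
    "free": TierDefaults(10, D("10"), D("50"), D("0.25"), D("1"), 4_096, 1),
    "starter": TierDefaults(60, D("25"), D("250"), D("0.50"), D("3"), 16_384, 10),
    "growth": TierDefaults(300, D("100"), D("2000"), D("2.00"), D("10"), 65_536, 50),
    "enterprise": TierDefaults(5_000, D("2000"), D("50000"), D("10.00"), D("50"), None, 100),
}


def resolve_policy(tier: str, custom: Optional[TierDefaults] = None) -> TierDefaults:
    """Use the org's custom policy if present, else the built-in tier defaults."""
    return custom if custom is not None else DEFAULT_TIERS[tier]
```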

Short-circuit behavior

The governance chain is designed to fail fast. The first denial terminates evaluation.

  • If rate limiting blocks a request, cost estimation is never performed
  • If cost governance blocks a request, the provider never receives the call
  • Headers from completed checks are still carried into blocked and approval responses so clients get rate-limit and spend context
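The short-circuit loop can be sketched as follows. This is not the signature of GovernanceEngine.evaluate(). The run_chain function and Decision type are hypothetical, and they show only the two behaviors listed above: stopping at the first non-ALLOW result, and keeping headers from every check that ran.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Decision:
    action: str  # "ALLOW", "BLOCK", or "NEEDS_APPROVAL"
    reason: str = ""
    headers: Dict[str, str] = field(default_factory=dict)


def run_chain(request: dict, checks: List[Callable[[dict], Decision]]) -> Decision:
    """Run checks in order; stop at the first BLOCK or NEEDS_APPROVAL."""
    headers: Dict[str, str] = {}
    for check in checks:
        decision = check(request)
        headers.update(decision.headers)  # keep context from every completed check
        if decision.action != "ALLOW":
            return Decision(decision.action, decision.reason, headers)
    return Decision("ALLOW", "", headers)
```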

Backend implementation

The governance engine is implemented in services/backend/src/gateway/governance.py as the GovernanceEngine class. It exposes a single evaluate() method that accepts a GatewayRequest and GovernancePolicy and returns a PolicyDecision.

Key source files (paths relative to services/backend/):

| File | Purpose |
|---|---|
| src/gateway/governance.py | Governance engine with all 13 check implementations |
| src/gateway/models.py | GatewayRequest, GovernancePolicy, PolicyDecision models |
| src/gateway/model_pricing.py | Token cost estimation and model pricing tables |
| src/gateway/content_safety.py | Content safety scanner (prompt injection, jailbreak detection) |
| src/gateway/security_scanner.py | Advanced security scanner (injection, exfiltration, encoded payloads) |
| src/gateway/plan_enforcement.py | Plan-tier enforcement and entitlement checks |
| src/services/pii_detection_service.py | Presidio NER integration for enhanced PII detection |