Governance Chain

The governance chain is the policy pipeline that evaluates every request before it reaches the upstream LLM provider. Checks run in order and short-circuit on the first denial.

Pipeline overview

The chain has 13 steps across 7 stages. Sub-steps (0.5, 1.5, etc.) were added as the governance engine matured.


Request
  --> [0.  Plan Enforcement]         subscription status, quota, model access
  --> [0.5 Body Size Limit]          request body size vs tier limit
  --> [1.  Rate Limit]               requests per minute (per-minute counter)
  --> [1.5 Plan Entitlement]         legacy daily request/budget check
  --> [1.7 Reasoning Token Cap]      cap max reasoning/thinking tokens
  --> [2.  Cost Estimate]            estimated cost vs per-request + daily budget
  --> [2.5 Hierarchical Budget]      Org -> Team -> Key budget hierarchy
  --> [3.  Runner Session Budget]    per-session cost limit for managed runners
  --> [4.  PII Scan]                 Presidio NER + regex for secrets and PII
  --> [4.5 Content Safety]           prompt injection / jailbreak detection
  --> [4.6 Security Scanner]         advanced injection, exfiltration, encoded payloads
  --> [5.  Model Allowlist]          model allowlist enforcement
  --> [6.  HITL Gate]                flag high-cost requests for human approval
  --> Proxy

Each step returns one of three actions:

Action	Meaning
`ALLOW`	Request passes this check, proceed to the next step
`BLOCK`	Request is denied immediately with an error response
`NEEDS_APPROVAL`	Request is held for human review (the HITL gate, or the PII scan when policy converts findings into an approval requirement)

What can happen

Outcome	Typical status
Rate-limit denial	`429`
Plan-tier denial	`429`
Cost, budget, PII, content safety, runner session, or model denial	`403`
Human approval required	`202`

Step 0: Plan enforcement

Plan enforcement runs before the rest of governance. It checks whether the org’s subscription is active, whether the daily request quota for the plan has been reached, and whether the requested model is available on that plan.

This is separate from org-level gateway policy. Plan enforcement answers whether an org can use a class of service at all. Governance policy answers what custom controls apply to that org’s traffic.

Step 0.5: Body size limit

Rejects requests whose body exceeds a tier-specific or per-org byte limit before any expensive processing occurs. Free tier allows 1 MB, Starter 10 MB, Growth 50 MB, Enterprise 100 MB.
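A minimal sketch of this check, using the tier limits listed above. The function name, the binary interpretation of "MB", and the rule that a per-org limit overrides the tier limit are assumptions for illustration, not the gateway's actual code.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes; the text does not specify

# Tier limits as listed in this section.
TIER_BODY_LIMITS = {
    "free": 1 * MB,
    "starter": 10 * MB,
    "growth": 50 * MB,
    "enterprise": 100 * MB,
}


def body_size_allowed(tier: str, body_len: int, org_limit: Optional[int] = None) -> bool:
    """Return True if the body fits the limit (hypothetical helper).

    Assumes a per-org limit, when set, replaces the tier limit.
    """
    limit = org_limit if org_limit is not None else TIER_BODY_LIMITS[tier]
    return body_len <= limit
```

Because the check only needs the body length, it can run before the body is parsed or scanned.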

Step 1: Rate limiting

Rate limiting enforces a maximum number of requests per minute per organization.

Each request increments a per-org, per-minute counter in Redis
If the counter exceeds the org’s RPM limit, the request is blocked with 429
The counter key has a 120-second TTL to cover minute boundaries
Rate-limit metadata is included in response headers whether the request is allowed or blocked


HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
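The counter logic described above can be sketched as follows. To stay self-contained, an in-memory class stands in for Redis; the key format, class, and function names are hypothetical, and only the per-minute counter, the 120-second TTL, and the header names come from the text.

```python
import time
from typing import Dict, Optional, Tuple


class MinuteCounter:
    """In-memory stand-in for a Redis counter with an expiry (illustrative only)."""

    def __init__(self) -> None:
        self._counts: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr(self, key: str, now: float, ttl: int = 120) -> int:
        count, expires_at = self._counts.get(key, (0, now + ttl))
        if now >= expires_at:  # an expired key restarts from zero
            count, expires_at = 0, now + ttl
        self._counts[key] = (count + 1, expires_at)
        return count + 1


def check_rate_limit(
    counter: MinuteCounter, org_id: str, rpm_limit: int, now: Optional[float] = None
) -> Tuple[bool, Dict[str, str]]:
    """Increment the org's counter for the current minute and build the headers."""
    now = time.time() if now is None else now
    window = int(now // 60)  # index of the current one-minute window
    count = counter.incr(f"ratelimit:{org_id}:{window}", now)
    headers = {
        "X-RateLimit-Limit": str(rpm_limit),
        "X-RateLimit-Remaining": str(max(rpm_limit - count, 0)),
        "X-RateLimit-Reset": str((window + 1) * 60),  # start of the next minute
    }
    return count <= rpm_limit, headers
```

The headers are built on every call, so the caller can attach them to both allowed and blocked responses.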

Step 1.5: Plan entitlement

A legacy daily request/budget check that uses the billing_prices configuration. This step exists for backward compatibility with older billing integrations and may be removed in a future release.

Step 1.7: Reasoning token cap

Enforces a maximum number of reasoning/thinking tokens per request. This prevents models with extended thinking (e.g., Claude with adaptive thinking, OpenAI o1/o3) from consuming excessive tokens. Limits are tier-based: Free 4,096, Starter 16,384, Growth 65,536, Enterprise unlimited.
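A small sketch of the tier lookup. The text does not say whether the gateway clamps an oversized reasoning budget or rejects the request; this example assumes clamping, and the function name is hypothetical.

```python
from typing import Dict, Optional

# Tier caps as listed in this section; None stands for "unlimited".
REASONING_TOKEN_CAPS: Dict[str, Optional[int]] = {
    "free": 4096,
    "starter": 16384,
    "growth": 65536,
    "enterprise": None,
}


def cap_reasoning_tokens(tier: str, requested: int) -> int:
    """Clamp the requested reasoning budget to the tier cap (assumed behavior)."""
    cap = REASONING_TOKEN_CAPS[tier]
    return requested if cap is None else min(requested, cap)
```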

Step 2: Cost and budget checks

Before proxying, the gateway estimates request cost and checks it against:

Per-request cost ceilings
Daily budget
Monthly budget

It also returns spend metadata in X-CM-Cost, X-CM-Daily-Cost, and X-CM-Daily-Budget.

Example blocked response:


{
  "error": {
    "code": "daily_budget",
    "message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit"
  }
}
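The daily-budget comparison behind that response might look like the sketch below. The function name is hypothetical, and using Decimal for money is a choice made here to avoid float rounding; the error code and message format mirror the example above.

```python
from decimal import Decimal
from typing import Optional


def check_daily_budget(spent: Decimal, estimated: Decimal, limit: Decimal) -> Optional[dict]:
    """Return an error payload if the estimate would push spend past the limit."""
    if spent + estimated > limit:
        return {
            "error": {
                "code": "daily_budget",
                "message": (
                    f"Daily budget exhausted: ${spent:.2f} spent + "
                    f"${estimated:.2f} estimated > ${limit:.2f} limit"
                ),
            }
        }
    return None  # within budget, the chain continues
```

With the figures from the example, $24.50 + $0.52 = $25.02, which exceeds the $25.00 limit, so the request is blocked.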

Step 2.5: Hierarchical budget

Enforces a three-level budget hierarchy: Organization -> Team -> API Key. Each level can have its own daily/monthly budget. A request is blocked if any level in the hierarchy is over budget. This allows teams within an organization to have independent spending limits.
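One way to picture the hierarchy check is a walk from the organization down to the key that stops at the first exhausted level. The types and names below are hypothetical, only daily budgets are shown, and including the request's estimated cost in the comparison is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BudgetLevel:
    name: str                      # "org", "team", or "key"
    daily_limit: Optional[float]   # None means no budget set at this level
    daily_spent: float


def first_exhausted_level(levels: List[BudgetLevel], estimated_cost: float) -> Optional[str]:
    """Return the name of the first level the request would push over budget."""
    for level in levels:  # ordered Org -> Team -> Key
        if level.daily_limit is None:
            continue
        if level.daily_spent + estimated_cost > level.daily_limit:
            return level.name
    return None
```

A team can therefore be blocked while the organization as a whole still has budget left.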

Step 3: Runner session budget

When a request is tied to a managed runner session, the gateway can apply an additional per-session cost check. This prevents long-running sessions from silently burning through budget after the request has already passed general org policy.

Step 4: PII scan

The gateway scans request text for secrets, credentials, and regulated data before it leaves your infrastructure.

Category	Types
Secrets	API keys, bearer tokens, passwords, cloud credentials
Personal data	Email addresses, SSNs, passport-like identifiers
Financial data	Credit cards, IBAN-style values

PII actions are policy-driven. An org can block findings outright or convert them into an approval requirement.
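To illustrate the regex side of the scan and the policy-driven action, here is a simplified sketch. The patterns are deliberately small examples and not the gateway's rule set, the names are hypothetical, and the Presidio NER pass is not shown.

```python
import re
from typing import Dict, List

# Illustrative patterns only; a production scanner needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_text(text: str) -> Dict[str, List[str]]:
    """Return matches grouped by finding type; an empty dict means no findings."""
    findings: Dict[str, List[str]] = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings


def pii_action(findings: Dict[str, List[str]], on_finding: str = "BLOCK") -> str:
    """Map findings to a chain action; on_finding comes from org policy."""
    return on_finding if findings else "ALLOW"
```

An org that prefers review over rejection would pass on_finding="NEEDS_APPROVAL".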

Step 4.5: Content safety

Detects basic prompt injection and jailbreak patterns using lightweight regex checks. This is the first line of defense against adversarial inputs.

Step 4.6: Security scanner

The security scanner performs advanced analysis with multi-signal risk scoring. It goes beyond basic content safety to detect:

Instruction override attacks (“ignore previous instructions”)
Role hijacking and delimiter injection
Data exfiltration signals (system prompt extraction, config leaks)
Encoded payloads (base64, hex obfuscation)

Risk levels escalate when multiple signals appear in a single request. See the Security Scanner reference for full details.
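The escalation idea can be sketched as counting independent signals and mapping the count to a level. The patterns, level names, and thresholds below are hypothetical and much simpler than a real scanner; they only show how several weak signals combine into a higher risk level.

```python
import re
from typing import List, Tuple

# Hypothetical signal detectors, one per attack category.
SIGNALS = {
    "instruction_override": re.compile(r"ignore (all )?previous instructions", re.I),
    "prompt_extraction": re.compile(r"(reveal|print|show).{0,40}system prompt", re.I),
    "base64_blob": re.compile(r"\b[A-Za-z0-9+/]{40,}={0,2}"),
}

LEVELS = ("none", "low", "medium", "high")  # assumed scale


def risk_level(text: str) -> Tuple[str, List[str]]:
    """Return (level, matched signal names); more distinct signals mean higher risk."""
    hits = sorted(name for name, rx in SIGNALS.items() if rx.search(text))
    return LEVELS[min(len(hits), 3)], hits
```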

Step 5: Model access

Model access combines two controls:

Plan-level model entitlement
Org-level allowlists in gateway policy

Model checks happen after alias resolution.

Example allowlist:


{
  "allowed_models": [
    "gpt-4o-mini",
    "claude-3-5-haiku-20241022",
    "gemini-2.5-flash"
  ]
}
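A sketch of how the two controls could combine once an alias is resolved. The alias table, plan model set, and function name are hypothetical; how the gateway treats an empty or missing allowlist is not stated, so this example requires explicit membership in both sets.

```python
from typing import Dict, Set


def is_model_allowed(
    requested: str,
    aliases: Dict[str, str],
    plan_models: Set[str],
    allowed_models: Set[str],
) -> bool:
    """Resolve the alias first, then require both plan and org approval."""
    resolved = aliases.get(requested, requested)
    return resolved in plan_models and resolved in allowed_models


# Example data; "fast" is an invented alias.
ALIASES = {"fast": "gpt-4o-mini"}
PLAN_MODELS = {"gpt-4o-mini", "gemini-2.5-flash", "gpt-4o"}
ALLOWED_MODELS = {"gpt-4o-mini", "gemini-2.5-flash"}
```

Checking after alias resolution means an alias cannot be used to reach a model that the allowlist excludes.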

Step 6: HITL gate

The human-in-the-loop gate turns an otherwise valid request into a pending approval when the estimated cost exceeds the configured threshold.


HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30


{
  "status": "pending_approval",
  "approval_id": "apr_abc123def456",
  "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
  "retry_after_seconds": 30,
  "estimated_cost": 12.50
}

Pending approvals appear in the dashboard approval queue and can be approved or rejected there before the caller retries the request.
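The gate decision itself reduces to a threshold comparison. The sketch below builds a payload shaped like the example response; the function name is hypothetical, and the approval ID is passed in because how IDs are generated is not described here.

```python
from typing import Optional


def hitl_decision(
    estimated_cost: float, threshold: float, approval_id: str, retry_after: int = 30
) -> Optional[dict]:
    """Return a pending-approval payload, or None if the request may proceed."""
    if estimated_cost <= threshold:
        return None
    return {
        "status": "pending_approval",
        "approval_id": approval_id,
        "message": (
            f"Request requires human approval: Estimated cost ${estimated_cost:.2f} "
            f"exceeds HITL threshold ${threshold:.2f}"
        ),
        "retry_after_seconds": retry_after,
        "estimated_cost": estimated_cost,
    }
```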

Default governance tiers

The gateway falls back to built-in defaults when an org has no custom policy:

Tier	RPM	Daily Budget	Monthly Budget	Max Cost/Request	HITL Threshold	Max Reasoning Tokens	Max Body Size
Free	10	$10	$50	$0.25	$1	4,096	1 MB
Starter	60	$25	$250	$0.50	$3	16,384	10 MB
Growth	300	$100	$2,000	$2.00	$10	65,536	50 MB
Enterprise	5,000	$2,000	$50,000	$10.00	$50	Unlimited	100 MB

Backward-compatibility aliases still exist for older tier names such as pro, professional, and team.
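For reference, the defaults table can be expressed as a lookup structure. The class and function names are hypothetical, and the legacy aliases are left out because this page does not say which current tier each one maps to.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TierDefaults:
    rpm: int
    daily_budget: float
    monthly_budget: float
    max_cost_per_request: float
    hitl_threshold: float
    max_reasoning_tokens: Optional[int]  # None means unlimited
    max_body_mb: int


# Values copied from the defaults table above.
DEFAULT_TIERS: Dict[str, TierDefaults] = {
    "free": TierDefaults(10, 10, 50, 0.25, 1, 4096, 1),
    "starter": TierDefaults(60, 25, 250, 0.50, 3, 16384, 10),
    "growth": TierDefaults(300, 100, 2000, 2.00, 10, 65536, 50),
    "enterprise": TierDefaults(5000, 2000, 50000, 10.00, 50, None, 100),
}


def defaults_for(tier: str) -> TierDefaults:
    """Look up built-in defaults by tier name (case-insensitive)."""
    return DEFAULT_TIERS[tier.lower()]
```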

Short-circuit behavior

The governance chain is designed to fail fast. The first denial terminates evaluation.

If rate limiting blocks a request, cost estimation is never performed
If cost governance blocks a request, the provider never receives the call
Headers from completed checks are still carried into blocked and approval responses so clients get rate-limit and spend context
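The fail-fast loop and the header carry-over can be sketched in a few lines. StepResult and run_chain are hypothetical names for illustration and do not reflect the actual GovernanceEngine internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StepResult:
    action: str  # "ALLOW", "BLOCK", or "NEEDS_APPROVAL"
    headers: Dict[str, str] = field(default_factory=dict)
    reason: str = ""


Step = Callable[[dict], StepResult]


def run_chain(request: dict, steps: List[Step]) -> StepResult:
    """Run steps in order and stop at the first result that is not ALLOW."""
    headers: Dict[str, str] = {}
    for step in steps:
        result = step(request)
        headers.update(result.headers)  # keep context from completed checks
        if result.action != "ALLOW":
            return StepResult(result.action, headers, result.reason)
    return StepResult("ALLOW", headers)


def rate_limit_step(request: dict) -> StepResult:
    return StepResult("ALLOW", {"X-RateLimit-Remaining": "5"})


def cost_step(request: dict) -> StepResult:
    return StepResult("BLOCK", {"X-CM-Cost": "0.52"}, "daily_budget")
```

Running run_chain({}, [rate_limit_step, cost_step]) returns a BLOCK result that still carries the rate-limit header from the step that passed.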

Backend implementation

The governance engine is implemented in services/backend/src/gateway/governance.py as the GovernanceEngine class. It exposes a single evaluate() method that accepts a GatewayRequest and GovernancePolicy and returns a PolicyDecision.

Key source files:

File	Purpose
`src/gateway/governance.py`	Governance engine with all 13 check implementations
`src/gateway/models.py`	`GatewayRequest`, `GovernancePolicy`, `PolicyDecision` models
`src/gateway/model_pricing.py`	Token cost estimation and model pricing tables
`src/gateway/content_safety.py`	Content safety scanner (prompt injection, jailbreak detection)
`src/gateway/security_scanner.py`	Advanced security scanner (injection, exfiltration, encoded payloads)
`src/gateway/plan_enforcement.py`	Plan-tier enforcement and entitlement checks
`src/services/pii_detection_service.py`	Presidio NER integration for enhanced PII detection