Governance Chain
The governance chain is the policy pipeline that evaluates every request before it reaches the upstream LLM provider. Checks run in order and short-circuit on the first denial.
Pipeline overview
The chain has 13 steps across 7 stages. Sub-steps (0.5, 1.5, etc.) were added as the governance engine matured.
Request
--> [0. Plan Enforcement] subscription status, quota, model access
--> [0.5 Body Size Limit] request body size vs tier limit
--> [1. Rate Limit] requests per minute (sliding window)
--> [1.5 Plan Entitlement] legacy daily request/budget check
--> [1.7 Reasoning Token Cap] cap max reasoning/thinking tokens
--> [2. Cost Estimate] estimated cost vs per-request + daily budget
--> [2.5 Hierarchical Budget] Org -> Team -> Key budget hierarchy
--> [3. Runner Session Budget] per-session cost limit for managed runners
--> [4. PII Scan] Presidio NER + regex for secrets and PII
--> [4.5 Content Safety] prompt injection / jailbreak detection
--> [4.6 Security Scanner] advanced injection, exfiltration, encoded payloads
--> [5. Model Allowlist] model allowlist enforcement
--> [6. HITL Gate] flag high-cost requests for human approval
--> Proxy
Each step returns one of three actions:
| Action | Meaning |
|---|---|
| ALLOW | Request passes this check, proceed to the next step |
| BLOCK | Request is denied immediately with an error response |
| NEEDS_APPROVAL | Request is held for human review (the HITL gate, or the PII scan when policy converts findings into an approval requirement) |
What can happen
| Outcome | Typical status |
|---|---|
| Rate-limit denial | 429 |
| Plan-tier denial | 429 |
| Cost, budget, PII, content safety, runner session, or model denial | 403 |
| Human approval required | 202 |
Step 0: Plan enforcement
Plan enforcement runs before the rest of governance. It checks whether the org’s subscription is active, whether the daily request quota for the plan has been reached, and whether the requested model is available on that plan.
This is separate from org-level gateway policy. Plan enforcement answers whether an org can use a class of service at all. Governance policy answers what custom controls apply to that org’s traffic.
Step 0.5: Body size limit
Rejects requests whose body exceeds a tier-specific or per-org byte limit before any expensive processing occurs. Free tier allows 1 MB, Starter 10 MB, Growth 50 MB, Enterprise 100 MB.
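A minimal sketch of this check, using the tier limits quoted above. The function name, the reading of "MB" as 2**20 bytes, and the rule that a per-org override wins over the tier limit are assumptions for illustration, not the gateway's confirmed behavior.

```python
from typing import Dict, Optional

MIB = 1024 * 1024  # assumption: "MB" in the tier limits means 2**20 bytes

# Tier limits taken from the text above.
TIER_BODY_LIMITS: Dict[str, int] = {
    "free": 1 * MIB,
    "starter": 10 * MIB,
    "growth": 50 * MIB,
    "enterprise": 100 * MIB,
}


def body_size_allowed(body: bytes, tier: str, org_limit: Optional[int] = None) -> bool:
    """Return True when the body fits the applicable byte limit."""
    # Assumption: a per-org limit, when set, takes precedence over the tier limit.
    limit = org_limit if org_limit is not None else TIER_BODY_LIMITS[tier]
    return len(body) <= limit
```

Because the check only needs the body length, it can run before any parsing, cost estimation, or scanning.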
Step 1: Rate limiting
Rate limiting enforces a maximum number of requests per minute per organization.
- Each request increments a per-org, per-minute counter in Redis
- If the counter exceeds the org’s RPM limit, the request is blocked with 429
- The counter key has a 120-second TTL to cover minute boundaries
- Rate-limit metadata is included in response headers whether the request is allowed or blocked
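The counter logic in the list above can be sketched as follows. To stay self-contained, this sketch replaces Redis with an in-memory dictionary; the key format, the function name, and the reading of X-RateLimit-Reset as the Unix time at which the next one-minute window starts are assumptions.

```python
import time
from typing import Dict, Optional, Tuple

WINDOW_SECONDS = 60
KEY_TTL_SECONDS = 120  # TTL from the text; with Redis this would be set via EXPIRE

# Hypothetical in-memory stand-in for Redis: key -> request count.
_counters: Dict[str, int] = {}


def check_rate_limit(
    org_id: str, rpm_limit: int, now: Optional[float] = None
) -> Tuple[bool, Dict[str, str]]:
    """Increment the per-org, per-minute counter and build the response headers."""
    now = time.time() if now is None else now
    window = int(now // WINDOW_SECONDS)
    key = f"ratelimit:{org_id}:{window}"  # hypothetical key format
    count = _counters.get(key, 0) + 1     # Redis equivalent: INCR key
    _counters[key] = count
    allowed = count <= rpm_limit
    headers = {
        "X-RateLimit-Limit": str(rpm_limit),
        "X-RateLimit-Remaining": str(max(rpm_limit - count, 0)),
        # Assumption: reset is the start of the next one-minute window.
        "X-RateLimit-Reset": str((window + 1) * WINDOW_SECONDS),
    }
    return allowed, headers
```

The headers are built on every call, so they are available whether the request is allowed or blocked.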
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
Step 1.5: Plan entitlement
A legacy daily request/budget check that uses the billing_prices configuration. This step exists for backward compatibility with older billing integrations and may be removed in a future release.
Step 1.7: Reasoning token cap
Enforces a maximum number of reasoning/thinking tokens per request. This prevents models with extended thinking (e.g., Claude with adaptive thinking, OpenAI o1/o3) from consuming excessive tokens. Limits are tier-based: Free 4,096, Starter 16,384, Growth 65,536, Enterprise unlimited.
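A small sketch of the tier lookup, using the caps listed above. The text does not say whether an over-limit request is clamped or rejected; this sketch assumes clamping, and the function name is hypothetical.

```python
from typing import Dict, Optional

# Tier caps from the text above; None stands for "unlimited".
REASONING_TOKEN_CAPS: Dict[str, Optional[int]] = {
    "free": 4_096,
    "starter": 16_384,
    "growth": 65_536,
    "enterprise": None,
}


def cap_reasoning_tokens(requested: int, tier: str) -> int:
    """Clamp the requested reasoning-token budget to the tier cap (assumed behavior)."""
    cap = REASONING_TOKEN_CAPS[tier]
    if cap is None:
        return requested
    return min(requested, cap)
```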
Step 2: Cost and budget checks
Before proxying, the gateway estimates request cost and checks it against:
- Per-request cost ceilings
- Daily budget
- Monthly budget
It also returns spend metadata in X-CM-Cost, X-CM-Daily-Cost, and X-CM-Daily-Budget.
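The per-request and daily checks can be sketched as below; a monthly check would follow the same pattern. The function name and the "max_cost_per_request" error code are hypothetical. Only the "daily_budget" code and its message format come from this page.

```python
from typing import Optional, Tuple


def check_cost(
    estimated: float,
    spent_today: float,
    daily_budget: float,
    max_cost_per_request: float,
) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (allowed, error_code, message) for a cost estimate."""
    if estimated > max_cost_per_request:
        # Hypothetical error code and message for the per-request ceiling.
        return False, "max_cost_per_request", (
            f"Estimated cost ${estimated:.2f} exceeds "
            f"per-request limit ${max_cost_per_request:.2f}"
        )
    if spent_today + estimated > daily_budget:
        return False, "daily_budget", (
            f"Daily budget exhausted: ${spent_today:.2f} spent + "
            f"${estimated:.2f} estimated > ${daily_budget:.2f} limit"
        )
    return True, None, None
```

A production implementation would likely use a decimal type rather than floats for currency; floats keep the sketch short.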
Example blocked response:
{
  "error": {
    "code": "daily_budget",
    "message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit"
  }
}
Step 2.5: Hierarchical budget
Enforces a three-level budget hierarchy: Organization -> Team -> API Key. Each level can have its own daily/monthly budget. A request is blocked if any level in the hierarchy is over budget. This allows teams within an organization to have independent spending limits.
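One way to picture the hierarchy walk is shown below. The `BudgetLevel` type and the function name are hypothetical, only daily budgets are modeled, and treating "over budget" as "current spend plus the estimate exceeds the limit" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BudgetLevel:
    name: str                             # "org", "team", or "key"
    spent_today: float
    daily_budget: Optional[float] = None  # None: no budget configured at this level


def first_exceeded_level(levels: List[BudgetLevel], estimated: float) -> Optional[str]:
    """Walk Org -> Team -> Key; return the first level the request would push over budget."""
    for level in levels:
        if level.daily_budget is None:
            continue  # levels without a budget never block
        if level.spent_today + estimated > level.daily_budget:
            return level.name
    return None
```

A return value of `None` means every level has headroom; any other value names the level that blocks the request.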
Step 3: Runner session budget
When a request is tied to a managed runner session, the gateway can apply an additional per-session cost check. This prevents a long-running session from silently burning through budget even though each individual request passes general org policy.
Step 4: PII scan
The gateway scans request text for secrets, credentials, and regulated data before it leaves your infrastructure.
| Category | Types |
|---|---|
| Secrets | API keys, bearer tokens, passwords, cloud credentials |
| Personal data | Email addresses, SSNs, passport-like identifiers |
| Financial data | Credit cards, IBAN-style values |
PII actions are policy-driven. An org can block findings outright or convert them into an approval requirement.
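The regex half of the scan and the policy-driven action can be sketched as follows. The three patterns are simplified illustrations rather than the gateway's actual rule set, the Presidio NER pass is not shown, and the function names and the `on_finding` parameter are hypothetical.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns; a real rule set would be broader and stricter.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_pii(text: str) -> List[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def pii_action(findings: List[str], on_finding: str = "BLOCK") -> str:
    """Map findings to a chain action; on_finding is BLOCK or NEEDS_APPROVAL."""
    if not findings:
        return "ALLOW"
    return on_finding
```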
Step 4.5: Content safety
Detects basic prompt injection and jailbreak patterns using lightweight regex checks. This is the first line of defense against adversarial inputs.
Step 4.6: Security scanner
Advanced security analysis with multi-signal risk scoring. Goes beyond basic content safety to detect:
- Instruction override attacks (“ignore previous instructions”)
- Role hijacking and delimiter injection
- Data exfiltration signals (system prompt extraction, config leaks)
- Encoded payloads (base64, hex obfuscation)
Risk levels escalate when multiple signals appear in a single request. See the Security Scanner reference for full details.
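As a rough illustration of multi-signal scoring, the sketch below counts how many independent signals fire and maps the count to a level. The patterns, signal names, level names, and thresholds are all hypothetical and far simpler than a production scanner.

```python
import base64
import binascii
import re
from typing import List

OVERRIDE_RE = re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE)
EXFIL_RE = re.compile(r"(?:reveal|print|show) (?:your |the )?system prompt", re.IGNORECASE)
BASE64_RE = re.compile(r"\b[A-Za-z0-9+/]{24,}={0,2}")


def _has_base64_payload(text: str) -> bool:
    """True if a long base64-looking run decodes to printable ASCII."""
    for candidate in BASE64_RE.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue
        if decoded.isascii() and decoded.decode("ascii").isprintable():
            return True
    return False


def risk_level(text: str) -> str:
    """Count the signals that fire and map the count to a hypothetical level."""
    signals: List[str] = []
    if OVERRIDE_RE.search(text):
        signals.append("instruction_override")
    if EXFIL_RE.search(text):
        signals.append("exfiltration")
    if _has_base64_payload(text):
        signals.append("encoded_payload")
    return ("none", "low", "high", "critical")[len(signals)]
```

A single signal yields the lowest non-zero level, and each additional signal escalates it, which mirrors the escalation described above.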
Step 5: Model access
Model access combines two controls:
- Plan-level model entitlement
- Org-level allowlists in gateway policy
Model checks happen after alias resolution.
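A minimal sketch of the combined check, assuming both controls must pass. The alias table and function name are hypothetical; this page does not list the real alias names.

```python
from typing import Dict, Set

# Hypothetical alias table for illustration only.
MODEL_ALIASES: Dict[str, str] = {"fast": "gpt-4o-mini"}


def model_allowed(requested: str, plan_models: Set[str], org_allowlist: Set[str]) -> bool:
    """Resolve aliases first, then require both the plan and the org allowlist to permit the model."""
    resolved = MODEL_ALIASES.get(requested, requested)
    return resolved in plan_models and resolved in org_allowlist
```

Resolving the alias before checking means an allowlist only needs to contain canonical model names.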
Example allowlist:
{
  "allowed_models": [
    "gpt-4o-mini",
    "claude-haiku-3-5-20241022",
    "gemini-2.5-flash"
  ]
}
Step 6: HITL gate
The human-in-the-loop gate turns an otherwise valid request into a pending approval when the estimated cost exceeds the configured threshold.
HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30

{
  "status": "pending_approval",
  "approval_id": "apr_abc123def456",
  "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
  "retry_after_seconds": 30,
  "estimated_cost": 12.50
}
Pending approvals appear in the dashboard approval queue and can be approved or rejected there before the caller retries the request.
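The gate decision and the response body shown above can be sketched as follows. The function name and signature are hypothetical, and how the gateway generates approval IDs is not covered here, so the ID is passed in.

```python
from typing import Any, Dict, Optional


def hitl_gate(
    estimated_cost: float,
    threshold: float,
    approval_id: str,
    retry_after_seconds: int = 30,
) -> Optional[Dict[str, Any]]:
    """Return a pending-approval body when the estimate exceeds the threshold, else None."""
    if estimated_cost <= threshold:
        return None  # request proceeds to the proxy
    return {
        "status": "pending_approval",
        "approval_id": approval_id,
        "message": (
            "Request requires human approval: Estimated cost "
            f"${estimated_cost:.2f} exceeds HITL threshold ${threshold:.2f}"
        ),
        "retry_after_seconds": retry_after_seconds,
        "estimated_cost": estimated_cost,
    }
```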
Default governance tiers
The gateway falls back to built-in defaults when an org has no custom policy:
| Tier | RPM | Daily Budget | Monthly Budget | Max Cost/Request | HITL Threshold | Max Reasoning Tokens | Max Body Size |
|---|---|---|---|---|---|---|---|
| Free | 10 | $10 | $50 | $0.25 | $1 | 4,096 | 1 MB |
| Starter | 60 | $25 | $250 | $0.50 | $3 | 16,384 | 10 MB |
| Growth | 300 | $100 | $2,000 | $2.00 | $10 | 65,536 | 50 MB |
| Enterprise | 5,000 | $2,000 | $50,000 | $10.00 | $50 | Unlimited | 100 MB |
Backward-compatibility aliases still exist for older tier names such as pro, professional, and team.
Short-circuit behavior
The governance chain is designed to fail fast. The first denial terminates evaluation.
- If rate limiting blocks a request, cost estimation is never performed
- If cost governance blocks a request, the provider never receives the call
- Headers from completed checks are still carried into blocked and approval responses so clients get rate-limit and spend context
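The fail-fast loop and header carry-over can be sketched as below. `Action`, `StepResult`, and `run_chain` are hypothetical names for illustration and are not the signatures of the real governance engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Action(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    NEEDS_APPROVAL = "NEEDS_APPROVAL"


@dataclass
class StepResult:
    action: Action
    headers: Dict[str, str] = field(default_factory=dict)


Step = Callable[[dict], StepResult]


def run_chain(
    request: dict, steps: List[Tuple[str, Step]]
) -> Tuple[Action, str, Dict[str, str]]:
    """Run steps in order; stop at the first non-ALLOW result.

    Returns (action, name of the step that stopped the chain, accumulated headers).
    """
    headers: Dict[str, str] = {}
    for name, step in steps:
        result = step(request)
        headers.update(result.headers)  # keep context from every completed check
        if result.action is not Action.ALLOW:
            return result.action, name, headers
    return Action.ALLOW, "", headers
```

Because headers are merged before the action is inspected, a blocked or pending response still carries the rate-limit and spend headers produced by the steps that ran.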
Backend implementation
The governance engine is implemented in services/backend/src/gateway/governance.py as the GovernanceEngine class. It exposes a single evaluate() method that accepts a GatewayRequest and GovernancePolicy and returns a PolicyDecision.
Key source files:
| File | Purpose |
|---|---|
| src/gateway/governance.py | Governance engine with all 13 check implementations |
| src/gateway/models.py | GatewayRequest, GovernancePolicy, PolicyDecision models |
| src/gateway/model_pricing.py | Token cost estimation and model pricing tables |
| src/gateway/content_safety.py | Content safety scanner (prompt injection, jailbreak detection) |
| src/gateway/security_scanner.py | Advanced security scanner (injection, exfiltration, encoded payloads) |
| src/gateway/plan_enforcement.py | Plan-tier enforcement and entitlement checks |
| src/services/pii_detection_service.py | Presidio NER integration for enhanced PII detection |