Governance Chain
The governance chain is the policy pipeline that evaluates every request before it reaches the upstream LLM provider. Checks run in order and short-circuit on the first denial.
Pipeline overview
Request
--> [0. Plan Enforcement]
--> [1. Rate Limit]
--> [2. Cost + Budget]
--> [3. Runner Session Budget]
--> [4. PII + Content Safety]
--> [5. Model Access]
--> [6. HITL Gate]
--> Proxy
Each step returns one of three actions:
| Action | Meaning |
|---|---|
| ALLOW | Request passes this check and proceeds to the next step |
| BLOCK | Request is denied immediately with an error response |
| NEEDS_APPROVAL | Request is held for human review (HITL gate only) |
What can happen
| Outcome | Typical status |
|---|---|
| Rate-limit denial | 429 |
| Plan-tier denial | 429 |
| Cost, budget, PII, content safety, runner session, or model denial | 403 |
| Human approval required | 202 |
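The mapping from outcome to status can be sketched as a small lookup. This is an illustration only: the `Action` enum mirrors the three actions listed earlier, while the check names (`rate_limit`, `plan`) and the `http_status` helper are hypothetical and not taken from the gateway source.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    NEEDS_APPROVAL = "NEEDS_APPROVAL"


# Hypothetical check names. Only the status codes come from the table above.
_BLOCK_STATUS_OVERRIDES = {
    "rate_limit": 429,
    "plan": 429,
}


def http_status(action: Action, check: str = "") -> Optional[int]:
    """Return the typical status for a governance outcome, or None for ALLOW."""
    if action is Action.NEEDS_APPROVAL:
        return 202
    if action is Action.BLOCK:
        # Cost, budget, PII, content safety, runner session, and model
        # denials all fall through to 403.
        return _BLOCK_STATUS_OVERRIDES.get(check, 403)
    # ALLOW: the request is proxied, so the upstream response sets the status.
    return None
```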
Step 0: Plan enforcement
Plan enforcement runs before the rest of governance. It checks whether the org’s subscription is active, whether the daily request quota for the plan has been reached, and whether the requested model is available on that plan.
This is separate from org-level gateway policy. Plan enforcement answers whether an org can use a class of service at all. Governance policy answers what custom controls apply to that org’s traffic.
Step 1: Rate limiting
Rate limiting enforces a maximum number of requests per minute per organization.
- Each request increments a per-org, per-minute counter in Redis
- If the counter exceeds the org’s RPM limit, the request is blocked with 429
- The counter key has a 120-second TTL to cover minute boundaries
- Rate-limit metadata is included in response headers whether the request is allowed or blocked
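The bullets above describe a fixed-window counter. The following is a minimal sketch of that idea, not the gateway's actual code: an in-memory dict stands in for Redis so the example is self-contained, and the key format, class name, and method signature are assumptions.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Per-org, per-minute request counter. A dict stands in for Redis."""

    def __init__(self, ttl_seconds: int = 120) -> None:
        self.ttl_seconds = ttl_seconds
        # key -> (count, expires_at)
        self._counters: Dict[str, Tuple[int, float]] = {}

    def check(
        self, org_id: str, rpm_limit: int, now: Optional[float] = None
    ) -> Tuple[bool, Dict[str, str]]:
        now = time.time() if now is None else now
        # Drop expired keys, mimicking Redis TTL expiry.
        self._counters = {k: v for k, v in self._counters.items() if v[1] > now}

        window = int(now // 60)
        key = f"ratelimit:{org_id}:{window}"  # hypothetical key format
        count, expires_at = self._counters.get(key, (0, now + self.ttl_seconds))
        count += 1  # every request increments, allowed or not
        self._counters[key] = (count, expires_at)

        allowed = count <= rpm_limit
        # Headers are produced whether the request is allowed or blocked.
        headers = {
            "X-RateLimit-Limit": str(rpm_limit),
            "X-RateLimit-Remaining": str(max(rpm_limit - count, 0)),
            "X-RateLimit-Reset": str((window + 1) * 60),
        }
        return allowed, headers
```

With Redis, the increment and TTL would typically be an `INCR` followed by an `EXPIRE` on the same key. A blocked request produces headers like the following: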
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
Step 2: Cost and budget checks
Before proxying, the gateway estimates request cost and checks it against:
- Per-request cost ceilings
- Daily budget
- Monthly budget
It also returns spend metadata in X-CM-Cost, X-CM-Daily-Cost, and X-CM-Daily-Budget.
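As a rough sketch of how these three checks could be ordered, assuming the estimate and current spend totals are already known: the `BudgetPolicy` dataclass, the `check_cost` function, the header value formatting, and the `max_cost_per_request` and `monthly_budget` error codes are all hypothetical. Only the `daily_budget` code and its message format appear in this documentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BudgetPolicy:
    max_cost_per_request: float
    daily_budget: float
    monthly_budget: float


def _error(code: str, message: str) -> dict:
    return {"error": {"code": code, "message": message}}


def check_cost(
    estimated: float, daily_spent: float, monthly_spent: float, policy: BudgetPolicy
) -> Tuple[Optional[dict], Dict[str, str]]:
    """Return (error_body or None, spend headers)."""
    headers = {
        "X-CM-Cost": f"{estimated:.2f}",
        "X-CM-Daily-Cost": f"{daily_spent:.2f}",
        "X-CM-Daily-Budget": f"{policy.daily_budget:.2f}",
    }
    if estimated > policy.max_cost_per_request:
        return _error(
            "max_cost_per_request",  # hypothetical code
            f"Estimated cost ${estimated:.2f} exceeds per-request ceiling "
            f"${policy.max_cost_per_request:.2f}",
        ), headers
    if daily_spent + estimated > policy.daily_budget:
        return _error(
            "daily_budget",
            f"Daily budget exhausted: ${daily_spent:.2f} spent + "
            f"${estimated:.2f} estimated > ${policy.daily_budget:.2f} limit",
        ), headers
    if monthly_spent + estimated > policy.monthly_budget:
        return _error(
            "monthly_budget",  # hypothetical code
            f"Monthly budget exhausted: ${monthly_spent:.2f} spent + "
            f"${estimated:.2f} estimated > ${policy.monthly_budget:.2f} limit",
        ), headers
    return None, headers
```

Floats keep the sketch short. A real implementation would more likely use `decimal.Decimal` or integer cents to avoid rounding surprises at the budget boundary.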
Example blocked response:
{
"error": {
"code": "daily_budget",
"message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit"
}
}
Step 3: Runner session budget
When a request is tied to a managed runner session, the gateway can apply an additional per-session cost check. This prevents long-running sessions from silently burning through budget after the request already passed general org policy.
Step 4: PII and content safety
The gateway scans request text before it leaves your infrastructure.
PII scanning focuses on secrets, credentials, and regulated data. Content safety focuses on prompt injection, jailbreak patterns, and exfiltration signals when the relevant guardrails are enabled.
Typical findings include:
| Category | Types |
|---|---|
| Secrets | API keys, bearer tokens, passwords, cloud credentials |
| Personal data | Email addresses, SSNs, passport-like identifiers |
| Financial data | Credit cards, IBAN-style values |
| Safety signals | Prompt injection, jailbreak, exfiltration attempts |
PII actions are policy-driven. An org can block findings outright or convert them into an approval requirement.
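To make the scan-then-decide flow concrete, here is a deliberately small regex-based sketch. It is not the gateway's scanner (the source files listed later mention a Presidio integration); the patterns, the `scan` and `decide` functions, and the `on_finding` parameter are illustrative assumptions, and the patterns cover only a few of the categories in the table.

```python
import re
from typing import List

# Illustrative patterns only; a production scanner needs far broader coverage.
_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
_CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def _luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to cut false positives on long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text: str) -> List[str]:
    """Return the finding types present in the text."""
    findings = [name for name, pattern in _PATTERNS.items() if pattern.search(text)]
    for match in _CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if _luhn_ok(digits):
            findings.append("credit_card")
            break
    return findings


def decide(findings: List[str], on_finding: str = "BLOCK") -> str:
    """Apply the org's policy: block outright or require approval."""
    if on_finding not in ("BLOCK", "NEEDS_APPROVAL"):
        raise ValueError(f"unsupported policy action: {on_finding}")
    return on_finding if findings else "ALLOW"
```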
Step 5: Model access
Model access combines two controls:
- Plan-level model entitlement
- Org-level allowlists in gateway policy
Model checks happen after alias resolution.
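A sketch of how the two controls might combine, with alias resolution first. The function name, the alias map, and the rule that a missing allowlist means "no org-level restriction" are assumptions for illustration, not documented gateway behavior.

```python
from typing import Dict, Iterable, Optional, Tuple


def check_model_access(
    requested: str,
    aliases: Dict[str, str],
    plan_models: Iterable[str],
    org_allowlist: Optional[Iterable[str]] = None,
) -> Tuple[bool, str]:
    """Return (allowed, resolved_model). Both controls must pass."""
    # Resolve the alias first so checks run against the concrete model name.
    resolved = aliases.get(requested, requested)

    # Control 1: plan-level entitlement.
    if resolved not in set(plan_models):
        return False, resolved

    # Control 2: org-level allowlist.
    # ASSUMPTION: None means the org has not configured an allowlist.
    if org_allowlist is not None and resolved not in set(org_allowlist):
        return False, resolved

    return True, resolved
```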
Example allowlist:
{
"allowed_models": [
"gpt-4o-mini",
"claude-haiku-3-5-20241022",
"gemini-2.5-flash"
]
}
Step 6: HITL gate
The human-in-the-loop gate turns an otherwise valid request into a pending approval when the estimated cost exceeds the configured threshold.
HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30
{
"status": "pending_approval",
"approval_id": "apr_abc123def456",
"message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
"retry_after_seconds": 30,
"estimated_cost": 12.50
}
Pending approvals appear in the dashboard approval queue and can be approved or rejected there before the caller retries the request.
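The gate's threshold comparison and the resulting 202 response can be sketched as follows. The `hitl_gate` function and the way the approval ID is generated are hypothetical; the header names, body fields, and message format follow the example response in this section.

```python
import uuid
from typing import Dict, Optional, Tuple


def hitl_gate(
    estimated_cost: float, threshold: float, retry_after: int = 30
) -> Optional[Tuple[int, Dict[str, str], dict]]:
    """Return (status, headers, body) when approval is needed, else None."""
    if estimated_cost <= threshold:
        return None  # under or at the threshold: the request proceeds

    # Hypothetical ID scheme that only mimics the shape of the documented IDs.
    approval_id = f"apr_{uuid.uuid4().hex[:12]}"
    headers = {
        "X-CM-Approval-ID": approval_id,
        "Retry-After": str(retry_after),
    }
    body = {
        "status": "pending_approval",
        "approval_id": approval_id,
        "message": (
            "Request requires human approval: "
            f"Estimated cost ${estimated_cost:.2f} exceeds "
            f"HITL threshold ${threshold:.2f}"
        ),
        "retry_after_seconds": retry_after,
        "estimated_cost": estimated_cost,
    }
    return 202, headers, body
```

A caller would typically wait for the `Retry-After` interval and then retry the request.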
Default governance tiers
The gateway falls back to built-in defaults when an org has no custom policy:
| Tier | RPM | Daily Budget | Monthly Budget | Max Cost/Request | HITL Threshold |
|---|---|---|---|---|---|
| Free | 10 | $5 | $50 | $0.25 | $1 |
| Starter | 60 | $25 | $250 | $0.50 | $3 |
| Growth | 300 | $100 | $2,000 | $2.00 | $10 |
| Enterprise | 5,000 | $2,000 | $50,000 | $10.00 | $50 |
Backward-compatibility aliases still exist for older tier names such as pro, professional, and team.
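One way to picture the fallback is a lookup table keyed by tier name, with aliases resolved first. The values below are copied from the table above; the `TierDefaults` dataclass and `defaults_for` function are hypothetical, and because this page does not say which tier each legacy name maps to, the alias map is left as a caller-supplied parameter.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TierDefaults:
    rpm: int
    daily_budget: float
    monthly_budget: float
    max_cost_per_request: float
    hitl_threshold: float


DEFAULT_TIERS: Dict[str, TierDefaults] = {
    "free": TierDefaults(10, 5.0, 50.0, 0.25, 1.0),
    "starter": TierDefaults(60, 25.0, 250.0, 0.50, 3.0),
    "growth": TierDefaults(300, 100.0, 2000.0, 2.00, 10.0),
    "enterprise": TierDefaults(5000, 2000.0, 50000.0, 10.00, 50.0),
}


def defaults_for(tier: str, aliases: Optional[Dict[str, str]] = None) -> TierDefaults:
    """Resolve a (possibly legacy) tier name to its built-in defaults."""
    name = tier.strip().lower()
    # The alias targets are not documented here, so they must be supplied.
    name = (aliases or {}).get(name, name)
    if name not in DEFAULT_TIERS:
        raise KeyError(f"unknown tier: {tier!r}")
    return DEFAULT_TIERS[name]
```

For example, `defaults_for("pro", aliases={"pro": "growth"})` would return the Growth defaults, where the `pro` to `growth` mapping is purely a placeholder.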
Short-circuit behavior
The governance chain is designed to fail fast. The first denial terminates evaluation.
- If rate limiting blocks a request, cost estimation is never performed
- If cost governance blocks a request, the provider never receives the call
- Headers from completed checks are still carried into blocked and approval responses so clients get rate-limit and spend context
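The fail-fast loop with header carry-over can be sketched in a few lines. This is a simplified model rather than the `GovernanceEngine` implementation: the `CheckResult` type, the `run_chain` function, and the dict-based request are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    action: str  # "ALLOW", "BLOCK", or "NEEDS_APPROVAL"
    headers: Dict[str, str] = field(default_factory=dict)
    reason: str = ""


Check = Callable[[dict], CheckResult]


def run_chain(request: dict, checks: List[Check]) -> CheckResult:
    """Run checks in order and stop at the first non-ALLOW result."""
    headers: Dict[str, str] = {}
    for check in checks:
        result = check(request)
        # Keep headers from every check that ran, including the failing one.
        headers.update(result.headers)
        if result.action != "ALLOW":
            return CheckResult(result.action, headers, result.reason)
    return CheckResult("ALLOW", headers)
```

Because the loop returns as soon as a check denies or defers the request, later checks such as cost estimation never execute, and the accumulated headers still reach the client.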
Backend implementation
The governance engine is implemented in services/backend/src/gateway/governance.py as the GovernanceEngine class. It exposes a single evaluate() method that accepts a GatewayRequest and GovernancePolicy and returns a PolicyDecision.
Key source files:
| File | Purpose |
|---|---|
| src/gateway/governance.py | Governance engine with all check implementations |
| src/gateway/models.py | GatewayRequest, GovernancePolicy, PolicyDecision models |
| src/gateway/model_pricing.py | Token cost estimation and model pricing tables |
| src/gateway/content_safety.py | Content safety scanner (prompt injection, jailbreak detection) |
| src/services/pii_detection_service.py | Presidio NER integration for enhanced PII detection |