Governance Reference
Every request that passes through the gateway is evaluated by the governance chain — a multi-step policy pipeline that runs in order and short-circuits on the first denial. This page documents each governance step, its behavior, and the resulting error responses.
Pipeline overview
Request --> [0. Plan Enforcement]
        --> [1. Rate Limit]
        --> [1.5. Plan Entitlement]
        --> [2. Cost Estimate]
        --> [3. Runner Session Budget]
        --> [4. PII Scan]
        --> [4.5. Content Safety]
        --> [5. Model Allowlist]
        --> [6. HITL Gate]
        --> Proxy to Provider

Each step returns one of three actions:
| Action | Meaning |
|---|---|
| ALLOW | Request passes this check, proceed to the next step |
| BLOCK | Request denied immediately with an error response |
| NEEDS_APPROVAL | Request held for human review (HITL gate only) |
The chain is ordered roughly by computational cost: cheaper checks run first so that denied requests fail fast and the gateway avoids unnecessary work.
Step 0: Plan enforcement
Validates the organization’s subscription status before any other checks.
What it checks:
- Whether the organization has an active subscription
- Whether the org has exceeded its plan’s daily request quota
- Whether the requested model is available on the org’s plan tier
Blocked response:
HTTP/1.1 429 Too Many Requests

{
"error": {
"message": "Plan limit exceeded: free tier allows 100 requests per day",
"type": "insufficient_quota",
"param": null,
"code": "plan_limit_exceeded"
}
}

Step 1: Rate limiting
Enforces a maximum number of requests per minute (RPM) per organization using a Redis sliding window.
How it works:
- Each request increments a per-org, per-minute counter in Redis
- If the counter exceeds the org’s RPM limit, the request is blocked with HTTP 429
- The counter key uses a 120-second TTL to cover minute boundaries
- Rate limit metadata is included in response headers regardless of the outcome
Default RPM limits by tier:
| Tier | Requests per minute |
|---|---|
| Free | 10 |
| Starter | 60 |
| Pro | 300 |
| Team | 1,000 |
| Enterprise | 5,000 |
Blocked response:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709056860
Retry-After: 12

{
"error": {
"message": "Rate limit exceeded. Retry after 12 seconds.",
"type": "rate_limit_error",
"param": null,
"code": "rate_limit"
}
}

Response headers (always included):
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum RPM for the organization |
| X-RateLimit-Remaining | Remaining requests in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds until the limit resets (only on 429 responses) |
Step 2: Cost estimation
Estimates the request cost before proxying and checks it against budget limits.
How it works:
- Input tokens are counted using tiktoken BPE encoding, with the encoding chosen per model family (o200k_base for GPT-4o/GPT-5, cl100k_base for Claude/DeepSeek/Gemini). Because Claude, DeepSeek, and Gemini use their own tokenizers, counts for those models are approximations.
- Max output tokens are taken from the request body (max_tokens or max_completion_tokens, defaulting to 4096)
- Estimated cost is calculated from the model’s per-token pricing table
- The estimate is checked against three limits:
  - Per-request cap: blocks individual requests estimated to be too expensive
  - Daily budget: blocks when daily accumulated spend plus the estimate exceeds the limit
  - Monthly budget: blocks when monthly accumulated spend plus the estimate exceeds the limit
If tiktoken encoding fails, the estimator falls back to a character heuristic (1 token per 4 characters).
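A minimal sketch of the estimate and the three budget checks might look like this. It covers only the character-heuristic fallback (not the tiktoken path), and the function names, the per-1K-token price parameters, the precedence between max_tokens and max_completion_tokens, and the order of the three checks are all assumptions made for illustration.

```python
from typing import Optional


def heuristic_token_count(text: str) -> int:
    """Fallback described above: roughly 1 token per 4 characters."""
    return max(1, len(text) // 4)


def max_output_tokens(body: dict) -> int:
    """Read the output cap from the request body, defaulting to 4096.
    Checking max_tokens before max_completion_tokens is an assumption."""
    return body.get("max_tokens") or body.get("max_completion_tokens") or 4096


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Worst-case USD cost: prompt tokens plus the full output allowance.
    Prices are hypothetical inputs, not values from the gateway's pricing table."""
    return (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k


def check_budgets(estimate: float, daily_spent: float, monthly_spent: float, *,
                  max_cost_per_request: float, daily_budget: float,
                  monthly_budget: float) -> Optional[str]:
    """Return the error code of the first limit exceeded, or None if allowed."""
    if estimate > max_cost_per_request:
        return "cost_limit"
    if daily_spent + estimate > daily_budget:
        return "daily_budget"
    if monthly_spent + estimate > monthly_budget:
        return "monthly_budget"
    return None
```

Because the estimate assumes the model uses its entire output allowance, it is an upper bound; lowering max_tokens in the request is the most direct way for a client to stay under the per-request cap.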
Budget limits by tier:
| Tier | Max cost per request | Daily budget | Monthly budget |
|---|---|---|---|
| Free | $0.25 | $5 | $50 |
| Starter | $0.50 | $25 | $250 |
| Pro | $2.00 | $100 | $2,000 |
| Team | $5.00 | $500 | $10,000 |
| Enterprise | $10.00 | $2,000 | $50,000 |
Budget tracking in Redis:
| Key format | TTL | Purpose |
|---|---|---|
| gateway:daily_cost:{org_id}:{YYYY-MM-DD} | 48 hours | Daily cost accumulator |
| gateway:monthly_cost:{org_id}:{YYYY-MM} | 35 days | Monthly cost accumulator |
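For illustration, the key formats and TTLs in the table can be built like this. The helper names are hypothetical, and which timezone the gateway uses for the date portion is not stated on this page, so the caller supplies the date.

```python
from datetime import date, timedelta

# TTLs from the table above.
DAILY_TTL = timedelta(hours=48)
MONTHLY_TTL = timedelta(days=35)


def daily_cost_key(org_id: str, day: date) -> str:
    """Key for the daily cost accumulator, e.g. gateway:daily_cost:org_x:2026-02-27."""
    return f"gateway:daily_cost:{org_id}:{day:%Y-%m-%d}"


def monthly_cost_key(org_id: str, day: date) -> str:
    """Key for the monthly cost accumulator, e.g. gateway:monthly_cost:org_x:2026-02."""
    return f"gateway:monthly_cost:{org_id}:{day:%Y-%m}"
```

Both TTLs are longer than the period the key covers, so a key outlives its day or month and then expires on its own without a cleanup job.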
Blocked response (per-request cap):
HTTP/1.1 403 Forbidden

{
"error": {
"message": "Estimated cost $3.50 exceeds per-request limit $2.00",
"type": "permission_error",
"param": null,
"code": "cost_limit"
}
}

Blocked response (daily budget):
HTTP/1.1 403 Forbidden

{
"error": {
"message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit",
"type": "permission_error",
"param": null,
"code": "daily_budget"
}
}

Blocked response (monthly budget):
{
"error": {
"message": "Monthly budget exhausted: $248.00 spent + $2.10 estimated > $250.00 limit",
"type": "permission_error",
"param": null,
"code": "monthly_budget"
}
}

Cost metadata headers (always included):
| Header | Description |
|---|---|
| X-CM-Cost | Estimated cost for this request (USD) |
| X-CM-Daily-Cost | Cumulative daily spend (USD) |
| X-CM-Daily-Budget | Daily budget limit (USD) |
Step 3: Runner session budget
For requests originating from managed runner sessions, this step enforces per-session cost limits separate from the org-level budgets.
When it runs: Only when the request includes valid X-Runner-ID and X-Session-ID headers with a trusted X-Runner-Proof token.
What it checks:
- Whether the session’s accumulated cost plus the estimated cost exceeds the session-level budget
Step 4: PII scanning
Scans all user-provided text in the request body for secrets, credentials, and personal data before it leaves your infrastructure.
Detection modes:
| Mode | Description | Feature flag |
|---|---|---|
| Regex (default) | Compiled regex patterns for common PII types | Always enabled |
| Presidio NER | Microsoft Presidio Named Entity Recognition for 50+ entity types | Feature-flagged |
PII types detected by regex:
| Category | Types |
|---|---|
| Credentials | OpenAI API keys (sk-), Anthropic API keys (sk-ant-), Curate-Me API keys (cm_sk_), Bearer tokens, AWS access keys, generic secret patterns |
| Personal identifiers | Social Security numbers (SSN), email addresses |
| Financial | Credit card numbers (Visa, MC, Amex, Discover), IBAN numbers |
| EU compliance | VAT numbers, EU passport numbers, UK National Insurance numbers, German ID numbers |
| Health data | ICD-10 diagnostic codes, medication dosage patterns |
Severity classification:
| Severity | Examples | Default action |
|---|---|---|
| CRITICAL | SSN, credit card, IBAN, passport numbers | Block |
| HIGH | API keys, bearer tokens, AWS credentials | Block |
| MEDIUM | Email addresses, VAT numbers, health codes | Flag (log warning) |
Configurable behavior:
Per-organization PII policy can be set to:
- Block the request (default for CRITICAL and HIGH findings)
- Flag the request for review
- Allow the request with a warning logged
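The regex mode and severity mapping can be sketched roughly as below. The patterns are simplified illustrations written for this example, not the gateway's actual regexes, and the function names and finding shape are assumptions. The severity-to-action mapping follows the defaults in the table above.

```python
import re

# Illustrative patterns only (assumptions); real detectors are stricter.
PATTERNS = [
    ("ssn", "CRITICAL", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("anthropic_api_key", "HIGH", re.compile(r"\bsk-ant-[A-Za-z0-9_-]{16,}")),
    ("openai_api_key", "HIGH", re.compile(r"\bsk-(?!ant-)[A-Za-z0-9_-]{16,}")),
    ("email", "MEDIUM", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]

# Default actions per severity, as listed in the severity table.
DEFAULT_ACTION = {"CRITICAL": "block", "HIGH": "block", "MEDIUM": "flag"}


def scan_text(text: str) -> list:
    """Return one finding per pattern type that matches the text."""
    findings = []
    for name, severity, pattern in PATTERNS:
        if pattern.search(text):
            findings.append({
                "type": name,
                "severity": severity,
                "action": DEFAULT_ACTION[severity],
            })
    return findings


def decide(findings: list) -> str:
    """BLOCK if any finding's action is block; flagged findings pass through."""
    if any(f["action"] == "block" for f in findings):
        return "BLOCK"
    return "ALLOW"
```

A per-organization policy would replace `DEFAULT_ACTION` with the org's configured mapping; the rest of the flow stays the same.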
Blocked response:
HTTP/1.1 403 Forbidden

{
"error": {
"message": "PII detected in request: credit card number found in message content",
"type": "permission_error",
"param": null,
"code": "pii_detected"
}
}

Step 4.5: Content safety (supplementary)
When the DLP_GUARDRAILS feature flag is enabled, the gateway runs a content safety scanner between PII scanning and model allowlists.
What it detects:
- Prompt injection attempts
- Jailbreak patterns
- Data exfiltration indicators
Behavior:
- High severity findings (confirmed prompt injection) result in a block
- Medium severity findings are logged but allowed through
This step supplements the core governance chain with additional input validation.
Step 5: Model allowlist
Restricts which LLM models an organization can use through the gateway.
How it works:
- If the organization has a non-empty allowed_models list, the requested model must appear in that list
- If the list is empty, all models are allowed (no restriction)
- Model names are checked after alias resolution, so aliases like claude-sonnet are resolved to their canonical names before comparison
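The rules above amount to a short check. In this sketch the alias table and function names are hypothetical (the gateway's real alias mapping is not shown on this page), and it assumes the allowlist stores canonical model names.

```python
from typing import List

# Hypothetical alias table for illustration only.
MODEL_ALIASES = {"claude-sonnet": "claude-sonnet-4-20250514"}


def resolve_alias(model: str) -> str:
    """Map an alias to its canonical name; unknown names pass through unchanged."""
    return MODEL_ALIASES.get(model, model)


def is_model_allowed(requested_model: str, allowed_models: List[str]) -> bool:
    """An empty allowlist means no restriction; otherwise compare canonical names."""
    if not allowed_models:
        return True
    return resolve_alias(requested_model) in allowed_models
```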
Example policy:
{
"allowed_models": [
"gpt-4o-mini",
"claude-haiku-3-5-20241022",
"gemini-2.5-flash"
]
}

With this policy, requests for gpt-4o, claude-sonnet-4-20250514, or any unlisted model would be blocked.
Blocked response:
HTTP/1.1 403 Forbidden

{
"error": {
"message": "Model 'gpt-4o' is not in the allowed model list for this organization",
"type": "permission_error",
"param": null,
"code": "model_not_allowed"
}
}

Step 6: HITL gate
The Human-in-the-Loop gate flags expensive or sensitive requests for manual approval before execution. This is the final check in the governance chain.
How it works:
- The gateway compares the estimated request cost against the org’s HITL cost threshold
- If the estimate exceeds the threshold, the request is not proxied immediately
- An approval request is created in MongoDB via the HITL bridge
- The gateway returns HTTP 202 with an approval_id that the client can poll
- A reviewer in the dashboard approves or rejects the request
- Once approved, the client re-submits the request (the approval is consumed)
Default HITL thresholds by tier:
| Tier | HITL cost threshold |
|---|---|
| Free | $1.00 |
| Starter | $3.00 |
| Pro | $10.00 |
| Team | $25.00 |
| Enterprise | $50.00 |
HITL response (HTTP 202):
HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30

{
"status": "pending_approval",
"approval_id": "apr_abc123def456",
"retry_after_seconds": 30,
"message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
"estimated_cost": 12.50,
"model": "gpt-4o"
}

Polling for approval status:
GET /v1/approvals/apr_abc123def456/status

{
"approval_id": "apr_abc123def456",
"status": "approved",
"approved_by": "user@example.com",
"approved_at": "2026-02-27T12:05:00Z"
}

Possible status values: pending, approved, rejected, expired.
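On the client side, polling could be structured along these lines. This is a sketch under stated assumptions: `fetch_status` is a caller-supplied function (hypothetical) that performs the GET request to the status endpoint and returns the parsed JSON body, and the attempt limit is an arbitrary choice. The 30-second default interval mirrors the Retry-After value in the 202 response.

```python
import time
from typing import Callable

# Statuses after which polling should stop.
TERMINAL_STATUSES = {"approved", "rejected", "expired"}


def wait_for_approval(fetch_status: Callable[[str], dict],
                      approval_id: str,
                      poll_interval: float = 30.0,
                      max_attempts: int = 20,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the approval reaches a terminal status or attempts run out.

    Returns the final status, or "pending" if no decision was made in time.
    """
    for attempt in range(max_attempts):
        status = fetch_status(approval_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(poll_interval)  # wait before the next poll
    return "pending"
```

When the result is "approved", the client re-submits the original request; any other result means the request should not be retried as-is.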
Dashboard integration:
Pending approvals appear in the dashboard under Gateway > Approval Queues. Reviewers can see the model, estimated cost, and request content before approving or rejecting.
Admin endpoints for managing approvals:
| Method | Endpoint | Description |
|---|---|---|
| GET | /gateway/admin/approvals | List pending approvals (paginated) |
| GET | /gateway/admin/approvals/stats | Counts of pending/approved/rejected |
| GET | /gateway/admin/approvals/{id} | Get full approval details |
| POST | /gateway/admin/approvals/{id}/approve | Approve a request |
| POST | /gateway/admin/approvals/{id}/reject | Reject a request with reason |
Short-circuit behavior
The governance chain fails fast. Each check is evaluated in order, and the first denial terminates the chain:
- If rate limiting blocks a request, cost estimation is never performed
- If cost estimation blocks, PII scanning is skipped
- Metadata from earlier steps (rate limit counters, daily cost) is carried forward into the denial response so clients always receive complete headers
This ordering is intentional:
- Plan enforcement first — ensures subscription is active
- Rate limiting — cheapest check (single Redis increment), prevents abuse
- Cost estimation — requires token counting but avoids scanning content
- PII scanning — requires text extraction and pattern matching
- Content safety — optional DLP scanning
- Model allowlist — simple string comparison
- HITL gate last — only reached for requests that pass all automated checks
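The short-circuit behavior can be sketched as a loop over ordered steps. The `StepResult` type, the `run_chain` function, and the result dictionary are hypothetical names chosen for this example, not the gateway's internal API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class StepResult:
    action: str                                   # "ALLOW", "BLOCK", or "NEEDS_APPROVAL"
    headers: Dict[str, str] = field(default_factory=dict)
    error_code: str = ""


Step = Callable[[dict], StepResult]


def run_chain(request: dict, steps: List[Tuple[str, Step]]) -> dict:
    """Run steps in order and stop at the first result that is not ALLOW."""
    headers: Dict[str, str] = {}
    for name, step in steps:
        result = step(request)
        headers.update(result.headers)            # carry metadata forward
        if result.action != "ALLOW":
            # Later steps are never called; headers gathered so far are returned.
            return {"action": result.action, "stopped_at": name,
                    "error_code": result.error_code, "headers": headers}
    return {"action": "ALLOW", "stopped_at": None, "error_code": "", "headers": headers}
```

Accumulating headers in one dictionary is what lets a denial from a later step still include the rate limit headers produced by an earlier one.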
Custom governance policies
Governance policies can be customized per organization via the dashboard or admin API.
Create or update a policy
POST /gateway/admin/policies

{
"org_id": "org_abc123",
"rpm_limit": 500,
"daily_budget": 200.0,
"monthly_budget": 5000.0,
"max_cost_per_request": 3.0,
"hitl_cost_threshold": 15.0,
"allowed_models": ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"]
}

Policy presets
The gateway includes built-in presets that can be applied in one step:
POST /gateway/admin/policies/apply-preset

{
"preset": "team"
}

Available presets: free, starter, pro, team, enterprise.
Retrieving policies
GET /gateway/admin/policies

Returns the effective governance policy for the authenticated organization, including any custom overrides and the tier-based defaults.
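One way to picture an "effective" policy is as tier defaults with custom overrides layered on top. The sketch below uses the default values from the tier tables on this page; the merge rule (overrides win, unset fields fall back to the tier default, empty allowlist by default) and the function name are assumptions, not documented gateway behavior.

```python
from typing import Optional

# Tier defaults collected from the RPM, budget, and HITL tables on this page.
TIER_DEFAULTS = {
    "free": {"rpm_limit": 10, "max_cost_per_request": 0.25, "daily_budget": 5.0,
             "monthly_budget": 50.0, "hitl_cost_threshold": 1.0},
    "starter": {"rpm_limit": 60, "max_cost_per_request": 0.50, "daily_budget": 25.0,
                "monthly_budget": 250.0, "hitl_cost_threshold": 3.0},
    "pro": {"rpm_limit": 300, "max_cost_per_request": 2.0, "daily_budget": 100.0,
            "monthly_budget": 2000.0, "hitl_cost_threshold": 10.0},
    "team": {"rpm_limit": 1000, "max_cost_per_request": 5.0, "daily_budget": 500.0,
             "monthly_budget": 10000.0, "hitl_cost_threshold": 25.0},
    "enterprise": {"rpm_limit": 5000, "max_cost_per_request": 10.0, "daily_budget": 2000.0,
                   "monthly_budget": 50000.0, "hitl_cost_threshold": 50.0},
}


def effective_policy(tier: str, overrides: Optional[dict] = None) -> dict:
    """Start from the tier defaults, then apply any non-null custom overrides."""
    policy = {**TIER_DEFAULTS[tier], "allowed_models": []}  # empty list = all models
    for key, value in (overrides or {}).items():
        if value is not None:
            policy[key] = value
    return policy
```

For example, `effective_policy("pro", {"rpm_limit": 500})` keeps the Pro budgets while raising only the RPM limit.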