Governance Reference

Every request that passes through the gateway is evaluated by the governance chain — a multi-step policy pipeline that runs in order and short-circuits on the first denial. This page documents each governance step, its behavior, and the resulting error responses.

Pipeline overview

Request --> [0. Plan Enforcement] --> [1. Rate Limit] --> [1.5. Plan Entitlement] --> [2. Cost Estimate] --> [3. Runner Session Budget] --> [4. PII Scan] --> [4.5. Content Safety] --> [5. Model Allowlist] --> [6. HITL Gate] --> Proxy to Provider

Each step returns one of three actions:

| Action | Meaning |
| --- | --- |
| ALLOW | Request passes this check, proceed to the next step |
| BLOCK | Request denied immediately with an error response |
| NEEDS_APPROVAL | Request held for human review (HITL gate only) |

The chain is ordered roughly by computational cost: cheap checks such as plan enforcement and rate limiting run first, so most denials happen before the gateway does any token counting or content scanning.


Step 0: Plan enforcement

Validates the organization’s subscription status before any other checks.

What it checks:

  • Whether the organization has an active subscription
  • Whether the org has exceeded its plan’s daily request quota
  • Whether the requested model is available on the org’s plan tier

Blocked response:

HTTP/1.1 429 Too Many Requests
{ "error": { "message": "Plan limit exceeded: free tier allows 100 requests per day", "type": "insufficient_quota", "param": null, "code": "plan_limit_exceeded" } }

Step 1: Rate limiting

Enforces a maximum number of requests per minute (RPM) per organization using a Redis sliding window.

How it works:

  • Each request increments a per-org, per-minute counter in Redis
  • If the counter exceeds the org’s RPM limit, the request is blocked with HTTP 429
  • The counter key uses a 120-second TTL to cover minute boundaries
  • Rate limit metadata is included in response headers regardless of the outcome

Default RPM limits by tier:

| Tier | Requests per minute |
| --- | --- |
| Free | 10 |
| Starter | 60 |
| Pro | 300 |
| Team | 1,000 |
| Enterprise | 5,000 |

Blocked response:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709056860
Retry-After: 12
{ "error": { "message": "Rate limit exceeded. Retry after 12 seconds.", "type": "rate_limit_error", "param": null, "code": "rate_limit" } }

Response headers (always included):

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum RPM for the organization |
| X-RateLimit-Remaining | Remaining requests in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds until the limit resets (only on 429 responses) |
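
On the client side, these headers are enough to compute a backoff delay. The helper below is one possible sketch; the function name `seconds_until_retry` and the one-second default are assumptions, and it expects header names spelled exactly as in the table above (a plain dict lookup is case-sensitive).

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long a client should wait before retrying a 429 response."""
    now = time.time() if now is None else now

    # Prefer Retry-After, which is present on 429 responses.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # unparsable value: fall through to the reset timestamp

    # Otherwise derive the delay from the window reset timestamp.
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            return max(float(reset) - now, 0.0)
        except ValueError:
            pass

    return 1.0  # hypothetical default when no usable header is present
```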

Step 2: Cost estimation

Estimates the request cost before proxying and checks it against budget limits.

How it works:

  1. Input tokens are counted with a tiktoken BPE encoding chosen by model family: o200k_base for GPT-4o/GPT-5, and cl100k_base for Claude/DeepSeek/Gemini. tiktoken provides OpenAI encodings only, so counts for non-OpenAI models are approximations
  2. Max output tokens are taken from the request body (max_tokens or max_completion_tokens, defaulting to 4096)
  3. Estimated cost is calculated from the model’s per-token pricing table
  4. The estimate is checked against three limits:
    • Per-request cap: blocks individual requests estimated to be too expensive
    • Daily budget: blocks when daily accumulated spend plus the estimate exceeds the limit
    • Monthly budget: blocks when monthly accumulated spend plus the estimate exceeds the limit

If tiktoken encoding fails, the estimator falls back to a character heuristic (1 token per 4 characters).
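
The estimate-then-check flow can be sketched as follows. This is an illustration only: it uses the character heuristic rather than tiktoken so that it runs without dependencies, the `PRICING` table holds made-up prices for a hypothetical model, and the order of the three checks (per-request, then daily, then monthly) is assumed from the order in which the list above presents them.

```python
import math
from typing import Optional

# Hypothetical prices in USD per million tokens, for illustration only.
PRICING = {"example-model": {"input": 2.50, "output": 10.00}}


def estimate_tokens(text: str) -> int:
    """Fallback heuristic from the text above: 1 token per 4 characters."""
    return math.ceil(len(text) / 4)


def estimate_cost(model: str, prompt: str, max_output_tokens: int = 4096) -> float:
    """Worst-case cost: counted input tokens plus the full output allowance."""
    price = PRICING[model]
    input_cost = estimate_tokens(prompt) * price["input"] / 1_000_000
    output_cost = max_output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


def check_budgets(
    estimate: float,
    daily_spent: float,
    monthly_spent: float,
    max_cost_per_request: float,
    daily_budget: float,
    monthly_budget: float,
) -> Optional[str]:
    """Return the error code of the first limit exceeded, or None to allow."""
    if estimate > max_cost_per_request:
        return "cost_limit"
    if daily_spent + estimate > daily_budget:
        return "daily_budget"
    if monthly_spent + estimate > monthly_budget:
        return "monthly_budget"
    return None
```

Note that the estimate charges for the full `max_tokens` allowance, so a request with a large output limit can be blocked even when the actual completion would have been short.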

Budget limits by tier:

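| Tier | Max cost per request | Daily budget | Monthly budget |
| --- | --- | --- | --- |
| Free | $0.25 | $5 | $50 |
| Starter | $0.50 | $25 | $250 |
| Pro | $2.00 | $100 | $2,000 |
| Team | $5.00 | $500 | $10,000 |
| Enterprise | $10.00 | $2,000 | $50,000 |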

Budget tracking in Redis:

| Key format | TTL | Purpose |
| --- | --- | --- |
| gateway:daily_cost:{org_id}:{YYYY-MM-DD} | 48 hours | Daily cost accumulator |
| gateway:monthly_cost:{org_id}:{YYYY-MM} | 35 days | Monthly cost accumulator |
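
As a small illustration, the two key formats above can be built from an org ID and a timestamp like this. The helper name `budget_keys` is hypothetical, and whether the gateway buckets dates in UTC is an assumption; the usage below passes a UTC timestamp.

```python
from datetime import datetime, timezone

DAILY_TTL_SECONDS = 48 * 60 * 60         # 48 hours, per the table above
MONTHLY_TTL_SECONDS = 35 * 24 * 60 * 60  # 35 days, per the table above


def budget_keys(org_id: str, now: datetime) -> dict:
    """Build the daily and monthly accumulator keys for a given moment."""
    return {
        "daily": f"gateway:daily_cost:{org_id}:{now:%Y-%m-%d}",
        "monthly": f"gateway:monthly_cost:{org_id}:{now:%Y-%m}",
    }


keys = budget_keys("org_abc123", datetime(2026, 2, 27, 12, 5, tzinfo=timezone.utc))
```

Both TTLs are longer than the period they track, so a key remains readable for a while after its day or month has ended and then expires on its own.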

Blocked response (per-request cap):

HTTP/1.1 403 Forbidden
{ "error": { "message": "Estimated cost $3.50 exceeds per-request limit $2.00", "type": "permission_error", "param": null, "code": "cost_limit" } }

Blocked response (daily budget):

HTTP/1.1 403 Forbidden
{ "error": { "message": "Daily budget exhausted: $24.50 spent + $0.52 estimated > $25.00 limit", "type": "permission_error", "param": null, "code": "daily_budget" } }

Blocked response (monthly budget):

HTTP/1.1 403 Forbidden
{ "error": { "message": "Monthly budget exhausted: $248.00 spent + $2.10 estimated > $250.00 limit", "type": "permission_error", "param": null, "code": "monthly_budget" } }

Cost metadata headers (always included):

| Header | Description |
| --- | --- |
| X-CM-Cost | Estimated cost for this request (USD) |
| X-CM-Daily-Cost | Cumulative daily spend (USD) |
| X-CM-Daily-Budget | Daily budget limit (USD) |

Step 3: Runner session budget

For requests originating from managed runner sessions, this step enforces per-session cost limits separate from the org-level budgets.

When it runs: Only when the request includes valid X-Runner-ID and X-Session-ID headers with a trusted X-Runner-Proof token.

What it checks:

  • Whether the session’s accumulated cost plus the estimated cost exceeds the session-level budget

Step 4: PII scanning

Scans all user-provided text in the request body for secrets, credentials, and personal data before it leaves your infrastructure.

Detection modes:

| Mode | Description | Feature flag |
| --- | --- | --- |
| Regex (default) | Compiled regex patterns for common PII types | Always enabled |
| Presidio NER | Microsoft Presidio Named Entity Recognition for 50+ entity types | Feature-flagged |

PII types detected by regex:

| Category | Types |
| --- | --- |
| Credentials | OpenAI API keys (sk-), Anthropic API keys (sk-ant-), Curate-Me API keys (cm_sk_), Bearer tokens, AWS access keys, generic secret patterns |
| Personal identifiers | Social Security numbers (SSN), email addresses |
| Financial | Credit card numbers (Visa, MC, Amex, Discover), IBAN numbers |
| EU compliance | VAT numbers, EU passport numbers, UK National Insurance numbers, German ID numbers |
| Health data | ICD-10 diagnostic codes, medication dosage patterns |

Severity classification:

| Severity | Examples | Default action |
| --- | --- | --- |
| CRITICAL | SSN, credit card, IBAN, passport numbers | Block |
| HIGH | API keys, bearer tokens, AWS credentials | Block |
| MEDIUM | Email addresses, VAT numbers, health codes | Flag (log warning) |

Configurable behavior:

Per-organization PII policy can be set to:

  • Block the request (default for CRITICAL and HIGH findings)
  • Flag the request for review
  • Allow the request with a warning logged
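
To make the regex mode and the severity-to-action mapping concrete, here is a minimal sketch. The three patterns are simplified illustrations and are not the gateway's actual expressions; the names `PATTERNS`, `scan`, and `decide` are hypothetical. The default actions follow the severity table above.

```python
import re
from typing import Dict, List, Tuple

# Simplified example patterns: (pii_type, severity, compiled regex).
PATTERNS: List[Tuple[str, str, re.Pattern]] = [
    ("ssn", "CRITICAL", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("aws_access_key", "HIGH", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("email", "MEDIUM", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]

# Default action per severity, as in the severity table above.
DEFAULT_ACTIONS: Dict[str, str] = {"CRITICAL": "block", "HIGH": "block", "MEDIUM": "flag"}


def scan(text: str) -> List[Tuple[str, str]]:
    """Return (pii_type, severity) for every pattern that matches the text."""
    return [(name, severity) for name, severity, rx in PATTERNS if rx.search(text)]


def decide(findings: List[Tuple[str, str]], actions: Dict[str, str] = DEFAULT_ACTIONS) -> str:
    """Pick the strictest action among all findings: block > flag > allow."""
    decided = {actions[severity] for _, severity in findings}
    for action in ("block", "flag"):
        if action in decided:
            return action
    return "allow"
```

A per-organization policy would correspond to passing a different `actions` mapping, for example one that maps MEDIUM to "allow".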

Blocked response:

HTTP/1.1 403 Forbidden
{ "error": { "message": "PII detected in request: credit card number found in message content", "type": "permission_error", "param": null, "code": "pii_detected" } }

Step 4.5: Content safety (supplementary)

When the DLP_GUARDRAILS feature flag is enabled, the gateway runs a content safety scanner between PII scanning and the model allowlist check.

What it detects:

  • Prompt injection attempts
  • Jailbreak patterns
  • Data exfiltration indicators

Behavior:

  • High severity findings (confirmed prompt injection) result in a block
  • Medium severity findings are logged but allowed through

This step supplements the core governance chain with additional input validation.


Step 5: Model allowlist

Restricts which LLM models an organization can use through the gateway.

How it works:

  • If the organization has a non-empty allowed_models list, the requested model must appear in that list
  • If the list is empty, all models are allowed (no restriction)
  • Model names are checked after alias resolution, so aliases like claude-sonnet are resolved to their canonical names before comparison

Example policy:

{ "allowed_models": [ "gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash" ] }

With this policy, requests for gpt-4o, claude-sonnet-4-20250514, or any unlisted model would be blocked.
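
The check itself is small. The sketch below resolves an alias and then tests membership; the `ALIASES` table is hypothetical (the gateway's real alias mappings are not listed on this page), as is the function name.

```python
from typing import Dict, List

# Hypothetical alias table, for illustration only.
ALIASES: Dict[str, str] = {"claude-sonnet": "claude-sonnet-4-20250514"}


def is_model_allowed(requested: str, allowed_models: List[str]) -> bool:
    """Resolve aliases first, then compare against the org's allowlist."""
    canonical = ALIASES.get(requested, requested)
    if not allowed_models:
        return True  # an empty list means no restriction
    return canonical in allowed_models


policy = ["gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash"]
```

Resolving the alias before the comparison means an allowlist only needs to contain canonical names.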

Blocked response:

HTTP/1.1 403 Forbidden
{ "error": { "message": "Model 'gpt-4o' is not in the allowed model list for this organization", "type": "permission_error", "param": null, "code": "model_not_allowed" } }

Step 6: HITL gate

The Human-in-the-Loop gate flags expensive or sensitive requests for manual approval before execution. This is the final check in the governance chain.

How it works:

  1. The gateway compares the estimated request cost against the org’s HITL cost threshold
  2. If the estimate exceeds the threshold, the request is not proxied immediately
  3. An approval request is created in MongoDB via the HITL bridge
  4. The gateway returns HTTP 202 with an approval_id that the client can poll
  5. A reviewer in the dashboard approves or rejects the request
  6. Once approved, the client re-submits the request (the approval is consumed)

Default HITL thresholds by tier:

| Tier | HITL cost threshold |
| --- | --- |
| Free | $1.00 |
| Starter | $3.00 |
| Pro | $10.00 |
| Team | $25.00 |
| Enterprise | $50.00 |

HITL response (HTTP 202):

HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30
{ "status": "pending_approval", "approval_id": "apr_abc123def456", "retry_after_seconds": 30, "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00", "estimated_cost": 12.50, "model": "gpt-4o" }

Polling for approval status:

GET /v1/approvals/apr_abc123def456/status
{ "approval_id": "apr_abc123def456", "status": "approved", "approved_by": "user@example.com", "approved_at": "2026-02-27T12:05:00Z" }

Possible status values: pending, approved, rejected, expired.
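
A client-side polling loop might look like the sketch below. It is transport-agnostic by design: `fetch_status` is a caller-supplied function (hypothetical) that performs the GET request shown above and returns the `status` field. The poll limit and the injectable `sleep` parameter are assumptions added to keep the example bounded and testable; the 30-second interval mirrors the `retry_after_seconds` value in the 202 response.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"approved", "rejected", "expired"}


def wait_for_approval(
    fetch_status: Callable[[], str],
    interval_seconds: float = 30.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the approval reaches a terminal status or polls run out."""
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before the next poll
    return "pending"  # still undecided after max_polls attempts
```

If the function returns "approved", the client re-submits the original request; for "rejected" or "expired" it should not retry.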

Dashboard integration:

Pending approvals appear in the dashboard under Gateway > Approval Queues. Reviewers can see the model, estimated cost, and request content before approving or rejecting.

Admin endpoints for managing approvals:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /gateway/admin/approvals | List pending approvals (paginated) |
| GET | /gateway/admin/approvals/stats | Counts of pending/approved/rejected |
| GET | /gateway/admin/approvals/{id} | Get full approval details |
| POST | /gateway/admin/approvals/{id}/approve | Approve a request |
| POST | /gateway/admin/approvals/{id}/reject | Reject a request with reason |

Short-circuit behavior

The governance chain fails fast. Each check is evaluated in order, and the first denial terminates the chain:

  • If rate limiting blocks a request, cost estimation is never performed
  • If cost estimation blocks, PII scanning is skipped
  • Metadata from earlier steps (rate limit counters, daily cost) is carried forward into the denial response so clients always receive complete headers
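
The fail-fast loop and the header carry-forward can be sketched in a few lines. The `StepResult` type and `run_chain` function are hypothetical names for illustration; each step is modeled as a function that takes the request and returns an action plus any headers it wants to expose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StepResult:
    action: str  # "ALLOW", "BLOCK", or "NEEDS_APPROVAL"
    headers: Dict[str, str] = field(default_factory=dict)
    code: str = ""  # error code for denials, e.g. "rate_limit"


Step = Callable[[dict], StepResult]


def run_chain(request: dict, steps: List[Step]) -> StepResult:
    """Run steps in order and stop at the first result that is not ALLOW."""
    headers: Dict[str, str] = {}
    for step in steps:
        result = step(request)
        headers.update(result.headers)  # carry metadata from earlier steps forward
        if result.action != "ALLOW":
            # Short-circuit: later steps are never called.
            return StepResult(result.action, headers, result.code)
    return StepResult("ALLOW", headers)
```

Because headers accumulate across steps, a denial at the cost step still returns the rate limit headers collected before it.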

This ordering is intentional:

  1. Plan enforcement first — ensures subscription is active
  2. Rate limiting — cheapest check (single Redis increment), prevents abuse
  3. Cost estimation — requires token counting but avoids scanning content
  4. PII scanning — requires text extraction and pattern matching
  5. Content safety — optional DLP scanning
  6. Model allowlist — simple string comparison
  7. HITL gate last — only reached for requests that pass all automated checks

Custom governance policies

Governance policies can be customized per organization via the dashboard or admin API.

Create or update a policy

POST /gateway/admin/policies
{ "org_id": "org_abc123", "rpm_limit": 500, "daily_budget": 200.0, "monthly_budget": 5000.0, "max_cost_per_request": 3.0, "hitl_cost_threshold": 15.0, "allowed_models": ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"] }

Policy presets

The gateway includes built-in presets that can be applied in one step:

POST /gateway/admin/policies/apply-preset
{ "preset": "team" }

Available presets: free, starter, pro, team, enterprise.

Retrieving policies

GET /gateway/admin/policies

Returns the effective governance policy for the authenticated organization, including any custom overrides and the tier-based defaults.
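
One way to picture an "effective" policy is as tier defaults with custom overrides layered on top. The sketch below uses the default values from the tier tables on this page; the merge semantics (overrides win key by key, `None` values are ignored, and the default allowlist is empty) are assumptions and not a description of the gateway's actual behavior.

```python
from typing import Any, Dict, Optional

# Defaults taken from the tier tables on this page.
TIER_DEFAULTS: Dict[str, Dict[str, float]] = {
    "free": {"rpm_limit": 10, "max_cost_per_request": 0.25, "daily_budget": 5.0,
             "monthly_budget": 50.0, "hitl_cost_threshold": 1.0},
    "starter": {"rpm_limit": 60, "max_cost_per_request": 0.50, "daily_budget": 25.0,
                "monthly_budget": 250.0, "hitl_cost_threshold": 3.0},
    "pro": {"rpm_limit": 300, "max_cost_per_request": 2.0, "daily_budget": 100.0,
            "monthly_budget": 2000.0, "hitl_cost_threshold": 10.0},
    "team": {"rpm_limit": 1000, "max_cost_per_request": 5.0, "daily_budget": 500.0,
             "monthly_budget": 10000.0, "hitl_cost_threshold": 25.0},
    "enterprise": {"rpm_limit": 5000, "max_cost_per_request": 10.0, "daily_budget": 2000.0,
                   "monthly_budget": 50000.0, "hitl_cost_threshold": 50.0},
}


def effective_policy(tier: str, overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start from the tier defaults, then apply any custom overrides."""
    policy: Dict[str, Any] = dict(TIER_DEFAULTS[tier])
    policy["allowed_models"] = []  # assumed default: empty list, no restriction
    policy.update({k: v for k, v in (overrides or {}).items() if v is not None})
    return policy
```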