Governance Reference

Every request that passes through the gateway is evaluated by the governance chain — a multi-step policy pipeline that runs in order and short-circuits on the first denial. This page documents each governance step, its behavior, and the resulting error responses.

Pipeline overview


Request --> [0. Plan Enforcement]
        --> [1. Rate Limit]
        --> [1.5. Plan Entitlement]
        --> [2. Cost Estimate]
        --> [3. Runner Session Budget]
        --> [4. PII Scan]
        --> [4.5. Content Safety]
        --> [5. Model Allowlist]
        --> [6. HITL Gate]
        --> Proxy to Provider

Each step returns one of three actions:

Action	Meaning
`ALLOW`	Request passes this check, proceed to the next step
`BLOCK`	Request denied immediately with an error response
`NEEDS_APPROVAL`	Request held for human review (HITL gate only)

The chain is ordered roughly by computational cost: cheaper checks generally run first, so requests fail fast and the gateway avoids unnecessary work.

Step 0: Plan enforcement

Validates the organization’s subscription status before any other checks.

What it checks:

Whether the organization has an active subscription
Whether the org has exceeded its plan’s daily request quota
Whether the requested model is available on the org’s plan tier

Blocked response:


HTTP/1.1 429 Too Many Requests


{
  "error": {
    "message": "Plan limit exceeded: free tier allows 100 requests per day",
    "type": "insufficient_quota",
    "param": null,
    "code": "plan_limit_exceeded"
  }
}
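The daily-quota part of this step can be sketched as follows. The sketch covers only the quota check, because that is the only one whose error body is documented here. It assumes a hypothetical `requests_today` counter of already-accepted requests; where the gateway stores that counter is not described on this page.

```python
from typing import Optional


def check_daily_quota(tier: str, requests_today: int,
                      daily_quota: int) -> Optional[tuple[int, dict]]:
    """Return (status, body) when the plan's daily quota is used up, else None.

    requests_today is a hypothetical input: requests already accepted today.
    """
    if requests_today < daily_quota:
        return None  # quota not exhausted; continue to the next check
    return 429, {
        "error": {
            "message": f"Plan limit exceeded: {tier} tier allows {daily_quota} requests per day",
            "type": "insufficient_quota",
            "param": None,
            "code": "plan_limit_exceeded",
        }
    }
```

With a free-tier quota of 100, the 101st request of the day (`requests_today == 100`) produces the 429 body shown above.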

Step 1: Rate limiting

Enforces a maximum number of requests per minute (RPM) per organization using a Redis sliding window.

How it works:

Each request increments a per-org, per-minute counter in Redis
If the counter exceeds the org’s RPM limit, the request is blocked with HTTP 429
The counter key uses a 120-second TTL to cover minute boundaries
Rate limit metadata is included in response headers regardless of the outcome

Default RPM limits by tier:

Tier	Requests per minute
Free	10
Starter	60
Pro	300
Team	1,000
Enterprise	5,000

Blocked response:


HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709056860
Retry-After: 12


{
  "error": {
    "message": "Rate limit exceeded. Retry after 12 seconds.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit"
  }
}

Response headers (always included):

Header	Description
`X-RateLimit-Limit`	Maximum RPM for the organization
`X-RateLimit-Remaining`	Remaining requests in the current window
`X-RateLimit-Reset`	Unix timestamp when the window resets
`Retry-After`	Seconds until the limit resets (only on 429 responses)
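The mechanics above can be sketched as an in-memory limiter, with a dict standing in for Redis. The previous-minute weighting is an assumption. It is a common "sliding window counter" technique that would explain why per-minute keys live for 120 seconds, but the gateway's exact algorithm may differ. Here `Retry-After` is simply the time until the current minute ends, and the class name is hypothetical.

```python
import math


class SlidingWindowRateLimiter:
    """Per-org RPM limiter built from per-minute counters.

    A dict stands in for Redis. In Redis, each bucket would be a key
    incremented with INCR and given a 120-second TTL, so the previous
    minute's bucket is still readable just after a minute boundary.
    """

    def __init__(self, rpm_limit: int) -> None:
        self.rpm_limit = rpm_limit
        self.buckets: dict[tuple[str, int], int] = {}

    def check(self, org_id: str, now: float) -> tuple[bool, dict[str, str]]:
        minute = int(now // 60)
        seconds_into_minute = now - minute * 60
        previous = self.buckets.get((org_id, minute - 1), 0)
        # Every request increments the current bucket, allowed or not.
        current = self.buckets.get((org_id, minute), 0) + 1
        self.buckets[(org_id, minute)] = current
        # Assumed weighting: count the share of the previous minute that
        # still overlaps the trailing 60-second window.
        weighted = previous * (60 - seconds_into_minute) / 60 + current
        reset_at = (minute + 1) * 60
        headers = {
            "X-RateLimit-Limit": str(self.rpm_limit),
            "X-RateLimit-Remaining": str(max(0, math.floor(self.rpm_limit - weighted))),
            "X-RateLimit-Reset": str(reset_at),
        }
        if weighted > self.rpm_limit:
            # Retry-After is only added on blocked (429) responses.
            headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
            return False, headers
        return True, headers
```

With an RPM limit of 60, the 61st request at Unix time 1709056848 is blocked with `X-RateLimit-Reset: 1709056860` and `Retry-After: 12`, matching the example response above.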

Step 2: Cost estimation

Estimates the request cost before proxying and checks it against budget limits.

How it works:

Input tokens are counted with tiktoken BPE encodings chosen per model family: o200k_base for GPT-4o/GPT-5, and cl100k_base for Claude/DeepSeek/Gemini (an approximation, since those models use their own tokenizers)
Max output tokens are taken from the request body (max_tokens or max_completion_tokens, defaulting to 4096)
Estimated cost is calculated from the model’s per-token pricing table
The estimate is checked against three limits:
- Per-request cap: blocks individual requests estimated to be too expensive
- Daily budget: blocks when daily accumulated spend plus the estimate exceeds the limit
- Monthly budget: blocks when monthly accumulated spend plus the estimate exceeds the limit

If tiktoken encoding fails, the estimator falls back to a character heuristic (1 token per 4 characters).
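The estimation steps above can be sketched as follows, with several assumptions:

- Prices are quoted in USD per million tokens; the gateway's pricing-table format is not documented here.
- `max_tokens` takes precedence over `max_completion_tokens`.
- Function names are hypothetical.

`tiktoken.get_encoding` and `Encoding.encode` are the real tiktoken calls. If tiktoken is missing or fails, the code falls back to the 4-characters-per-token heuristic.

```python
import math

DEFAULT_MAX_OUTPUT_TOKENS = 4096


def encoding_name_for(model: str) -> str:
    """Pick a tiktoken encoding name using the family mapping above."""
    if model.startswith(("gpt-4o", "gpt-5")):
        return "o200k_base"
    # Claude, DeepSeek and Gemini use their own tokenizers; cl100k_base
    # only approximates their token counts.
    return "cl100k_base"


def heuristic_tokens(text: str) -> int:
    """Fallback: roughly 1 token per 4 characters, rounded up."""
    return math.ceil(len(text) / 4)


def count_input_tokens(text: str, model: str) -> int:
    try:
        import tiktoken  # optional dependency
        encoding = tiktoken.get_encoding(encoding_name_for(model))
        return len(encoding.encode(text))
    except Exception:
        return heuristic_tokens(text)


def max_output_tokens(body: dict) -> int:
    """Read max_tokens, then max_completion_tokens, else the default."""
    for key in ("max_tokens", "max_completion_tokens"):
        if body.get(key) is not None:
            return int(body[key])
    return DEFAULT_MAX_OUTPUT_TOKENS


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Worst-case estimate: assumes every allowed output token is generated."""
    return (input_tokens * input_usd_per_mtok
            + output_tokens * output_usd_per_mtok) / 1_000_000
```

Using the full output allowance makes the estimate an upper bound. That is why the budget checks that follow can block a request that would in practice have cost less.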

Budget limits by tier:

Tier	Max cost per request	Daily budget	Monthly budget
Free	$0.25	$5	$50
Starter	$0.50	$25	$250
Pro	$2.00	$100	$2,000
Team	$5.00	$500	$10,000
Enterprise	$10.00	$2,000	$50,000

Budget tracking in Redis:

Key format	TTL	Purpose
`gateway:daily_cost:{org_id}:{YYYY-MM-DD}`	48 hours	Daily cost accumulator
`gateway:monthly_cost:{org_id}:{YYYY-MM}`	35 days	Monthly cost accumulator
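The three-limit check and the key formats above can be sketched as follows. The sketch assumes the limits are checked in the order listed (per-request, daily, monthly) and that the first limit hit wins. A Redis-backed version would presumably add each request's cost to these keys (for example with `INCRBYFLOAT`) and apply the TTLs with `EXPIRE`. The function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# TTLs a Redis-backed version would set on the accumulator keys.
DAILY_TTL_SECONDS = 48 * 3600
MONTHLY_TTL_SECONDS = 35 * 86400


@dataclass(frozen=True)
class BudgetLimits:
    max_cost_per_request: float
    daily_budget: float
    monthly_budget: float


def budget_keys(org_id: str, now: datetime) -> tuple[str, str]:
    """Daily and monthly accumulator keys in the documented format."""
    return (f"gateway:daily_cost:{org_id}:{now:%Y-%m-%d}",
            f"gateway:monthly_cost:{org_id}:{now:%Y-%m}")


def check_budget(estimate: float, daily_spent: float, monthly_spent: float,
                 limits: BudgetLimits) -> Optional[tuple[str, str]]:
    """Return (code, message) for the first limit exceeded, or None."""
    if estimate > limits.max_cost_per_request:
        return ("cost_limit",
                f"Estimated cost ${estimate:.2f} exceeds per-request limit "
                f"${limits.max_cost_per_request:.2f}")
    if daily_spent + estimate > limits.daily_budget:
        return ("daily_budget",
                f"Daily budget exhausted: ${daily_spent:.2f} spent + "
                f"${estimate:.2f} estimated > ${limits.daily_budget:.2f} limit")
    if monthly_spent + estimate > limits.monthly_budget:
        return ("monthly_budget",
                f"Monthly budget exhausted: ${monthly_spent:.2f} spent + "
                f"${estimate:.2f} estimated > ${limits.monthly_budget:.2f} limit")
    return None
```

Under this ordering, a Starter org (`BudgetLimits(0.50, 25.0, 250.0)`) only reaches the daily or monthly check with estimates of $0.50 or less. Larger estimates are stopped by the per-request cap first.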

Blocked response (per-request cap):


HTTP/1.1 403 Forbidden


{
  "error": {
    "message": "Estimated cost $3.50 exceeds per-request limit $2.00",
    "type": "permission_error",
    "param": null,
    "code": "cost_limit"
  }
}

Blocked response (daily budget):


HTTP/1.1 403 Forbidden


{
  "error": {
    "message": "Daily budget exhausted: $24.70 spent + $0.45 estimated > $25.00 limit",
    "type": "permission_error",
    "param": null,
    "code": "daily_budget"
  }
}

Blocked response (monthly budget):


HTTP/1.1 403 Forbidden


{
  "error": {
    "message": "Monthly budget exhausted: $249.80 spent + $0.35 estimated > $250.00 limit",
    "type": "permission_error",
    "param": null,
    "code": "monthly_budget"
  }
}

Cost metadata headers (always included):

Header	Description
`X-CM-Cost`	Estimated cost for this request (USD)
`X-CM-Daily-Cost`	Cumulative daily spend (USD)
`X-CM-Daily-Budget`	Daily budget limit (USD)

Step 3: Runner session budget

For requests originating from managed runner sessions, this step enforces per-session cost limits separate from the org-level budgets.

When it runs: Only when the request includes valid X-Runner-ID and X-Session-ID headers with a trusted X-Runner-Proof token.

What it checks:

Whether the session’s accumulated cost plus the estimated cost exceeds the session-level budget
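The gating and budget comparison above can be sketched as follows. How `X-Runner-Proof` is validated is not documented, so the sketch takes a hypothetical `verify_proof` callable. It also assumes header names have already been normalized to the spellings shown, because a plain dict is case-sensitive while HTTP header names are not.

```python
from typing import Callable, Mapping

# Hypothetical verifier signature: (runner_id, session_id, proof) -> bool.
ProofVerifier = Callable[[str, str, str], bool]


def session_budget_applies(headers: Mapping[str, str],
                           verify_proof: ProofVerifier) -> bool:
    """True only when all three runner headers are present and the proof verifies."""
    runner_id = headers.get("X-Runner-ID")
    session_id = headers.get("X-Session-ID")
    proof = headers.get("X-Runner-Proof")
    if not (runner_id and session_id and proof):
        return False  # not a managed runner request; the step is skipped
    return verify_proof(runner_id, session_id, proof)


def session_budget_allows(session_spent: float, estimate: float,
                          session_budget: float) -> bool:
    """Allow unless accumulated session cost plus this estimate exceeds the budget."""
    return session_spent + estimate <= session_budget
```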

Step 4: PII scanning

Scans all user-provided text in the request body for secrets, credentials, and personal data before it leaves your infrastructure.

Detection modes:

Mode	Description	Feature flag
Regex (default)	Compiled regex patterns for common PII types	Always enabled
Presidio NER	Microsoft Presidio Named Entity Recognition for 50+ entity types	Feature-flagged

PII types detected by regex:

Category	Types
Credentials	OpenAI API keys (`sk-`), Anthropic API keys (`sk-ant-`), Curate-Me API keys (`cm_sk_`), Bearer tokens, AWS access keys, generic secret patterns
Personal identifiers	Social Security numbers (SSN), email addresses
Financial	Credit card numbers (Visa, MC, Amex, Discover), IBAN numbers
EU compliance	VAT numbers, EU passport numbers, UK National Insurance numbers, German ID numbers
Health data	ICD-10 diagnostic codes, medication dosage patterns

Severity classification:

Severity	Examples	Default action
CRITICAL	SSN, credit card, IBAN, passport numbers	Block
HIGH	API keys, bearer tokens, AWS credentials	Block
MEDIUM	Email addresses, VAT numbers, health codes	Flag (log warning)

Configurable behavior:

Per-organization PII policy can be set to:

Block the request (default for CRITICAL and HIGH findings)
Flag the request for review
Allow the request with a warning logged
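The regex mode, the severity defaults, and the per-org override can be sketched together as follows. The patterns are illustrative and cover only a few of the listed types; they are not the gateway's actual regexes. The Luhn checksum used to filter card-number candidates is a standard technique and an assumption here. The policy dict format (severity mapped to "block", "flag", or "allow") is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative patterns only; not the gateway's real pattern set.
PATTERNS = [
    ("anthropic_api_key", "HIGH", re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}")),
    ("openai_api_key", "HIGH", re.compile(r"\bsk-(?!ant-)[A-Za-z0-9_-]{20,}")),
    ("aws_access_key", "HIGH", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("ssn", "CRITICAL", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("credit_card", "CRITICAL", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),
    ("email", "MEDIUM", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]
DEFAULT_ACTIONS = {"CRITICAL": "block", "HIGH": "block", "MEDIUM": "flag"}


@dataclass(frozen=True)
class Finding:
    pii_type: str
    severity: str


def luhn_valid(digits: str) -> bool:
    """Luhn checksum; drops digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def scan(text: str) -> list[Finding]:
    findings = []
    for pii_type, severity, pattern in PATTERNS:
        for match in pattern.finditer(text):
            digits = re.sub(r"\D", "", match.group())
            if pii_type == "credit_card" and not luhn_valid(digits):
                continue
            findings.append(Finding(pii_type, severity))
    return findings


def decide(findings: list[Finding], policy: Optional[dict[str, str]] = None) -> str:
    """Most restrictive action wins: block > flag > allow."""
    actions = {**DEFAULT_ACTIONS, **(policy or {})}
    chosen = {actions[f.severity] for f in findings}
    for action in ("block", "flag"):
        if action in chosen:
            return action
    return "allow"
```

For example, `decide(scan("card 4111 1111 1111 1111"))` returns `"block"`. An org policy of `{"MEDIUM": "allow"}` lets email addresses through instead of flagging them.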

Blocked response:


HTTP/1.1 403 Forbidden


{
  "error": {
    "message": "PII detected in request: credit card number found in message content",
    "type": "permission_error",
    "param": null,
    "code": "pii_detected"
  }
}

Step 4.5: Content safety (supplementary)

When the CONTENT_SAFETY_SCAN feature flag (or the legacy DLP_GUARDRAILS umbrella flag) is enabled, the gateway runs a content safety scanner between PII scanning and the model allowlist.

What it detects:

Prompt injection attempts
Jailbreak patterns
Data exfiltration indicators

Behavior:

High severity findings (confirmed prompt injection) result in a block
Medium severity findings are logged but allowed through

This step supplements the core governance chain with additional input validation.
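The flag gating and severity handling described above can be sketched as follows. The detection rules are illustrative placeholders, since the real detectors are not documented here. The severity assigned to each rule is also an assumption; only "confirmed prompt injection is high severity" comes from this page.

```python
import re

# Illustrative rules only; real detectors and their severities are assumptions.
RULES = [
    ("prompt_injection", "HIGH",
     re.compile(r"\bignore (?:all )?(?:previous|prior) instructions\b", re.IGNORECASE)),
    ("jailbreak", "MEDIUM",
     re.compile(r"\b(?:DAN|developer) mode\b", re.IGNORECASE)),
    ("data_exfiltration", "MEDIUM",
     re.compile(r"\bsend (?:it|this|the data) to https?://", re.IGNORECASE)),
]


def content_safety_enabled(flags: set[str]) -> bool:
    # The legacy umbrella flag also switches the scanner on.
    return "CONTENT_SAFETY_SCAN" in flags or "DLP_GUARDRAILS" in flags


def content_safety(text: str, flags: set[str]) -> tuple[str, list[tuple[str, str]]]:
    """Return ("block" or "allow", findings); MEDIUM findings are only logged."""
    if not content_safety_enabled(flags):
        return "allow", []
    findings = [(name, sev) for name, sev, rx in RULES if rx.search(text)]
    if any(sev == "HIGH" for _, sev in findings):
        return "block", findings
    return "allow", findings
```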

Step 5: Model allowlist

Restricts which LLM models an organization can use through the gateway.

How it works:

If the organization has a non-empty allowed_models list, the requested model must appear in that list
If the list is empty, all models are allowed (no restriction)
Model names are checked after alias resolution, so aliases like claude-sonnet are resolved to their canonical names before comparison

Example policy:


{
  "allowed_models": [
    "gpt-4o-mini",
    "claude-3-5-haiku-20241022",
    "gemini-2.5-flash"
  ]
}

With this policy, requests for gpt-4o, claude-sonnet-4-20250514, or any unlisted model would be blocked.
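Alias resolution followed by the allowlist comparison can be sketched as follows. The `ALIASES` table is hypothetical, because the gateway's real alias map is not listed on this page. The sketch echoes the model name as the client requested it in the error message, which is also an assumption.

```python
# Hypothetical alias table for illustration only.
ALIASES = {
    "claude-sonnet": "claude-sonnet-4-20250514",
    "claude-haiku": "claude-3-5-haiku-20241022",
}


def canonical_model(model: str) -> str:
    return ALIASES.get(model, model)


def check_model(model: str, allowed_models: list[str]):
    """Return None if the model may be used, else a 403-style error body."""
    if not allowed_models:
        return None  # empty list means no restriction
    if canonical_model(model) in set(allowed_models):
        return None
    return {
        "error": {
            "message": f"Model '{model}' is not in the allowed model list for this organization",
            "type": "permission_error",
            "param": None,
            "code": "model_not_allowed",
        }
    }
```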

Blocked response:


HTTP/1.1 403 Forbidden


{
  "error": {
    "message": "Model 'gpt-4o' is not in the allowed model list for this organization",
    "type": "permission_error",
    "param": null,
    "code": "model_not_allowed"
  }
}

Step 6: HITL gate

The Human-in-the-Loop gate flags expensive or sensitive requests for manual approval before execution. This is the final check in the governance chain.

How it works:

The gateway compares the estimated request cost against the org’s HITL cost threshold
If the estimate exceeds the threshold, the request is not proxied immediately
An approval request is created in MongoDB via the HITL bridge
The gateway returns HTTP 202 with an approval_id that the client can poll
A reviewer in the dashboard approves or rejects the request
Once approved, the client re-submits the request (the approval is consumed)

Default HITL thresholds by tier:

Tier	HITL cost threshold
Free	$1.00
Starter	$3.00
Pro	$10.00
Team	$25.00
Enterprise	$50.00
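The threshold comparison and the shape of the 202 response can be sketched as follows. The sketch assumes `approval_id` comes from an approval record already created through the HITL bridge; that creation step is not shown. The 30-second retry value is copied from the example response on this page, and `hitl_gate` is a hypothetical name.

```python
from typing import Optional

# Default thresholds (USD) from the table above.
HITL_THRESHOLDS = {"free": 1.00, "starter": 3.00, "pro": 10.00,
                   "team": 25.00, "enterprise": 50.00}
RETRY_AFTER_SECONDS = 30  # example value from the documented 202 response


def hitl_gate(estimated_cost: float, tier: str, model: str, approval_id: str,
              threshold_override: Optional[float] = None
              ) -> Optional[tuple[int, dict, dict]]:
    """Return (status, headers, body) when approval is required, else None."""
    threshold = (threshold_override if threshold_override is not None
                 else HITL_THRESHOLDS[tier])
    if estimated_cost <= threshold:
        return None  # at or below the threshold: proxy immediately
    headers = {"X-CM-Approval-ID": approval_id,
               "Retry-After": str(RETRY_AFTER_SECONDS)}
    body = {
        "status": "pending_approval",
        "approval_id": approval_id,
        "retry_after_seconds": RETRY_AFTER_SECONDS,
        "message": (f"Request requires human approval: Estimated cost "
                    f"${estimated_cost:.2f} exceeds HITL threshold ${threshold:.2f}"),
        "estimated_cost": estimated_cost,
        "model": model,
    }
    return 202, headers, body
```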

HITL response (HTTP 202):


HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30


{
  "status": "pending_approval",
  "approval_id": "apr_abc123def456",
  "retry_after_seconds": 30,
  "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
  "estimated_cost": 12.50,
  "model": "gpt-4o"
}

Polling for approval status:


GET /v1/approvals/apr_abc123def456/status


{
  "approval_id": "apr_abc123def456",
  "status": "approved",
  "approved_by": "user@example.com",
  "approved_at": "2026-02-27T12:05:00Z"
}

Possible status values: pending, approved, rejected, expired.
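A client-side polling loop for the status endpoint might look like the sketch below. `base_url` and the request headers (for example, authentication) are caller-supplied assumptions. The fetcher is injectable so the loop can be exercised without a network.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL_STATUSES = {"approved", "rejected", "expired"}


def make_status_fetcher(base_url: str, headers: dict[str, str]) -> Callable[[str], dict]:
    """Build a fetcher for GET /v1/approvals/{id}/status (base_url is an assumption)."""
    def fetch(approval_id: str) -> dict:
        url = f"{base_url}/v1/approvals/{approval_id}/status"
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)
    return fetch


def wait_for_approval(fetch_status: Callable[[str], dict], approval_id: str,
                      interval: float = 30.0, max_polls: int = 20,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal status is reached or max_polls is used up."""
    for _ in range(max_polls):
        status = fetch_status(approval_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
    return "pending"
```

When this returns `"approved"`, the client re-submits the original request, and the approval is consumed on that re-submission.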

Dashboard integration:

Pending approvals appear in the dashboard under Gateway > Approval Queues. Reviewers can see the model, estimated cost, and request content before approving or rejecting.

Admin endpoints for managing approvals:

Method	Endpoint	Description
`GET`	`/gateway/admin/approvals`	List pending approvals (paginated)
`GET`	`/gateway/admin/approvals/stats`	Counts of pending/approved/rejected
`GET`	`/gateway/admin/approvals/{id}`	Get full approval details
`POST`	`/gateway/admin/approvals/{id}/approve`	Approve a request
`POST`	`/gateway/admin/approvals/{id}/reject`	Reject a request with reason

Short-circuit behavior

The governance chain fails fast. Each check is evaluated in order, and the first denial terminates the chain:

If rate limiting blocks a request, cost estimation is never performed
If cost estimation blocks, PII scanning is skipped
Metadata from earlier steps (rate limit counters, daily cost) is carried forward into the denial response so clients always receive complete headers
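The fail-fast evaluation and header carry-forward described above can be sketched as follows. The step callables, `Decision` type, and `run_chain` name are hypothetical; the three actions mirror the `ALLOW`, `BLOCK`, and `NEEDS_APPROVAL` table at the top of this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Action(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    NEEDS_APPROVAL = "NEEDS_APPROVAL"


@dataclass
class Decision:
    action: Action
    headers: dict[str, str] = field(default_factory=dict)
    reason: str = ""


Step = Callable[[dict], Decision]


def run_chain(request: dict, steps: list[tuple[str, Step]]) -> tuple[Decision, list[str]]:
    """Run steps in order and stop at the first non-ALLOW decision."""
    headers: dict[str, str] = {}
    executed: list[str] = []
    for name, step in steps:
        executed.append(name)
        decision = step(request)
        headers.update(decision.headers)  # carry metadata forward
        if decision.action is not Action.ALLOW:
            return Decision(decision.action, headers, decision.reason), executed
    return Decision(Action.ALLOW, headers), executed
```

If the cost step blocks, the PII step is never called, and the denial still carries the rate-limit headers gathered earlier.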

This ordering is intentional:

Plan enforcement first — ensures subscription is active
Rate limiting — cheapest check (single Redis increment), prevents abuse
Cost estimation — requires token counting but avoids scanning content
PII scanning — requires text extraction and pattern matching
Content safety — optional DLP scanning
Model allowlist — simple string comparison
HITL gate last — only reached for requests that pass all automated checks

Custom governance policies

Governance policies can be customized per organization via the dashboard or admin API.

Create or update a policy


POST /gateway/admin/policies


{
  "org_id": "org_abc123",
  "rpm_limit": 500,
  "daily_budget": 200.0,
  "monthly_budget": 5000.0,
  "max_cost_per_request": 3.0,
  "hitl_cost_threshold": 15.0,
  "allowed_models": ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"]
}

Policy presets

The gateway includes built-in presets that can be applied in one step:


POST /gateway/admin/policies/apply-preset


{
  "preset": "team"
}

Available presets: free, starter, pro, team, enterprise.

Retrieving policies


GET /gateway/admin/policies

Returns the effective governance policy for the authenticated organization, including any custom overrides and the tier-based defaults.