
Governance Chain

The governance chain is the policy pipeline that evaluates every gateway request before it reaches the upstream LLM provider. Checks run in order and the chain short-circuits on the first denial — if rate limiting blocks a request, cost estimation never runs.

Overview

Request --> [0. Plan Enforcement] --> [1. Rate Limit] --> [1.5 Plan Entitlement] --> [2. Cost Estimate] --> [2.5 Hierarchical Budget] --> [3. Runner Session Budget] --> [4. PII Scan] --> [4.5 Content Safety] --> [5. Model Allowlist] --> [6. HITL Gate] --> Proxy to Provider

Each step returns one of three actions:

| Action | Meaning |
| --- | --- |
| ALLOW | Request passes this check; proceed to the next step |
| BLOCK | Request is denied immediately with an error response |
| NEEDS_APPROVAL | Request is held for human review (HITL gate, or PII scan when pii_action is "needs_approval") |
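
The short-circuit behavior described above can be sketched in a few lines. This is a minimal illustration, not the gateway's implementation: the `Action`, `Decision`, and `run_chain` names and the two example checks are hypothetical, and it assumes that NEEDS_APPROVAL stops the chain the same way BLOCK does, which the source does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    NEEDS_APPROVAL = "NEEDS_APPROVAL"


@dataclass
class Decision:
    action: Action
    step: Optional[str] = None  # name of the step that stopped the chain, if any


# A check takes the request (a plain dict here) and returns an Action.
Check = Callable[[dict], Action]


def run_chain(request: dict, checks: list[tuple[str, Check]]) -> Decision:
    """Run checks in order and stop at the first non-ALLOW result."""
    for name, check in checks:
        action = check(request)
        if action is not Action.ALLOW:
            return Decision(action, name)
    return Decision(Action.ALLOW)


# Hypothetical checks, for illustration only. `calls` records which ones ran.
calls: list[str] = []


def rate_limit(request: dict) -> Action:
    calls.append("rate_limit")
    over = request["requests_this_minute"] > request["rpm_limit"]
    return Action.BLOCK if over else Action.ALLOW


def cost_estimate(request: dict) -> Action:
    calls.append("cost_estimate")
    return Action.ALLOW


decision = run_chain(
    {"requests_this_minute": 61, "rpm_limit": 60},
    [("rate_limit", rate_limit), ("cost_estimate", cost_estimate)],
)
```

Here `decision.step` is `"rate_limit"` and `calls` contains only `"rate_limit"`, because the cost check never runs once rate limiting has blocked the request.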

Error response format

All governance denials use the OpenAI-compatible error format:

{
  "error": {
    "message": "Human-readable description of what was blocked and why",
    "type": "permission_error",
    "param": null,
    "code": "daily_budget",
    "gateway_error_code": "GW_COST_002",
    "remediation": "Increase your daily budget at dashboard.curate-me.ai/policies"
  }
}

The gateway_error_code field provides a stable, machine-readable identifier. The remediation field tells developers exactly how to fix the issue.
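
A client can branch on `gateway_error_code` rather than on the human-readable message. The helper below is a minimal sketch that assumes only the fields shown in the example above; `GovernanceError` and `parse_governance_error` are hypothetical names, not part of any SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GovernanceError:
    message: str
    gateway_error_code: Optional[str]
    remediation: Optional[str]


def parse_governance_error(body: str) -> Optional[GovernanceError]:
    """Return a GovernanceError if the body holds an "error" object, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return GovernanceError(
        message=error.get("message", ""),
        gateway_error_code=error.get("gateway_error_code"),
        remediation=error.get("remediation"),
    )


sample = json.dumps({
    "error": {
        "message": "Daily budget exhausted",
        "type": "permission_error",
        "param": None,
        "code": "daily_budget",
        "gateway_error_code": "GW_COST_002",
        "remediation": "Increase your daily budget",
    }
})
err = parse_governance_error(sample)
```

Returning `None` for bodies without an `error` object lets the caller treat successful responses and malformed bodies separately from governance denials.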

Response headers

The gateway attaches governance metadata to every response, whether the request was allowed or denied:

| Header | Description | Example |
| --- | --- | --- |
| X-RateLimit-Limit | Maximum requests per minute | 60 |
| X-RateLimit-Remaining | Requests remaining in current window | 42 |
| X-RateLimit-Reset | Unix timestamp when the window resets | 1708642860 |
| X-CM-Cost | Estimated cost for this request (USD) | 0.0034 |
| X-CM-Daily-Cost | Cumulative daily spend (USD) | 12.50 |
| X-CM-Daily-Budget | Daily budget limit (USD) | 25.00 |
| X-CM-Governance-Time-Ms | Time spent in governance checks (ms) | 3.2 |
| X-CM-Request-ID | Unique request identifier | gw_a1b2c3d4 |
| X-CM-Approval-ID | Approval ID (when HITL triggers) | apr_abc123 |
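
These headers make it possible to track budget headroom on the client side. The sketch below assumes the header values are plain decimal strings as in the examples; `remaining_daily_budget` is a hypothetical helper name.

```python
from typing import Mapping, Optional


def remaining_daily_budget(headers: Mapping[str, str]) -> Optional[float]:
    """Daily budget minus daily spend in USD, or None if a header is missing or malformed."""
    try:
        budget = float(headers["X-CM-Daily-Budget"])
        spent = float(headers["X-CM-Daily-Cost"])
    except (KeyError, ValueError):
        return None
    # Round to avoid float noise in a value that is only used for display or alerts.
    return round(budget - spent, 4)


headroom = remaining_daily_budget(
    {"X-CM-Daily-Budget": "25.00", "X-CM-Daily-Cost": "12.50"}
)
```

A plain `dict` lookup is case-sensitive, while HTTP header names are not. Passing the `headers` object from the `requests` library avoids this, since it is a case-insensitive mapping.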

Step 0: Plan enforcement

What it checks: Whether the org’s subscription is active, whether the daily request quota for the plan tier has been reached, and whether the requested model is available on that tier.

When it blocks: Expired subscription, daily quota exceeded, or model not available on the plan.

HTTP status: 429 (quota) or 403 (model access)

Error codes: GW_PLAN_001 (inactive), GW_PLAN_002 (quota), GW_PLAN_003 (model)

How to configure: Plan enforcement is automatic based on your subscription tier. Upgrade your plan at dashboard.curate-me.ai/billing.

This step is separate from org-level governance policy. Plan enforcement controls whether an org can use a class of service at all. Governance policy controls what custom limits apply.


Step 1: Rate limiting

What it checks: Whether the org has exceeded its requests-per-minute (RPM) limit.

When it blocks: The per-org, per-minute counter in Redis exceeds rpm_limit.

HTTP status: 429 Too Many Requests

Error code: GW_RATE_001

Response headers on denial:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
Retry-After: 18

Error body:

{
  "error": {
    "message": "Rate limit exceeded: 61 requests in current minute exceeds limit of 60 RPM",
    "type": "rate_limit_error",
    "code": "rate_limit",
    "gateway_error_code": "GW_RATE_001",
    "remediation": "Wait for the current rate limit window to reset, or increase your RPM limit in gateway policies."
  }
}
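
On a 429, a client can use the denial headers to decide how long to wait before retrying. The function below is a sketch under the assumption that `Retry-After` carries a number of seconds, as in the example above; `seconds_until_retry` and the one-second fallback are illustrative choices, not documented gateway behavior.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Prefer Retry-After, fall back to X-RateLimit-Reset, else wait one second."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; not handled in this sketch
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            return max(0.0, float(reset) - current)
        except ValueError:
            pass
    return 1.0


wait = seconds_until_retry({"Retry-After": "18", "X-RateLimit-Reset": "1708642860"})
```

The `now` parameter exists only so the reset-timestamp branch can be exercised deterministically; production callers would omit it.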

How to configure:

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"rpm_limit": 120}'

Default limits by tier:

| Tier | RPM Limit |
| --- | --- |
| Free | 10 |
| Starter | 60 |
| Growth | 300 |
| Enterprise | 5,000 |

Webhook: Fires rate_limit.hit when a request is blocked.


Step 1.5: Plan entitlement

What it checks: Legacy daily request count and daily budget caps from billing_prices tier definitions.

When it blocks: Daily request count or daily budget exceeded for the plan tier.

HTTP status: 429

Error code: GW_ENTITLEMENT_001

This step exists for backward compatibility with the billing tier system. For most configurations, the cost check in step 2 is the primary budget enforcement.


Step 2: Cost estimate

What it checks: The estimated cost of the request against three limits:

  1. Per-request cost ceiling (max_cost_per_request) — blocks individual expensive requests
  2. Daily budget (daily_budget) — blocks when cumulative daily spend plus the estimated cost would exceed the limit
  3. Monthly budget (monthly_budget) — blocks when cumulative monthly spend plus the estimated cost would exceed the limit

How cost is estimated: The gateway uses tiktoken BPE encoding to count input tokens from the request body (messages, system prompt, tools/functions), then multiplies by the model’s per-token pricing from the built-in pricing table.

When it blocks: Any of the three limits would be exceeded.
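
The three limits can be expressed as a small decision function. This is a sketch, not the gateway's code: `check_cost` and `estimate_input_cost` are hypothetical names, the order in which the limits are tested is assumed to follow the list above, and the per-million-token price is an input rather than a value from the built-in pricing table.

```python
from typing import Optional


def estimate_input_cost(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Token count times per-token price, with the price quoted per million tokens."""
    return input_tokens * usd_per_million_tokens / 1_000_000


def check_cost(
    estimated_cost: float,
    daily_spend: float,
    monthly_spend: float,
    max_cost_per_request: float,
    daily_budget: float,
    monthly_budget: float,
) -> Optional[str]:
    """Return the gateway error code of the first limit exceeded, or None to allow."""
    if estimated_cost > max_cost_per_request:
        return "GW_COST_001"
    if daily_spend + estimated_cost > daily_budget:
        return "GW_COST_002"
    if monthly_spend + estimated_cost > monthly_budget:
        return "GW_COST_003"
    return None


# $24.50 already spent today plus a $0.85 estimate against a $25.00 daily budget.
code = check_cost(
    estimated_cost=0.85,
    daily_spend=24.50,
    monthly_spend=100.00,
    max_cost_per_request=2.00,
    daily_budget=25.00,
    monthly_budget=250.00,
)
```

Note that the budget checks compare spend plus the estimate against the limit, so a request can be blocked before the budget is fully used.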

HTTP status: 403

Error codes:

| Code | Meaning |
| --- | --- |
| GW_COST_001 | Per-request cost exceeds max_cost_per_request |
| GW_COST_002 | Daily budget would be exceeded |
| GW_COST_003 | Monthly budget would be exceeded |

Error body example:

{
  "error": {
    "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
    "type": "permission_error",
    "code": "daily_budget",
    "gateway_error_code": "GW_COST_002",
    "remediation": "Increase your daily budget at dashboard.curate-me.ai/policies or wait until midnight UTC for the counter to reset."
  }
}

How to configure:

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "max_cost_per_request": 2.00,
    "daily_budget": 100.00,
    "monthly_budget": 2000.00
  }'

Default limits by tier:

| Tier | Max/Request | Daily Budget | Monthly Budget |
| --- | --- | --- | --- |
| Free | $0.25 | $5 | $50 |
| Starter | $0.50 | $25 | $250 |
| Growth | $2.00 | $100 | $2,000 |
| Enterprise | $10.00 | $2,000 | $50,000 |

Webhooks: Fires budget.warning at 80% of daily budget, budget.exceeded when a request is blocked.


Step 2.5: Hierarchical budget

What it checks: Budget limits at three levels: Organization, Team, and API Key.

When it blocks: Any level in the hierarchy has exceeded its budget.

HTTP status: 403

Error code: GW_BUDGET_HIERARCHY_001

This step enables fine-grained cost control. You can set a $100/day budget for the org, then allocate $30/day to the engineering team and $20/day to the marketing team. Individual API keys can have their own daily and monthly spend caps.

Per-key spend caps are set when creating or updating an API key:

curl -X POST https://api.curate-me.ai/gateway/admin/keys \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "staging-key",
    "scopes": ["proxy"],
    "daily_spend_cap_usd": 10.00,
    "monthly_spend_cap_usd": 200.00
  }'

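
The org, team, and API key levels can be pictured as a list of budgets that must all have room for the request. The sketch below is illustrative only: `BudgetLevel` and `first_exhausted_level` are hypothetical names, only daily limits are modeled, and it assumes each level is tested as spend plus estimated cost against the limit, in the same way as the cost check in step 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetLevel:
    name: str           # "org", "team", or "api_key"
    daily_limit: float  # USD
    daily_spend: float  # USD spent so far today


def first_exhausted_level(levels: list[BudgetLevel], estimated_cost: float) -> Optional[str]:
    """Return the name of the first level the request would push over its limit."""
    for level in levels:
        if level.daily_spend + estimated_cost > level.daily_limit:
            return level.name
    return None


levels = [
    BudgetLevel("org", daily_limit=100.00, daily_spend=40.00),
    BudgetLevel("team", daily_limit=30.00, daily_spend=29.90),
    BudgetLevel("api_key", daily_limit=10.00, daily_spend=2.00),
]
blocked_by = first_exhausted_level(levels, estimated_cost=0.25)
```

In this example the org and key still have headroom, but the team budget does not, so the request is attributed to the `team` level.
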

Step 3: Runner session budget

What it checks: For requests originating from a managed runner session, this step checks the per-session cost limit.

When it blocks: The cumulative cost of the runner session exceeds the configured session budget.

HTTP status: 403

Error code: GW_SESSION_BUDGET_001

This step only applies when the request includes a runner_id and session_id (automatically set for requests from managed runner containers). It prevents long-running agent sessions from silently burning through budget.


Step 4: PII scan

What it checks: Scans all user-provided text in the request (messages, system prompt) for personally identifiable information and secrets before the request leaves your infrastructure.

Detection methods:

  1. Regex patterns — API keys (OpenAI, Anthropic, Google, GitHub, Stripe, AWS), bearer tokens, SSNs, credit cards (Luhn-validated), phone numbers, email addresses, IBANs, EU passport numbers, and more
  2. Presidio NER (when available) — 50+ entity types using spaCy NLP models for context-aware detection

Severity classification:

| Severity | Types |
| --- | --- |
| CRITICAL | SSN, credit card, IBAN, EU passport, German ID, UK NINO |
| HIGH | API keys, bearer tokens, AWS credentials, secret patterns |
| MEDIUM | Email, phone number, EU VAT, ICD-10 codes |
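
Luhn validation is what keeps a regex-based credit card detector from flagging every long digit run. The sketch below shows the general technique with a deliberately simple pattern; `CARD_CANDIDATE`, `luhn_valid`, and `find_credit_cards` are hypothetical names, and the gateway's actual patterns are not shown in this guide.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum over the digits in `number`, ignoring separators."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Double every second digit counting from the right; subtract 9 if the result exceeds 9.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_credit_cards(text: str) -> list[str]:
    """Return regex candidates that also pass the Luhn check."""
    return [m.group(0) for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group(0))]


hits = find_credit_cards("card 4111-1111-1111-1111 and order 1234567890123456")
```

Both digit runs match the pattern, but only the first passes the checksum, so the order number is not reported as a card.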

When it blocks: PII is detected and the policy’s pii_action is set to block (the default).

HTTP status: 403

Error code: GW_PII_001

Error body example:

{
  "error": {
    "message": "PII detected in request: api_key_openai, email. Blocked by governance policy.",
    "type": "permission_error",
    "code": "pii_detected",
    "gateway_error_code": "GW_PII_001",
    "pii_types": ["api_key_openai", "email"],
    "remediation": "Remove PII from request content, or adjust pii_action in your governance policy."
  }
}

How to configure:

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "pii_scan_enabled": true,
    "pii_action": "block",
    "pii_entity_config": {
      "PERSON": true,
      "EMAIL_ADDRESS": true,
      "PHONE_NUMBER": true,
      "CREDIT_CARD": true,
      "US_SSN": true,
      "IP_ADDRESS": false,
      "LOCATION": false,
      "ORGANIZATION": false
    }
  }'

Set pii_action to "needs_approval" to route PII-containing requests to the HITL approval queue instead of blocking outright.

Webhook: Fires guardrail.triggered with guardrail: "pii_scan".


Step 4.5: Content safety

What it checks: Prompt injection patterns, jailbreak attempts, and data exfiltration signals in the request content.

When it blocks: A content safety violation is detected and the guardrail is enabled.

HTTP status: 403

Error code: GW_SAFETY_001

Content safety is a separate layer from PII scanning. PII scanning protects against accidental data leakage. Content safety protects against deliberate adversarial attacks.

Webhook: Fires guardrail.triggered with guardrail: "content_safety".


Step 5: Model allowlist

What it checks: Whether the requested model is in the org’s allowed models list. The check runs after model alias resolution, so the comparison uses the resolved model name rather than the alias.

When it blocks: The org has a non-empty allowed_models list and the requested model is not in it.

HTTP status: 403

Error code: GW_MODEL_001

Error body example:

{
  "error": {
    "message": "Model 'gpt-5.1' is not in your allowed models list",
    "type": "permission_error",
    "code": "model_not_allowed",
    "gateway_error_code": "GW_MODEL_001",
    "hint": "Add the model to your allowlist or use one of your allowed models.",
    "hint_data": {
      "allowed_models": ["gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash"],
      "requested_model": "gpt-5.1"
    }
  }
}

How to configure:

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "allowed_models": [
      "gpt-4o-mini",
      "claude-haiku-3-5-20241022",
      "gemini-2.5-flash"
    ]
  }'

An empty allowed_models list (the default) allows all models. You can further narrow the allowlist at the API key level using key_allowed_models.
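
The allowlist rules can be summarized in a short function. This is a sketch under stated assumptions: `is_model_allowed` and the `aliases` mapping are hypothetical, and an empty `key_allowed_models` list is assumed to add no restriction, mirroring the documented behavior of an empty `allowed_models` list.

```python
from typing import Optional


def is_model_allowed(
    requested_model: str,
    allowed_models: list[str],
    key_allowed_models: Optional[list[str]] = None,
    aliases: Optional[dict[str, str]] = None,
) -> bool:
    """Resolve aliases first, then apply the org list and the optional key-level list."""
    resolved = (aliases or {}).get(requested_model, requested_model)
    # An empty org list allows every model.
    if allowed_models and resolved not in allowed_models:
        return False
    # The key-level list can only narrow what the org list already permits.
    if key_allowed_models and resolved not in key_allowed_models:
        return False
    return True


org_list = ["gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash"]
allowed = is_model_allowed("gpt-5.1", org_list)
```

With the org list above, `gpt-5.1` is rejected, while the same request against an empty list would pass.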


Step 6: HITL gate (Human-in-the-Loop)

What it checks: Whether the estimated cost exceeds the configured HITL threshold.

When it triggers: Estimated cost is above hitl_cost_threshold. The request is not denied — it is held for human approval.

HTTP status: 202 Accepted

Response headers:

HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30

Response body:

{
  "status": "pending_approval",
  "approval_id": "apr_abc123def456",
  "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
  "retry_after_seconds": 30,
  "estimated_cost": 12.50
}

How to handle in code:

import time

import requests

response = requests.post(
    "https://api.curate-me.ai/v1/chat/completions",
    headers={
        "X-CM-API-Key": "cm_sk_your_key",
        "Authorization": "Bearer sk-your-openai-key",
    },
    json={"model": "gpt-5.1", "messages": [{"role": "user", "content": "..."}]},
)

if response.status_code == 202:
    approval_id = response.json()["approval_id"]
    # Poll for approval (or use the SDK's wait_for_approval method)
    while True:
        status = requests.get(
            f"https://api.curate-me.ai/gateway/admin/approvals/{approval_id}",
            headers={"X-CM-API-Key": "cm_sk_your_key"},
        ).json()
        if status["status"] in ("approved", "rejected", "expired"):
            break
        time.sleep(status.get("retry_after_seconds", 5))

Or with the SDK:

admin = gw.admin()
result = await admin.wait_for_approval("apr_abc123def456", timeout=300)

Pending approvals appear in the dashboard approval queue at Gateway > Approvals.

How to configure:

curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"hitl_cost_threshold": 10.00}'

Default thresholds by tier:

| Tier | HITL Threshold |
| --- | --- |
| Free | $1.00 |
| Starter | $3.00 |
| Growth | $10.00 |
| Enterprise | $50.00 |

Webhook: Fires hitl.pending when a request enters the approval queue.


Policy format

The full governance policy is a JSON document stored per organization:

{
  "org_id": "org_abc123",
  "rpm_limit": 60,
  "max_cost_per_request": 2.00,
  "daily_budget": 100.00,
  "monthly_budget": 2000.00,
  "pii_scan_enabled": true,
  "pii_action": "block",
  "pii_entity_config": {
    "PERSON": true,
    "EMAIL_ADDRESS": true,
    "PHONE_NUMBER": true,
    "CREDIT_CARD": true,
    "US_SSN": true,
    "IP_ADDRESS": false,
    "LOCATION": false
  },
  "allowed_models": [],
  "hitl_cost_threshold": 10.00,
  "request_logging_mode": "full",
  "failover_policy": {
    "enabled": false,
    "mode": "cost_aware",
    "max_cost_multiplier": 1.5,
    "max_failover_attempts": 2
  }
}

Simulating policy changes

Before applying a policy change, you can simulate it against recent traffic to see what would have been blocked:

curl -X POST https://api.curate-me.ai/gateway/admin/policies/simulate \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "draft_policy": {
      "rpm_limit": 30,
      "daily_budget": 10.00
    },
    "replay_hours": 24
  }'

The response shows how many requests would have been blocked and by which step.

Webhook events

The governance chain fires webhook events for key governance actions:

| Event | When |
| --- | --- |
| rate_limit.hit | A request is blocked by rate limiting |
| budget.warning | Daily spend reaches 80% of the budget |
| budget.exceeded | A request is blocked by budget enforcement |
| guardrail.triggered | PII scan or content safety blocks a request |
| hitl.pending | A request enters the HITL approval queue |

Configure webhooks at Dashboard > Gateway > Webhooks or via the API:

curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget.exceeded", "budget.warning", "rate_limit.hit"]
  }'
