Governance Chain
The governance chain is the policy pipeline that evaluates every gateway request before it reaches the upstream LLM provider. Checks run in order and the chain short-circuits on the first denial — if rate limiting blocks a request, cost estimation never runs.
Overview
Request
--> [0. Plan Enforcement]
--> [1. Rate Limit]
--> [1.5 Plan Entitlement]
--> [2. Cost Estimate]
--> [2.5 Hierarchical Budget]
--> [3. Runner Session Budget]
--> [4. PII Scan]
--> [4.5 Content Safety]
--> [5. Model Allowlist]
--> [6. HITL Gate]
--> Proxy to Provider
Each step returns one of three actions:
| Action | Meaning |
|---|---|
| ALLOW | Request passes this check; proceed to the next step |
| BLOCK | Request is denied immediately with an error response |
| NEEDS_APPROVAL | Request is held for human review (HITL gate, or PII scan when pii_action is "needs_approval") |
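The short-circuit behavior can be sketched in a few lines. This is an illustrative model only, not the gateway's implementation: the `Action`, `Decision`, and `run_chain` names and the two toy checks are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    NEEDS_APPROVAL = "needs_approval"


@dataclass
class Decision:
    action: Action
    step: Optional[str] = None  # name of the step that stopped the chain, if any


# A check takes the request (a plain dict in this sketch) and returns an Action.
Check = Callable[[dict], Action]


def run_chain(request: dict, steps: list[tuple[str, Check]]) -> Decision:
    """Run checks in order and stop at the first result that is not ALLOW."""
    for name, check in steps:
        action = check(request)
        if action is not Action.ALLOW:
            return Decision(action, name)
    return Decision(Action.ALLOW)


# Hypothetical checks that record when they are called.
calls: list[str] = []


def rate_limit(request: dict) -> Action:
    calls.append("rate_limit")
    over = request["requests_this_minute"] > request["rpm_limit"]
    return Action.BLOCK if over else Action.ALLOW


def cost_estimate(request: dict) -> Action:
    calls.append("cost_estimate")
    return Action.ALLOW


decision = run_chain(
    {"requests_this_minute": 61, "rpm_limit": 60},
    [("rate_limit", rate_limit), ("cost_estimate", cost_estimate)],
)
print(decision.action.name, decision.step, calls)
```

Because the rate limit check returns BLOCK, `cost_estimate` is never called, which mirrors the ordering described above.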
Error response format
All governance denials use the OpenAI-compatible error format:
{
"error": {
"message": "Human-readable description of what was blocked and why",
"type": "permission_error",
"param": null,
"code": "daily_budget",
"gateway_error_code": "GW_COST_002",
"remediation": "Increase your daily budget at dashboard.curate-me.ai/policies"
}
}
The gateway_error_code field provides a stable, machine-readable identifier. The remediation field tells developers how to fix the issue.
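A client might branch on gateway_error_code rather than on the human-readable message. The sketch below parses the error body shown above; the `GovernanceDenial` class and `parse_denial` helper are hypothetical names for illustration, not part of any SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GovernanceDenial:
    code: str
    gateway_error_code: str
    message: str
    remediation: Optional[str]


def parse_denial(body: str) -> Optional[GovernanceDenial]:
    """Return a GovernanceDenial if the body carries a gateway_error_code, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or "gateway_error_code" not in error:
        return None
    return GovernanceDenial(
        code=error.get("code", ""),
        gateway_error_code=error["gateway_error_code"],
        message=error.get("message", ""),
        remediation=error.get("remediation"),
    )


sample = """
{
  "error": {
    "message": "Human-readable description of what was blocked and why",
    "type": "permission_error",
    "param": null,
    "code": "daily_budget",
    "gateway_error_code": "GW_COST_002",
    "remediation": "Increase your daily budget at dashboard.curate-me.ai/policies"
  }
}
"""
denial = parse_denial(sample)
if denial is not None:
    print(denial.gateway_error_code, "-", denial.remediation)
```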
Response headers
The gateway attaches governance metadata to every response, whether the request was allowed or denied:
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Maximum requests per minute | 60 |
| X-RateLimit-Remaining | Requests remaining in current window | 42 |
| X-RateLimit-Reset | Unix timestamp when the window resets | 1708642860 |
| X-CM-Cost | Estimated cost for this request (USD) | 0.0034 |
| X-CM-Daily-Cost | Cumulative daily spend (USD) | 12.50 |
| X-CM-Daily-Budget | Daily budget limit (USD) | 25.00 |
| X-CM-Governance-Time-Ms | Time spent in governance checks (ms) | 3.2 |
| X-CM-Request-ID | Unique request identifier | gw_a1b2c3d4 |
| X-CM-Approval-ID | Approval ID (when HITL triggers) | apr_abc123 |
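As one possible use of these headers, a client could track how much daily budget is left after each call. The helper below is a sketch with a hypothetical name, and it assumes both values arrive as decimal strings in USD, as in the examples above.

```python
from typing import Mapping, Optional


def remaining_daily_budget(headers: Mapping[str, str]) -> Optional[float]:
    """Return the daily budget left in USD, or None if the headers are missing or malformed."""
    try:
        spent = float(headers["X-CM-Daily-Cost"])
        budget = float(headers["X-CM-Daily-Budget"])
    except (KeyError, ValueError):
        return None
    return max(budget - spent, 0.0)


example_headers = {
    "X-CM-Cost": "0.0034",
    "X-CM-Daily-Cost": "12.50",
    "X-CM-Daily-Budget": "25.00",
}
remaining = remaining_daily_budget(example_headers)
print(remaining)  # 12.5
```

HTTP header names are case-insensitive. A plain dict is not, so with the `requests` library it is safer to pass `response.headers` directly, which handles casing for you.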
Step 0: Plan enforcement
What it checks: Whether the org’s subscription is active, whether the daily request quota for the plan tier has been reached, and whether the requested model is available on that tier.
When it blocks: Expired subscription, daily quota exceeded, or model not available on the plan.
HTTP status: 429 (quota) or 403 (model access)
Error codes: GW_PLAN_001 (inactive), GW_PLAN_002 (quota), GW_PLAN_003 (model)
How to configure: Plan enforcement is automatic based on your subscription tier. Upgrade your plan at dashboard.curate-me.ai/billing .
This step is separate from org-level governance policy. Plan enforcement controls whether an org can use a class of service at all. Governance policy controls what custom limits apply.
Step 1: Rate limiting
What it checks: Whether the org has exceeded its requests-per-minute (RPM) limit.
When it blocks: The per-org, per-minute counter in Redis exceeds rpm_limit.
HTTP status: 429 Too Many Requests
Error code: GW_RATE_001
Response headers on denial:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
Retry-After: 18
Error body:
{
"error": {
"message": "Rate limit exceeded: 61 requests in current minute exceeds limit of 60 RPM",
"type": "rate_limit_error",
"code": "rate_limit",
"gateway_error_code": "GW_RATE_001",
"remediation": "Wait for the current rate limit window to reset, or increase your RPM limit in gateway policies."
}
}
How to configure:
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{"rpm_limit": 120}'
Default limits by tier:
| Tier | RPM Limit |
|---|---|
| Free | 10 |
| Starter | 60 |
| Growth | 300 |
| Enterprise | 5,000 |
Webhook: Fires rate_limit.hit when a request is blocked.
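On the client side, a 429 from this step can be handled by waiting for the window to reset. The sketch below derives a wait time from the denial headers shown above; the function name and the one-second fallback are assumptions, and the HTTP-date form of Retry-After is not handled.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    default: float = 1.0,
) -> float:
    """Prefer Retry-After (seconds); fall back to X-RateLimit-Reset (Unix timestamp)."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # not a number of seconds; try the reset timestamp instead
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            return max(float(reset) - current, 0.0)
        except ValueError:
            pass
    return default


denial_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1708642860",
    "Retry-After": "18",
}
print(seconds_until_retry(denial_headers))  # 18.0
```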
Step 1.5: Plan entitlement
What it checks: Legacy daily request count and daily budget caps from billing_prices tier definitions.
When it blocks: Daily request count or daily budget exceeded for the plan tier.
HTTP status: 429
Error code: GW_ENTITLEMENT_001
This step exists for backward compatibility with the billing tier system. For most configurations, the cost check in step 2 is the primary budget enforcement.
Step 2: Cost estimate
What it checks: The estimated cost of the request against three limits:
- Per-request cost ceiling (max_cost_per_request): blocks individual expensive requests
- Daily budget (daily_budget): blocks when cumulative daily spend plus the estimated cost would exceed the limit
- Monthly budget (monthly_budget): blocks when cumulative monthly spend plus the estimated cost would exceed the limit
How cost is estimated: The gateway uses tiktoken BPE encoding to count input tokens from the request body (messages, system prompt, tools/functions), then multiplies by the model’s per-token pricing from the built-in pricing table.
When it blocks: Any of the three limits would be exceeded.
HTTP status: 403
Error codes:
| Code | Meaning |
|---|---|
| GW_COST_001 | Per-request cost exceeds max_cost_per_request |
| GW_COST_002 | Daily budget would be exceeded |
| GW_COST_003 | Monthly budget would be exceeded |
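The three comparisons can be written out as a small function. This is a sketch under stated assumptions: the order in which the limits are evaluated, the per-1,000-token price unit, and the function names are all chosen for illustration, and token counting (done with tiktoken in the gateway) is replaced here by a token count passed in directly.

```python
from typing import Optional


def estimate_cost(input_tokens: int, usd_per_1k_input_tokens: float) -> float:
    """Estimated request cost in USD from an input token count and a price."""
    return input_tokens / 1000 * usd_per_1k_input_tokens


def check_cost(
    estimated: float,
    spent_today: float,
    spent_this_month: float,
    max_cost_per_request: float,
    daily_budget: float,
    monthly_budget: float,
) -> Optional[str]:
    """Return the error code for the first limit that would be exceeded, or None."""
    if estimated > max_cost_per_request:
        return "GW_COST_001"
    if spent_today + estimated > daily_budget:
        return "GW_COST_002"
    if spent_this_month + estimated > monthly_budget:
        return "GW_COST_003"
    return None


# $24.50 already spent today, $0.85 estimated, $25.00 daily limit.
result = check_cost(
    estimated=0.85,
    spent_today=24.50,
    spent_this_month=100.00,
    max_cost_per_request=2.00,
    daily_budget=25.00,
    monthly_budget=2000.00,
)
print(result)  # GW_COST_002
```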
Error body example:
{
"error": {
"message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
"type": "permission_error",
"code": "daily_budget",
"gateway_error_code": "GW_COST_002",
"remediation": "Increase your daily budget at dashboard.curate-me.ai/policies or wait until midnight UTC for the counter to reset."
}
}
How to configure:
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"max_cost_per_request": 2.00,
"daily_budget": 100.00,
"monthly_budget": 2000.00
}'
Default limits by tier:
| Tier | Max/Request | Daily Budget | Monthly Budget |
|---|---|---|---|
| Free | $0.25 | $5 | $50 |
| Starter | $0.50 | $25 | $250 |
| Growth | $2.00 | $100 | $2,000 |
| Enterprise | $10.00 | $2,000 | $50,000 |
Webhooks: Fires budget.warning at 80% of daily budget, budget.exceeded when a request is blocked.
Step 2.5: Hierarchical budget
What it checks: Budget limits at three levels: Organization, Team, and API Key.
When it blocks: Any level in the hierarchy has exceeded its budget.
HTTP status: 403
Error code: GW_BUDGET_HIERARCHY_001
This step enables fine-grained cost control. You can set a $100/day budget for the org, then allocate $30/day to the engineering team and $20/day to the marketing team. Individual API keys can have their own daily and monthly spend caps.
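One way to picture the hierarchy is as an ordered list of levels, each with its own spend and cap. The sketch below is an assumption-heavy illustration: the `BudgetLevel` type, the choice to include the estimated cost in the comparison, and the order of evaluation are not confirmed by this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BudgetLevel:
    name: str                   # "org", "team", or "api_key"
    spent_usd: float            # spend so far in the current period
    limit_usd: Optional[float]  # None means no cap at this level


def first_exceeded_level(levels: Sequence[BudgetLevel], estimated_cost: float) -> Optional[str]:
    """Return the name of the first level whose cap would be exceeded, or None."""
    for level in levels:
        if level.limit_usd is not None and level.spent_usd + estimated_cost > level.limit_usd:
            return level.name
    return None


levels = [
    BudgetLevel("org", spent_usd=45.00, limit_usd=100.00),
    BudgetLevel("team", spent_usd=29.80, limit_usd=30.00),
    BudgetLevel("api_key", spent_usd=2.00, limit_usd=10.00),
]
blocked_by = first_exceeded_level(levels, estimated_cost=0.50)
print(blocked_by)  # team
```

Here the org still has headroom, but the team's $30/day allocation would be exceeded, so the request is denied at the team level.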
Per-key spend caps are set when creating or updating an API key:
curl -X POST https://api.curate-me.ai/gateway/admin/keys \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"name": "staging-key",
"scopes": ["proxy"],
"daily_spend_cap_usd": 10.00,
"monthly_spend_cap_usd": 200.00
}'
Step 3: Runner session budget
What it checks: For requests originating from a managed runner session, this step checks the per-session cost limit.
When it blocks: The cumulative cost of the runner session exceeds the configured session budget.
HTTP status: 403
Error code: GW_SESSION_BUDGET_001
This step only applies when the request includes a runner_id and session_id (automatically set for requests from managed runner containers). It prevents long-running agent sessions from silently burning through budget.
Step 4: PII scan
What it checks: Scans all user-provided text in the request (messages, system prompt) for personally identifiable information and secrets before the request leaves your infrastructure.
Detection methods:
- Regex patterns — API keys (OpenAI, Anthropic, Google, GitHub, Stripe, AWS), bearer tokens, SSNs, credit cards (Luhn-validated), phone numbers, email addresses, IBANs, EU passport numbers, and more
- Presidio NER (when available) — 50+ entity types using spaCy NLP models for context-aware detection
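Luhn validation is what keeps a credit card regex from flagging arbitrary 16-digit numbers. The standard checksum is short enough to show in full; the function name and the minimum length of 12 digits are choices made for this sketch, not taken from the gateway.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum (spaces and hyphens are ignored)."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not cleaned.isascii() or not cleaned.isdigit() or len(cleaned) < 12:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```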
Severity classification:
| Severity | Types |
|---|---|
| CRITICAL | SSN, credit card, IBAN, EU passport, German ID, UK NINO |
| HIGH | API keys, bearer tokens, AWS credentials, secret patterns |
| MEDIUM | Email, phone number, EU VAT, ICD-10 codes |
When it blocks: PII is detected and the policy’s pii_action is set to block (the default).
HTTP status: 403
Error code: GW_PII_001
Error body example:
{
"error": {
"message": "PII detected in request: api_key_openai, email. Blocked by governance policy.",
"type": "permission_error",
"code": "pii_detected",
"gateway_error_code": "GW_PII_001",
"pii_types": ["api_key_openai", "email"],
"remediation": "Remove PII from request content, or adjust pii_action in your governance policy."
}
}
How to configure:
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"pii_scan_enabled": true,
"pii_action": "block",
"pii_entity_config": {
"PERSON": true,
"EMAIL_ADDRESS": true,
"PHONE_NUMBER": true,
"CREDIT_CARD": true,
"US_SSN": true,
"IP_ADDRESS": false,
"LOCATION": false,
"ORGANIZATION": false
}
}'
Set pii_action to "needs_approval" to route PII-containing requests to the HITL approval queue instead of blocking outright.
Webhook: Fires guardrail.triggered with guardrail: "pii_scan".
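The interaction between detection and pii_action can be sketched as two small functions. The two regular expressions below are simplified, illustrative patterns and not the gateway's actual rules, and the `scan` and `decide` names are hypothetical.

```python
import re

# Simplified patterns for illustration only; real detectors are stricter.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key_openai": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}


def scan(text: str) -> list[str]:
    """Return the sorted names of all patterns that match the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def decide(found: list[str], pii_action: str = "block") -> str:
    """Map scan results and the policy's pii_action to a chain action."""
    if not found:
        return "ALLOW"
    if pii_action == "block":
        return "BLOCK"
    if pii_action == "needs_approval":
        return "NEEDS_APPROVAL"
    raise ValueError(f"unsupported pii_action in this sketch: {pii_action!r}")


found = scan("Please email jane@example.com with the report")
print(found, decide(found), decide(found, "needs_approval"))
```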
Step 4.5: Content safety
What it checks: Prompt injection patterns, jailbreak attempts, and data exfiltration signals in the request content.
When it blocks: A content safety violation is detected and the guardrail is enabled.
HTTP status: 403
Error code: GW_SAFETY_001
Content safety is a separate layer from PII scanning. PII scanning protects against accidental data leakage. Content safety protects against deliberate adversarial attacks.
Webhook: Fires guardrail.triggered with guardrail: "content_safety".
Step 5: Model allowlist
What it checks: Whether the requested model is in the org’s allowed models list. The check runs after model alias resolution, so aliases are resolved before comparison.
When it blocks: The org has a non-empty allowed_models list and the requested model is not in it.
HTTP status: 403
Error code: GW_MODEL_001
Error body example:
{
"error": {
"message": "Model 'gpt-5.1' is not in your allowed models list",
"type": "permission_error",
"code": "model_not_allowed",
"gateway_error_code": "GW_MODEL_001",
"hint": "Add the model to your allowlist or use one of your allowed models.",
"hint_data": {
"allowed_models": ["gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash"],
"requested_model": "gpt-5.1"
}
}
}
How to configure:
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"allowed_models": [
"gpt-4o-mini",
"claude-haiku-3-5-20241022",
"gemini-2.5-flash"
]
}'
An empty allowed_models list (the default) allows all models. You can further narrow the allowlist at the API key level using key_allowed_models.
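The rules above (resolve aliases first, treat an empty list as "allow all", then narrow by key) can be sketched as follows. The alias map and the exact way key_allowed_models combines with the org list are assumptions for illustration.

```python
from typing import Mapping, Sequence


def is_model_allowed(
    requested: str,
    aliases: Mapping[str, str],
    allowed_models: Sequence[str],
    key_allowed_models: Sequence[str] = (),
) -> bool:
    """Resolve the alias, then check the org list and the optional key-level list."""
    resolved = aliases.get(requested, requested)
    # An empty list means no restriction at that level.
    if allowed_models and resolved not in allowed_models:
        return False
    if key_allowed_models and resolved not in key_allowed_models:
        return False
    return True


aliases = {"fast": "gpt-4o-mini"}  # hypothetical alias
org_allowlist = ["gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash"]

print(is_model_allowed("fast", aliases, org_allowlist))     # True: alias resolves to an allowed model
print(is_model_allowed("gpt-5.1", aliases, org_allowlist))  # False: not in the org list
print(is_model_allowed("gpt-5.1", aliases, []))             # True: empty list allows all models
```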
Step 6: HITL gate (Human-in-the-Loop)
What it checks: Whether the estimated cost exceeds the configured HITL threshold.
When it triggers: Estimated cost is above hitl_cost_threshold. The request is not denied; it is held for human approval.
HTTP status: 202 Accepted
Response headers:
HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30
Response body:
{
"status": "pending_approval",
"approval_id": "apr_abc123def456",
"message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
"retry_after_seconds": 30,
"estimated_cost": 12.50
}
How to handle in code:
import time

import requests

response = requests.post(
    "https://api.curate-me.ai/v1/chat/completions",
    headers={
        "X-CM-API-Key": "cm_sk_your_key",
        "Authorization": "Bearer sk-your-openai-key",
    },
    json={"model": "gpt-5.1", "messages": [{"role": "user", "content": "..."}]},
    timeout=60,
)

if response.status_code == 202:
    approval_id = response.json()["approval_id"]
    deadline = time.monotonic() + 300  # stop polling after 5 minutes
    # Poll for approval (or use the SDK's wait_for_approval method)
    while time.monotonic() < deadline:
        status = requests.get(
            f"https://api.curate-me.ai/gateway/admin/approvals/{approval_id}",
            headers={"X-CM-API-Key": "cm_sk_your_key"},
            timeout=30,
        ).json()
        # Stop as soon as the approval reaches a terminal state
        if status["status"] in ("approved", "rejected", "expired"):
            break
        time.sleep(status.get("retry_after_seconds", 5))
Or with the SDK:
admin = gw.admin()
result = await admin.wait_for_approval("apr_abc123def456", timeout=300)
Pending approvals appear in the dashboard approval queue at Gateway > Approvals.
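If you poll by hand, it helps to separate the polling logic from the HTTP call so that the timeout behavior can be tested without a network. The helper below is a sketch: `wait_for_decision` is a hypothetical name (it is not the SDK's wait_for_approval), and the status payloads in the usage example are stand-ins for real API responses.

```python
import time
from typing import Callable

TERMINAL_STATES = {"approved", "rejected", "expired"}


def wait_for_decision(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    default_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status until a terminal state is returned or the timeout would be exceeded."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status["status"]
        interval = float(status.get("retry_after_seconds", default_interval))
        if clock() + interval > deadline:
            raise TimeoutError("approval still pending when the timeout was reached")
        sleep(interval)


# Usage with canned responses in place of real HTTP calls.
canned = iter([
    {"status": "pending_approval", "retry_after_seconds": 0},
    {"status": "pending_approval", "retry_after_seconds": 0},
    {"status": "approved"},
])
outcome = wait_for_decision(lambda: next(canned), sleep=lambda seconds: None)
print(outcome)  # approved
```

In real use, `fetch_status` would wrap the GET request to the approvals endpoint shown above.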
How to configure:
curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{"hitl_cost_threshold": 10.00}'
Default thresholds by tier:
| Tier | HITL Threshold |
|---|---|
| Free | $1.00 |
| Starter | $3.00 |
| Growth | $10.00 |
| Enterprise | $50.00 |
Webhook: Fires hitl.pending when a request enters the approval queue.
Policy format
The full governance policy is a JSON document stored per organization:
{
"org_id": "org_abc123",
"rpm_limit": 60,
"max_cost_per_request": 2.00,
"daily_budget": 100.00,
"monthly_budget": 2000.00,
"pii_scan_enabled": true,
"pii_action": "block",
"pii_entity_config": {
"PERSON": true,
"EMAIL_ADDRESS": true,
"PHONE_NUMBER": true,
"CREDIT_CARD": true,
"US_SSN": true,
"IP_ADDRESS": false,
"LOCATION": false
},
"allowed_models": [],
"hitl_cost_threshold": 10.00,
"request_logging_mode": "full",
"failover_policy": {
"enabled": false,
"mode": "cost_aware",
"max_cost_multiplier": 1.5,
"max_failover_attempts": 2
}
}
Simulating policy changes
Before applying a policy change, you can simulate it against recent traffic to see what would have been blocked:
curl -X POST https://api.curate-me.ai/gateway/admin/policies/simulate \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"draft_policy": {
"rpm_limit": 30,
"daily_budget": 10.00
},
"replay_hours": 24
}'
The response shows how many requests would have been blocked and by which step.
Webhook events
The governance chain fires webhook events for key governance actions:
| Event | When |
|---|---|
| rate_limit.hit | A request is blocked by rate limiting |
| budget.warning | Daily spend reaches 80% of the budget |
| budget.exceeded | A request is blocked by budget enforcement |
| guardrail.triggered | PII scan or content safety blocks a request |
| hitl.pending | A request enters the HITL approval queue |
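A receiving service will typically dispatch on the event name. The sketch below shows one way to do that; the payload shape is not documented on this page, so the `event` and `guardrail` field names, the handler functions, and their return values are all assumptions.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def on_rate_limit(payload: dict) -> str:
    return "slow down callers"


def on_budget(payload: dict) -> str:
    return "notify billing owner"


def on_guardrail(payload: dict) -> str:
    # "guardrail" is assumed to carry "pii_scan" or "content_safety".
    return f"review guardrail: {payload.get('guardrail', 'unknown')}"


def on_hitl(payload: dict) -> str:
    return "page an approver"


HANDLERS: Dict[str, Handler] = {
    "rate_limit.hit": on_rate_limit,
    "budget.warning": on_budget,
    "budget.exceeded": on_budget,
    "guardrail.triggered": on_guardrail,
    "hitl.pending": on_hitl,
}


def handle_event(payload: dict) -> str:
    """Route a webhook payload to its handler; unknown events are ignored."""
    handler = HANDLERS.get(payload.get("event", ""))
    if handler is None:
        return "ignored"
    return handler(payload)


print(handle_event({"event": "guardrail.triggered", "guardrail": "pii_scan"}))
print(handle_event({"event": "something.else"}))
```

Ignoring unknown event names keeps the receiver working if new event types are added later.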
Configure webhooks at Dashboard > Gateway > Webhooks or via the API:
curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
-H "X-CM-API-Key: cm_sk_your_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://your-app.com/webhooks/curate-me",
"events": ["budget.exceeded", "budget.warning", "rate_limit.hit"]
}'
Next steps
- Cost Tracking — how costs are recorded and attributed
- Gateway API Reference — full endpoint documentation
- Runbook: Budget Exceeded — diagnosing cost spikes
- Runbook: Rate Limit Hit — resolving rate limit issues