Governance Chain

The governance chain is the policy pipeline that evaluates every gateway request before it reaches the upstream LLM provider. Checks run in order and the chain short-circuits on the first denial — if rate limiting blocks a request, cost estimation never runs.

Overview


Request
  --> [0. Plan Enforcement]
  --> [1. Rate Limit]
  --> [1.5 Plan Entitlement]
  --> [2. Cost Estimate]
  --> [2.5 Hierarchical Budget]
  --> [3. Runner Session Budget]
  --> [4. PII Scan]
  --> [4.5 Content Safety]
  --> [5. Model Allowlist]
  --> [6. HITL Gate]
  --> Proxy to Provider

Each step returns one of three actions:

Action	Meaning
`ALLOW`	Request passes this check; proceed to the next step
`BLOCK`	Request is denied immediately with an error response
`NEEDS_APPROVAL`	Request is held for human review (the HITL gate, or the PII scan when pii_action is "needs_approval")
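The short-circuit behaviour described above can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the `Decision` type, the `run_chain` function, and the shape of a check (a callable that takes the request and returns an action) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    NEEDS_APPROVAL = "NEEDS_APPROVAL"


@dataclass
class Decision:
    action: Action
    step: Optional[str] = None  # name of the step that stopped the chain, if any


# Hypothetical signature: each check inspects the request and returns an Action.
Check = Callable[[dict], Action]


def run_chain(request: dict, checks: list[tuple[str, Check]]) -> Decision:
    """Run checks in order and stop at the first result that is not ALLOW."""
    for name, check in checks:
        action = check(request)
        if action is not Action.ALLOW:
            # Later checks never run once one step blocks or holds the request.
            return Decision(action, name)
    return Decision(Action.ALLOW)
```

Passing the checks as an ordered list of (name, function) pairs keeps the ordering explicit and lets the caller report which step stopped the request.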

Error response format

All governance denials use the OpenAI-compatible error format:


{
  "error": {
    "message": "Human-readable description of what was blocked and why",
    "type": "permission_error",
    "param": null,
    "code": "daily_budget",
    "gateway_error_code": "GW_COST_002",
    "remediation": "Increase your daily budget at dashboard.curate-me.ai/policies"
  }
}

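The gateway_error_code field is a stable, machine-readable identifier that clients can branch on. The remediation field describes how to resolve the denial. Note that the model allowlist error shown under Step 5 carries hint and hint_data fields rather than remediation, so clients should treat remediation as optional.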
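As a rough sketch of how a client might read this format, the helper below pulls out the fields shown in the example body. The function name `parse_governance_error` and the returned dictionary shape are hypothetical, chosen only for illustration.

```python
import json
from typing import Optional


def parse_governance_error(body: str) -> Optional[dict]:
    """Return the key fields of a governance denial, or None if the body is not one."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # not JSON at all
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or "gateway_error_code" not in error:
        return None  # JSON, but not a governance denial
    return {
        "gateway_error_code": error["gateway_error_code"],
        "message": error.get("message", ""),
        # Treated as optional here; not every example body includes it.
        "remediation": error.get("remediation"),
    }
```

Branching on gateway_error_code rather than on the message text keeps client logic stable if the wording of a message changes.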

Response headers

The gateway attaches governance metadata to every response, whether the request was allowed or denied:

Header	Description	Example
`X-RateLimit-Limit`	Maximum requests per minute	`60`
`X-RateLimit-Remaining`	Requests remaining in current window	`42`
`X-RateLimit-Reset`	Unix timestamp when the window resets	`1708642860`
`X-CM-Cost`	Estimated cost for this request (USD)	`0.0034`
`X-CM-Daily-Cost`	Cumulative daily spend (USD)	`12.50`
`X-CM-Daily-Budget`	Daily budget limit (USD)	`25.00`
`X-CM-Governance-Time-Ms`	Time spent in governance checks	`3.2`
`X-CM-Request-ID`	Unique request identifier	`gw_a1b2c3d4`
`X-CM-Approval-ID`	Approval ID (when HITL triggers)	`apr_abc123`
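A client can use these headers to track how much daily budget is left. The sketch below is illustrative: the helper name `remaining_daily_budget` is hypothetical, and whether X-CM-Daily-Cost already includes the current request is not stated here, so treat the result as approximate.

```python
from typing import Mapping, Optional


def remaining_daily_budget(headers: Mapping[str, str]) -> Optional[float]:
    """Daily budget minus cumulative daily spend, or None if a header is missing."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        budget = float(lowered["x-cm-daily-budget"])
        spent = float(lowered["x-cm-daily-cost"])
    except (KeyError, ValueError):
        return None
    return round(budget - spent, 4)
```

With the example values from the table (12.50 spent against a 25.00 budget), the helper returns 12.5.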

Step 0: Plan enforcement

What it checks: Whether the org’s subscription is active, whether the daily request quota for the plan tier has been reached, and whether the requested model is available on that tier.

When it blocks: Expired subscription, daily quota exceeded, or model not available on the plan.

HTTP status: 429 (quota) or 403 (model access)

Error codes: GW_PLAN_001 (inactive), GW_PLAN_002 (quota), GW_PLAN_003 (model)

How to configure: Plan enforcement is automatic based on your subscription tier. Upgrade your plan at dashboard.curate-me.ai/billing.

This step is separate from org-level governance policy. Plan enforcement controls whether an org can use a class of service at all. Governance policy controls what custom limits apply.

Step 1: Rate limiting

What it checks: Whether the org has exceeded its requests-per-minute (RPM) limit.

When it blocks: The per-org, per-minute counter in Redis exceeds rpm_limit.

HTTP status: 429 Too Many Requests

Error code: GW_RATE_001

Response headers on denial:


HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708642860
Retry-After: 18

Error body:


{
  "error": {
    "message": "Rate limit exceeded: 61 requests in current minute exceeds limit of 60 RPM",
    "type": "rate_limit_error",
    "code": "rate_limit",
    "gateway_error_code": "GW_RATE_001",
    "remediation": "Wait for the current rate limit window to reset, or increase your RPM limit in gateway policies."
  }
}
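A client receiving this 429 can work out how long to wait from the headers shown above. The following is a sketch only: the helper name `retry_delay_seconds` and the one-second fallback are assumptions, and it handles only the numeric form of Retry-After used in the example.

```python
import time
from typing import Mapping, Optional


def retry_delay_seconds(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before retrying a rate-limited request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Prefer Retry-After when it is a plain number of seconds.
    retry_after = lowered.get("retry-after", "")
    if retry_after.isdigit():
        return float(retry_after)
    # Otherwise fall back to the Unix timestamp at which the window resets.
    reset = lowered.get("x-ratelimit-reset", "")
    if reset.isdigit():
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # hypothetical default when neither header is usable
```

The `now` parameter exists so the calculation can be exercised with a fixed clock; in normal use it is omitted.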

How to configure:


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"rpm_limit": 120}'

Default limits by tier:

Tier	RPM Limit
Free	10
Starter	60
Growth	300
Enterprise	5,000

Webhook: Fires rate_limit.hit when a request is blocked.

Step 1.5: Plan entitlement

What it checks: Legacy daily request count and daily budget caps from billing_prices tier definitions.

When it blocks: Daily request count or daily budget exceeded for the plan tier.

HTTP status: 429

Error code: GW_ENTITLEMENT_001

This step exists for backward compatibility with the billing tier system. For most configurations, the cost check in step 2 is the primary budget enforcement.

Step 2: Cost estimate

What it checks: The estimated cost of the request against three limits:

Per-request cost ceiling (max_cost_per_request) — blocks individual expensive requests
Daily budget (daily_budget) — blocks when cumulative daily spend plus the estimated cost would exceed the limit
Monthly budget (monthly_budget) — blocks when cumulative monthly spend plus the estimated cost would exceed the limit

How cost is estimated: The gateway uses tiktoken BPE encoding to count input tokens from the request body (messages, system prompt, tools/functions), then multiplies by the model’s per-token pricing from the built-in pricing table.

When it blocks: Any of the three limits would be exceeded.
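The three-limit check can be pictured roughly as below. This is a sketch under assumptions: the function name `check_cost`, the order in which the limits are tested, and the use of a strict greater-than comparison are illustrative choices, not confirmed gateway behaviour. The policy keys and error codes are the ones documented in this step.

```python
from typing import Optional


def check_cost(estimated: float, daily_spent: float, monthly_spent: float,
               policy: dict) -> Optional[str]:
    """Return the error code of the first limit the request would exceed, else None."""
    # Per-request ceiling: looks only at this request's estimated cost.
    if estimated > policy["max_cost_per_request"]:
        return "GW_COST_001"
    # Daily budget: cumulative spend so far plus this request's estimate.
    if daily_spent + estimated > policy["daily_budget"]:
        return "GW_COST_002"
    # Monthly budget: same comparison over the monthly total.
    if monthly_spent + estimated > policy["monthly_budget"]:
        return "GW_COST_003"
    return None
```

For instance, with $24.50 already spent today, a $0.85 estimate, and a $25.00 daily budget, the sum of $25.35 exceeds the limit and the sketch returns GW_COST_002.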

HTTP status: 403

Error codes:

Code	Meaning
`GW_COST_001`	Per-request cost exceeds `max_cost_per_request`
`GW_COST_002`	Daily budget would be exceeded
`GW_COST_003`	Monthly budget would be exceeded

Error body example:


{
  "error": {
    "message": "Daily budget exhausted: $24.50 spent + $0.85 estimated > $25.00 limit",
    "type": "permission_error",
    "code": "daily_budget",
    "gateway_error_code": "GW_COST_002",
    "remediation": "Increase your daily budget at dashboard.curate-me.ai/policies or wait until midnight UTC for the counter to reset."
  }
}

How to configure:


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "max_cost_per_request": 2.00,
    "daily_budget": 100.00,
    "monthly_budget": 2000.00
  }'

Default limits by tier:

Tier	Max/Request	Daily Budget	Monthly Budget
Free	$0.25	$5	$50
Starter	$0.50	$25	$250
Growth	$2.00	$100	$2,000
Enterprise	$10.00	$2,000	$50,000

Webhooks: Fires budget.warning at 80% of daily budget, budget.exceeded when a request is blocked.

Step 2.5: Hierarchical budget

What it checks: Budget limits at three levels: Organization, Team, and API Key.

When it blocks: Any level in the hierarchy has exceeded its budget.

HTTP status: 403

Error code: GW_BUDGET_HIERARCHY_001

This step enables fine-grained cost control. You can set a $100/day budget for the org, then allocate $30/day to the engineering team and $20/day to the marketing team. Individual API keys can have their own daily and monthly spend caps.
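One way to picture the hierarchy check is a walk over the three levels that stops at the first one whose budget would be exceeded. The sketch below is illustrative only: the function name, the level keys ("org", "team", "key"), and the choice to add the estimated cost before comparing are assumptions.

```python
from typing import Optional


def first_exhausted_level(estimated: float, spent: dict, limits: dict) -> Optional[str]:
    """Return the first level (org, team, key) whose budget the request would exceed."""
    for level in ("org", "team", "key"):
        limit = limits.get(level)
        if limit is None:
            continue  # no budget configured at this level
        if spent.get(level, 0.0) + estimated > limit:
            return level
    return None
```

A request passes only when every configured level has room, so a team that has used its $30/day allocation is blocked even if the org as a whole is well under $100/day.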

Per-key spend caps are set when creating or updating an API key:


curl -X POST https://api.curate-me.ai/gateway/admin/keys \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "staging-key",
    "scopes": ["proxy"],
    "daily_spend_cap_usd": 10.00,
    "monthly_spend_cap_usd": 200.00
  }'

Step 3: Runner session budget

What it checks: For requests originating from a managed runner session, this step checks the per-session cost limit.

When it blocks: The cumulative cost of the runner session exceeds the configured session budget.

HTTP status: 403

Error code: GW_SESSION_BUDGET_001

This step only applies when the request includes a runner_id and session_id (automatically set for requests from managed runner containers). It prevents long-running agent sessions from silently burning through budget.

Step 4: PII scan

What it checks: Scans all user-provided text in the request (messages, system prompt) for personally identifiable information and secrets before the request leaves your infrastructure.

Detection methods:

Regex patterns — API keys (OpenAI, Anthropic, Google, GitHub, Stripe, AWS), bearer tokens, SSNs, credit cards (Luhn-validated), phone numbers, email addresses, IBANs, EU passport numbers, and more
Presidio NER (when available) — 50+ entity types using spaCy NLP models for context-aware detection
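Luhn validation, mentioned above for credit cards, is a standard checksum that filters out digit sequences that merely look like card numbers. The function below is a generic implementation of that checksum for illustration; it is not taken from the gateway's scanner, and the name `luhn_valid` is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces, dashes, and any other non-digit separators.
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Pairing a regex match with this checksum reduces false positives: a random 16-digit order number passes the pattern but fails the checksum roughly nine times out of ten.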

Severity classification:

Severity	Types
CRITICAL	SSN, credit card, IBAN, EU passport, German ID, UK NINO
HIGH	API keys, bearer tokens, AWS credentials, secret patterns
MEDIUM	Email, phone number, EU VAT, ICD-10 codes

When it blocks: PII is detected and the policy’s pii_action is set to block (the default).

HTTP status: 403

Error code: GW_PII_001

Error body example:


{
  "error": {
    "message": "PII detected in request: api_key_openai, email. Blocked by governance policy.",
    "type": "permission_error",
    "code": "pii_detected",
    "gateway_error_code": "GW_PII_001",
    "pii_types": ["api_key_openai", "email"],
    "remediation": "Remove PII from request content, or adjust pii_action in your governance policy."
  }
}

How to configure:


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "pii_scan_enabled": true,
    "pii_action": "block",
    "pii_entity_config": {
      "PERSON": true,
      "EMAIL_ADDRESS": true,
      "PHONE_NUMBER": true,
      "CREDIT_CARD": true,
      "US_SSN": true,
      "IP_ADDRESS": false,
      "LOCATION": false,
      "ORGANIZATION": false
    }
  }'

Set pii_action to "needs_approval" to route PII-containing requests to the HITL approval queue instead of blocking outright.

Webhook: Fires guardrail.triggered with guardrail: "pii_scan".

Step 4.5: Content safety

What it checks: Prompt injection patterns, jailbreak attempts, and data exfiltration signals in the request content.

When it blocks: A content safety violation is detected and the guardrail is enabled.

HTTP status: 403

Error code: GW_SAFETY_001

Content safety is a separate layer from PII scanning. PII scanning protects against accidental data leakage. Content safety protects against deliberate adversarial attacks.

Webhook: Fires guardrail.triggered with guardrail: "content_safety".

Step 5: Model allowlist

What it checks: Whether the requested model is in the org’s allowed models list. The check runs after model alias resolution, so the comparison uses the resolved model name rather than the alias.

When it blocks: The org has a non-empty allowed_models list and the requested model is not in it.

HTTP status: 403

Error code: GW_MODEL_001

Error body example:


{
  "error": {
    "message": "Model 'gpt-5.1' is not in your allowed models list",
    "type": "permission_error",
    "code": "model_not_allowed",
    "gateway_error_code": "GW_MODEL_001",
    "hint": "Add the model to your allowlist or use one of your allowed models.",
    "hint_data": {
      "allowed_models": ["gpt-4o-mini", "claude-haiku-3-5-20241022", "gemini-2.5-flash"],
      "requested_model": "gpt-5.1"
    }
  }
}

How to configure:


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "allowed_models": [
      "gpt-4o-mini",
      "claude-haiku-3-5-20241022",
      "gemini-2.5-flash"
    ]
  }'

An empty allowed_models list (the default) allows all models. You can further narrow the allowlist at the API key level using key_allowed_models.
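The allowlist rules can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, the model is assumed to be already alias-resolved, and treating an empty key-level list as "no extra restriction" mirrors the documented org-level default but is not confirmed for key_allowed_models.

```python
from typing import Sequence


def model_allowed(model: str, allowed_models: Sequence[str],
                  key_allowed_models: Sequence[str] = ()) -> bool:
    """Apply the org allowlist, then the optional key-level allowlist."""
    # An empty org list means every model is allowed.
    if allowed_models and model not in allowed_models:
        return False
    # Assumed: an empty key-level list adds no further restriction.
    if key_allowed_models and model not in key_allowed_models:
        return False
    return True
```

Because the key-level list is applied after the org list, it can only narrow what the org permits; it cannot grant access to a model the org allowlist excludes.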

Step 6: HITL gate (Human-in-the-Loop)

What it checks: Whether the estimated cost exceeds the configured HITL threshold.

When it triggers: Estimated cost is above hitl_cost_threshold. The request is not denied; it is held until a human approves or rejects it.

HTTP status: 202 Accepted

Response headers:


HTTP/1.1 202 Accepted
X-CM-Approval-ID: apr_abc123def456
Retry-After: 30

Response body:


{
  "status": "pending_approval",
  "approval_id": "apr_abc123def456",
  "message": "Request requires human approval: Estimated cost $12.50 exceeds HITL threshold $10.00",
  "retry_after_seconds": 30,
  "estimated_cost": 12.50
}

How to handle in code:


import time
import requests

response = requests.post(
    "https://api.curate-me.ai/v1/chat/completions",
    headers={
        "X-CM-API-Key": "cm_sk_your_key",
        "Authorization": "Bearer sk-your-openai-key",
    },
    json={"model": "gpt-5.1", "messages": [{"role": "user", "content": "..."}]},
    timeout=60,
)

if response.status_code == 202:
    approval_id = response.json()["approval_id"]
    deadline = time.monotonic() + 300  # give up after 5 minutes of polling
    # Poll for approval (or use the SDK's wait_for_approval method)
    while time.monotonic() < deadline:
        poll = requests.get(
            f"https://api.curate-me.ai/gateway/admin/approvals/{approval_id}",
            headers={"X-CM-API-Key": "cm_sk_your_key"},
            timeout=30,
        )
        poll.raise_for_status()  # surface auth or server errors instead of looping
        status = poll.json()
        if status["status"] in ("approved", "rejected", "expired"):
            break
        time.sleep(status.get("retry_after_seconds", 5))
    else:
        # The loop ended without reaching a final status.
        raise TimeoutError(f"Approval {approval_id} still pending after 5 minutes")

Or with the SDK:


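# Assumes `gw` is an already-initialized SDK gateway client and that this
# code runs inside an async function (hence the await).
admin = gw.admin()
result = await admin.wait_for_approval("apr_abc123def456", timeout=300)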

Pending approvals appear in the dashboard approval queue at Gateway > Approvals.

How to configure:


curl -X PUT https://api.curate-me.ai/gateway/admin/policies/org_abc123 \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"hitl_cost_threshold": 10.00}'

Default thresholds by tier:

Tier	HITL Threshold
Free	$1.00
Starter	$3.00
Growth	$10.00
Enterprise	$50.00

Webhook: Fires hitl.pending when a request enters the approval queue.

Policy format

The full governance policy is a JSON document stored per organization:


{
  "org_id": "org_abc123",
  "rpm_limit": 60,
  "max_cost_per_request": 2.00,
  "daily_budget": 100.00,
  "monthly_budget": 2000.00,
  "pii_scan_enabled": true,
  "pii_action": "block",
  "pii_entity_config": {
    "PERSON": true,
    "EMAIL_ADDRESS": true,
    "PHONE_NUMBER": true,
    "CREDIT_CARD": true,
    "US_SSN": true,
    "IP_ADDRESS": false,
    "LOCATION": false
  },
  "allowed_models": [],
  "hitl_cost_threshold": 10.00,
  "request_logging_mode": "full",
  "failover_policy": {
    "enabled": false,
    "mode": "cost_aware",
    "max_cost_multiplier": 1.5,
    "max_failover_attempts": 2
  }
}

Simulating policy changes

Before applying a policy change, you can simulate it against recent traffic to see what would have been blocked:


curl -X POST https://api.curate-me.ai/gateway/admin/policies/simulate \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "draft_policy": {
      "rpm_limit": 30,
      "daily_budget": 10.00
    },
    "replay_hours": 24
  }'

The response shows how many requests would have been blocked and by which step.

Webhook events

The governance chain fires webhook events for key governance actions:

Event	When
`rate_limit.hit`	A request is blocked by rate limiting
`budget.warning`	Daily spend reaches 80% of the budget
`budget.exceeded`	A request is blocked by budget enforcement
`guardrail.triggered`	PII scan or content safety blocks a request
`hitl.pending`	A request enters the HITL approval queue

Configure webhooks at Dashboard > Gateway > Webhooks or via the API:


curl -X POST https://api.curate-me.ai/api/v1/admin/webhooks \
  -H "X-CM-API-Key: cm_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/curate-me",
    "events": ["budget.exceeded", "budget.warning", "rate_limit.hit"]
  }'

Next steps

Cost Tracking — how costs are recorded and attributed
Gateway API Reference — full endpoint documentation
Runbook: Budget Exceeded — diagnosing cost spikes
Runbook: Rate Limit Hit — resolving rate limit issues