Health and Management Endpoints

The gateway provides health check, monitoring, and administrative endpoints for infrastructure management and observability.


Liveness probe

GET /health

Minimal liveness probe for container orchestration (Docker, Kubernetes). Returns 200 unconditionally as long as the gateway process is running. No authentication required.

Response

{ "status": "ok" }

Use this endpoint for Docker HEALTHCHECK directives and Kubernetes liveness probes.
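The probe can also be scripted. The following is a minimal sketch, assuming the gateway is reachable at a base URL you supply; the helper names `is_alive` and `probe` are illustrative and not part of the gateway.

```python
import json
import urllib.error
import urllib.request


def is_alive(status_code: int, body: bytes) -> bool:
    """Return True when a response matches the documented liveness reply."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(base_url: str, timeout: float = 2.0) -> bool:
    """Call GET /health; any network or HTTP error counts as not alive."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_alive(resp.status, resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

A health-check command could call `probe("http://localhost:8000")` (the address is an assumption) and exit with status 0 or 1 depending on the result.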


Gateway health check

GET /v1/health

Standard health check that reports the gateway’s status along with Redis and MongoDB connectivity. No authentication required.

Response

HTTP/1.1 200 OK
{
  "status": "ok",
  "service": "curate-me-gateway",
  "version": "0.1.0",
  "timestamp": "2026-02-27T12:00:00+00:00",
  "redis": "connected",
  "mongodb": "connected",
  "degraded": false
}

Response fields

| Field | Type | Description |
|---|---|---|
| status | string | Always "ok" if the process is running |
| service | string | Service identifier ("curate-me-gateway") |
| version | string | Gateway version |
| timestamp | string | ISO 8601 timestamp (UTC) |
| redis | string | "connected", "disconnected", or "not_configured" |
| mongodb | string | "connected", "disconnected", or "not_configured" |
| degraded | boolean | true when a backing service is unreachable |

Degraded state

The endpoint always returns 200, even when backing services are down. The degraded field signals reduced functionality:

  • Redis disconnected: Rate limiting falls back to in-memory counters, and cost tracking may be delayed
  • MongoDB disconnected: Audit logging and policy lookups may fail

This design ensures load balancers keep routing traffic to the gateway even when backing services have transient issues.
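Because the status code stays 200, a monitor has to inspect the response body to detect degradation. A minimal sketch of that check, using only the fields documented above (the function names are illustrative):

```python
from typing import Any, Dict, List


def unreachable_services(health: Dict[str, Any]) -> List[str]:
    """List the backing services that /v1/health reports as disconnected."""
    return [name for name in ("redis", "mongodb")
            if health.get(name) == "disconnected"]


def needs_attention(health: Dict[str, Any]) -> bool:
    """True when the payload signals degraded operation, despite HTTP 200."""
    return bool(health.get("degraded")) or bool(unreachable_services(health))
```

A service reported as "not_configured" is deliberately not treated as a failure here; that choice is an assumption of this sketch.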


Provider health

GET /v1/providers/health

Returns the circuit breaker state for each upstream LLM provider. No authentication required. Designed for monitoring dashboards and uptime checks.

Response

{
  "status": "healthy",
  "timestamp": "2026-02-27T12:00:00Z",
  "providers": [
    {
      "provider": "anthropic",
      "state": "closed",
      "failure_count": 0,
      "failure_threshold": 5,
      "failure_window_seconds": 60.0,
      "recovery_timeout_seconds": 30.0,
      "last_failure_time": null,
      "last_success_time": 1709056790.123,
      "retry_after_seconds": null
    },
    {
      "provider": "deepseek",
      "state": "closed",
      "failure_count": 1,
      "failure_threshold": 5,
      "failure_window_seconds": 60.0,
      "recovery_timeout_seconds": 30.0,
      "last_failure_time": 1709056750.456,
      "last_success_time": 1709056789.789,
      "retry_after_seconds": null
    },
    {
      "provider": "google",
      "state": "closed",
      "failure_count": 0,
      "failure_threshold": 5,
      "failure_window_seconds": 60.0,
      "recovery_timeout_seconds": 30.0,
      "last_failure_time": null,
      "last_success_time": 1709056795.321,
      "retry_after_seconds": null
    },
    {
      "provider": "openai",
      "state": "closed",
      "failure_count": 0,
      "failure_threshold": 5,
      "failure_window_seconds": 60.0,
      "recovery_timeout_seconds": 30.0,
      "last_failure_time": null,
      "last_success_time": 1709056799.654,
      "retry_after_seconds": null
    }
  ]
}

Overall status

| Value | Condition |
|---|---|
| healthy | All circuit breakers are closed |
| degraded | Some circuit breakers are open or half-open |
| unhealthy | All circuit breakers are open |
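The mapping from per-provider states to the overall status can be expressed as a small function. This is a sketch of the rule as stated in the table above, not the gateway's own code; treating an empty provider list as healthy is an assumption.

```python
from typing import Iterable


def overall_status(states: Iterable[str]) -> str:
    """Derive the overall status from circuit breaker states."""
    states = list(states)
    if all(state == "closed" for state in states):
        return "healthy"    # also the result for an empty list (assumption)
    if all(state == "open" for state in states):
        return "unhealthy"
    return "degraded"       # any mix that includes open or half_open
```

A client can apply the same rule to the `state` values in the `providers` array to cross-check the reported `status`.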

Circuit breaker states

| State | Description |
|---|---|
| closed | Provider is healthy; requests are proxied normally |
| open | Provider is failing; requests are rejected immediately with 503 |
| half_open | Recovery probe in progress; a single request is allowed through to test the provider |

Circuit breaker fields

| Field | Type | Description |
|---|---|---|
| provider | string | Provider name (openai, anthropic, google, deepseek) |
| state | string | Circuit breaker state: closed, open, half_open |
| failure_count | integer | Failures in the current sliding window |
| failure_threshold | integer | Failures required to trip the circuit (default: 5) |
| failure_window_seconds | float | Sliding window duration in seconds (default: 60) |
| recovery_timeout_seconds | float | Time before a probe is allowed after circuit opens (default: 30) |
| last_failure_time | float or null | Monotonic timestamp of last failure |
| last_success_time | float or null | Monotonic timestamp of last success |
| retry_after_seconds | float or null | Seconds until the circuit transitions to half-open (only when open) |
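To show how these fields relate, here is an illustrative sliding-window circuit breaker. It is a sketch under stated assumptions, not the gateway's implementation: the class, its method names, and the injectable `clock` are hypothetical, and only the parameter names and defaults come from the table above.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    """Illustrative breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold: int = 5,
                 failure_window_seconds: float = 60.0,
                 recovery_timeout_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.failure_window_seconds = failure_window_seconds
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.state = "closed"
        self._clock = clock
        self._failures: Deque[float] = deque()   # failure times in the window
        self._opened_at: Optional[float] = None

    def _prune(self, now: float) -> None:
        # Drop failures that have slid out of the window.
        while self._failures and now - self._failures[0] > self.failure_window_seconds:
            self._failures.popleft()

    @property
    def failure_count(self) -> int:
        self._prune(self._clock())
        return len(self._failures)

    def retry_after_seconds(self) -> Optional[float]:
        # Only meaningful while the circuit is open.
        if self.state != "open" or self._opened_at is None:
            return None
        elapsed = self._clock() - self._opened_at
        return max(0.0, self.recovery_timeout_seconds - elapsed)

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open" and self.retry_after_seconds() == 0.0:
            self.state = "half_open"   # let exactly one probe through
            return True
        return False                   # open, or a probe is already in flight

    def record_success(self) -> None:
        if self.state == "half_open":  # the probe succeeded: close the circuit
            self._failures.clear()
            self._opened_at = None
            self.state = "closed"

    def record_failure(self) -> None:
        now = self._clock()
        self._failures.append(now)
        self._prune(now)
        if self.state == "half_open" or len(self._failures) >= self.failure_threshold:
            self.state = "open"
            self._opened_at = now
```

Passing a fake `clock` makes the state transitions easy to exercise in tests without waiting for real time to pass.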

Detailed health check

GET /v1/health/detailed

Comprehensive gateway health check with per-provider status, infrastructure components, current metrics, cost vs budget, active circuit breakers, rate limit status, and triggered alerts. No authentication required. Designed for monitoring dashboards.

Response

{
  "overall_status": "healthy",
  "timestamp": "2026-02-27T12:00:00Z",
  "uptime_seconds": 86400.5,
  "components": {
    "gateway": "healthy",
    "redis": "healthy",
    "mongodb": "healthy"
  },
  "providers": {
    "openai": {"status": "up", "circuit_breaker": "closed"},
    "anthropic": {"status": "up", "circuit_breaker": "closed"},
    "google": {"status": "up", "circuit_breaker": "closed"},
    "deepseek": {"status": "up", "circuit_breaker": "closed"}
  },
  "metrics": {
    "total_requests": 15420,
    "error_rate": 0.02,
    "p50_latency_ms": 245,
    "p95_latency_ms": 890,
    "p99_latency_ms": 2100
  },
  "cost": {
    "today_usd": 42.50,
    "daily_budget_usd": 100.00,
    "utilization_pct": 42.5
  },
  "alerts": [],
  "rate_limits": []
}

Response fields

| Field | Type | Description |
|---|---|---|
| overall_status | string | "healthy", "degraded", or "unhealthy" |
| timestamp | string | ISO 8601 timestamp |
| uptime_seconds | float | Seconds since gateway process started |
| components | object | Per-component status (gateway, redis, mongodb) |
| providers | object | Per-provider status with circuit breaker info |
| metrics | object | Request count, error rate, latency percentiles |
| cost | object | Today’s cost vs daily budget |
| alerts | array | Triggered alert conditions |
| rate_limits | array | Top 5 busiest organizations by current RPM |

HTTP status codes

| Code | Condition |
|---|---|
| 200 | Healthy or degraded (still operational) |
| 503 | Unhealthy (all providers down) |
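Many HTTP clients raise an exception on a 503, so a monitor should handle that code explicitly instead of treating it as a transport failure. The sketch below uses Python's standard library. It assumes the 503 response carries the same JSON document as the 200 response, which this page does not confirm, and the function names are illustrative.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_detailed_health(base_url: str, timeout: float = 5.0) -> Tuple[int, Dict[str, Any]]:
    """Return (HTTP status, parsed body) for GET /v1/health/detailed."""
    url = f"{base_url}/v1/health/detailed"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 503:
            # Assumption: the unhealthy response body is the same JSON shape.
            return err.code, json.loads(err.read())
        raise


def tripped_providers(health: Dict[str, Any]) -> List[str]:
    """Names of providers whose circuit breaker is not closed, sorted."""
    providers = health.get("providers", {})
    return sorted(name for name, info in providers.items()
                  if info.get("circuit_breaker") != "closed")
```

`tripped_providers` works on any parsed response, so it can be tested against a saved payload without a running gateway.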

Check approval status

GET /v1/approvals/{approval_id}/status

Polls the status of a human-in-the-loop (HITL) approval request. Callers use this endpoint after receiving a 202 response that contains an approval_id. Requires gateway API key authentication.

Path parameters

| Parameter | Type | Description |
|---|---|---|
| approval_id | string | The approval ID from the 202 response |

Headers

| Header | Required | Description |
|---|---|---|
| X-CM-API-Key | Yes | Curate-Me gateway API key |

Response

{
  "approval_id": "apr_abc123def456",
  "status": "pending",
  "org_id": "org_xyz789",
  "model": "gpt-4o",
  "estimated_cost": 12.50,
  "created_at": "2026-02-27T12:00:00Z"
}

Status values

| Status | Description |
|---|---|
| pending | Awaiting human review |
| approved | Request approved; re-submit to proceed |
| rejected | Request rejected by a reviewer |
| expired | Approval request timed out |

Error (not found)

HTTP/1.1 404 Not Found
{
  "error": {
    "message": "Approval request not found: apr_abc123def456",
    "type": "not_found_error",
    "param": "approval_id",
    "code": "not_found"
  }
}
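A caller typically polls this endpoint until the status leaves pending. The following is a minimal sketch of such a loop. The polling interval, the timeout, and the helper names are assumptions; only the path, the X-CM-API-Key header, and the status values come from this page.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"approved", "rejected", "expired"}


def make_http_fetcher(base_url: str, api_key: str) -> Callable[[str], Dict[str, Any]]:
    """Build a function that performs GET /v1/approvals/{approval_id}/status."""
    def fetch(approval_id: str) -> Dict[str, Any]:
        request = urllib.request.Request(
            f"{base_url}/v1/approvals/{approval_id}/status",
            headers={"X-CM-API-Key": api_key},
        )
        # A 404 (unknown approval_id) surfaces as urllib.error.HTTPError.
        with urllib.request.urlopen(request, timeout=10) as resp:
            return json.loads(resp.read())
    return fetch


def wait_for_approval(fetch: Callable[[str], Dict[str, Any]], approval_id: str,
                      interval: float = 5.0, timeout: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the approval reaches a terminal status, then return it."""
    deadline = clock() + timeout
    while True:
        status = fetch(approval_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"approval {approval_id} still pending")
        sleep(interval)
```

When the result is approved, the caller re-submits the original request, as described in the status table. The `fetch`, `sleep`, and `clock` parameters are injectable so the loop can be tested without network access or real delays.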

Admin endpoints

The following endpoints are used by the dashboard to manage gateway configuration. They require admin authentication (dashboard JWT or API key with admin scopes).

Governance policies

| Method | Endpoint | Description |
|---|---|---|
| GET | /gateway/admin/policies | List governance policies for the org |
| POST | /gateway/admin/policies | Create or update a governance policy |
| DELETE | /gateway/admin/policies/{org_id} | Delete a governance policy |
| POST | /gateway/admin/policies/apply-preset | Apply a policy preset |

Usage and billing

| Method | Endpoint | Description |
|---|---|---|
| GET | /gateway/admin/usage | Get usage statistics (paginated) |
| GET | /gateway/admin/usage/daily | Get daily cost breakdown |
| GET | /gateway/admin/usage/{request_id} | Get a single usage record |
| GET | /gateway/admin/usage/export | Export usage records as CSV |
| GET | /gateway/admin/billing/summary | Monthly billing summary |
| GET | /gateway/admin/billing/export | Monthly billing export (JSON/CSV) |

Provider secrets

| Method | Endpoint | Description |
|---|---|---|
| POST | /gateway/admin/secrets | Store a provider API key (encrypted) |
| GET | /gateway/admin/secrets | List stored secrets (metadata only, keys not exposed) |
| POST | /gateway/admin/secrets/rotate | Rotate a provider secret |
| DELETE | /gateway/admin/secrets/{provider} | Revoke a provider secret |

Provider targets

| Method | Endpoint | Description |
|---|---|---|
| POST | /gateway/admin/provider-targets | Create a provider target (e.g., Azure OpenAI, Ollama) |
| GET | /gateway/admin/provider-targets | List provider targets for the org |
| GET | /gateway/admin/provider-targets/{id} | Get a single provider target |
| PATCH | /gateway/admin/provider-targets/{id} | Update a provider target |
| DELETE | /gateway/admin/provider-targets/{id} | Delete a provider target |
| POST | /gateway/admin/provider-targets/{id}/discover-models | Discover models from the target (SSRF-safe) |
| GET | /gateway/admin/provider-targets/{id}/catalog | List discovered models |

Model aliases

| Method | Endpoint | Description |
|---|---|---|
| POST | /gateway/admin/model-aliases | Create a model alias |
| GET | /gateway/admin/model-aliases | List model aliases for the org |
| GET | /gateway/admin/model-aliases/{id} | Get a single model alias |
| PATCH | /gateway/admin/model-aliases/{id} | Update a model alias |
| DELETE | /gateway/admin/model-aliases/{id} | Delete a model alias |

HITL approvals

| Method | Endpoint | Description |
|---|---|---|
| GET | /gateway/admin/approvals | List pending approvals (paginated) |
| GET | /gateway/admin/approvals/stats | Counts of pending/approved/rejected |
| GET | /gateway/admin/approvals/{id} | Get full approval details |
| POST | /gateway/admin/approvals/{id}/approve | Approve a pending request |
| POST | /gateway/admin/approvals/{id}/reject | Reject a pending request |

Root endpoint

GET /

Returns service identification and a list of available endpoints. No authentication required.

{
  "service": "Curate-Me AI Gateway",
  "version": "1.0.0",
  "status": "running",
  "docs": "/docs",
  "health": "/v1/health",
  "liveness": "/health",
  "metrics": "/metrics",
  "endpoints": {
    "openai_proxy": "POST /v1/chat/completions",
    "anthropic_proxy": "POST /v1/messages",
    "google_proxy": "POST /v1/google/chat/completions",
    "models": "GET /v1/models",
    "admin_policies": "GET/POST /gateway/admin/policies",
    "admin_usage": "GET /gateway/admin/usage",
    "admin_approvals": "GET /gateway/admin/approvals",
    "providers_health": "GET /v1/providers/health"
  }
}

Prometheus metrics

GET /metrics

Prometheus-compatible metrics endpoint. Returns gateway metrics in Prometheus exposition format. No authentication required.

Tracked metrics include:

  • Request count by provider, model, and status code
  • Request latency histograms
  • Governance block counts by reason
  • Cost accumulation by provider
  • Circuit breaker state changes
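For ad hoc inspection without a Prometheus server, the text exposition format can be read with a small parser. This is a simplified sketch: it skips comment lines, does not unescape label values, and makes no assumption about the gateway's metric names, which this page does not list.

```python
import re
from typing import Dict, List, Tuple

# name{labels} value [timestamp]
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP, and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue                           # ignore lines this sketch cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

The returned tuples can be filtered by label, for example to sum request counts for a single provider once the actual metric names are known from the gateway's `/metrics` output.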