Proxy Endpoints
The gateway proxy endpoints accept requests in the same format as the upstream LLM provider and forward them through the governance chain. Responses are passed through unchanged, including streaming SSE responses.
All proxy endpoints require authentication via X-CM-API-Key and a provider API key. See Authentication.
OpenAI Chat Completions
POST /v1/chat/completions
Proxies requests to OpenAI’s chat completions API. Accepts the same request body as https://api.openai.com/v1/chat/completions.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | OpenAI API key (not needed if stored via secrets) |
Authorization | Conditional | Bearer <openai-key> (alternative to X-Provider-Key) |
Content-Type | Yes | application/json |
Idempotency-Key | No | Idempotency key for retry-safe calls |
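For retry-safe calls, generate an idempotency key client-side and send it alongside the required headers. A minimal sketch (the key values are placeholders):

```python
import uuid

# Headers for a retry-safe proxy call. Reuse the same Idempotency-Key
# when retrying a failed request; generate a fresh one per logical request.
headers = {
    "Content-Type": "application/json",
    "X-CM-API-Key": "cm_sk_xxx",           # gateway key (placeholder)
    "Authorization": "Bearer sk-proj-xxx",  # provider key (placeholder)
    "Idempotency-Key": str(uuid.uuid4()),
}
```

With the OpenAI Python SDK, per-request headers like this can be supplied via the `extra_headers` argument to `client.chat.completions.create(...)`.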
Request body
The request body is passed through to OpenAI. All standard OpenAI parameters are supported:
{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, world!"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 256
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g., gpt-4o, gpt-4o-mini, gpt-5.1-chat-latest, o3-mini) |
messages | array | Yes | Conversation messages in OpenAI format |
stream | boolean | No | Enable SSE streaming (default: false) |
temperature | number | No | Sampling temperature (0-2) |
max_tokens | integer | No | Maximum tokens to generate |
All other OpenAI parameters (top_p, tools, tool_choice, response_format, stream_options, etc.) are forwarded as-is.
Response (non-streaming)
HTTP/1.1 200 OK
Content-Type: application/json
X-CM-Request-ID: gw_a1b2c3d4e5f6g7h8
X-CM-Cost: 0.0042
X-CM-Daily-Cost: 12.35
X-CM-Daily-Budget: 100.00
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 297
X-RateLimit-Reset: 1709056860
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709056800,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21
}
}
The response body is the unmodified OpenAI response. The gateway adds governance metadata via response headers only.
Response (streaming)
When stream: true is set, the response is an SSE stream:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
X-CM-Request-ID: gw_a1b2c3d4e5f6g7h8
X-CM-Cost: 0.0042
X-CM-Daily-Cost: 12.35
X-CM-Daily-Budget: 100.00
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}
data: [DONE]
The gateway automatically injects stream_options: {"include_usage": true} so that token usage is reported in the final chunk for cost recording.
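Client-side, the chunks can be reassembled by parsing each data: line until the [DONE] terminator. A minimal sketch using the chunk format shown above:

```python
import json

def collect_content(sse_lines):
    """Reassemble assistant text from OpenAI-format SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The usage-only final chunk (from include_usage) has an
        # empty choices array; the loop below simply skips it.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

In practice the OpenAI SDK performs this parsing for you when `stream=True`; the sketch is useful when consuming the SSE stream directly.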
Example: Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://api.curate-me.ai/v1/openai",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
Example: cURL
curl https://api.curate-me.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "X-CM-API-Key: cm_sk_xxx" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}'
Anthropic Messages
POST /v1/messages
Proxies requests to Anthropic’s messages API. Accepts the same request body as https://api.anthropic.com/v1/messages.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | Anthropic API key (not needed if stored via secrets) |
Authorization | Conditional | Bearer <anthropic-key> (alternative to X-Provider-Key) |
Content-Type | Yes | application/json |
Request body
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g., claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-3-5-20241022) |
max_tokens | integer | Yes | Maximum tokens to generate |
messages | array | Yes | Conversation messages in Anthropic format |
stream | boolean | No | Enable SSE streaming (default: false) |
system | string | No | System prompt |
All other Anthropic parameters (temperature, tools, tool_choice, metadata, etc.) are forwarded as-is.
Response
The response is the unmodified Anthropic response:
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you today?"
}
],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 12
}
}
Provider auth translation
The gateway automatically translates your provider key to Anthropic’s expected format:
- Sets the x-api-key header (Anthropic’s auth method)
- Sets anthropic-version: 2023-06-01
You do not need to handle this translation in your application.
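For reference, the translation is equivalent to the following header rewrite (an illustrative sketch, not the gateway’s actual implementation):

```python
def to_anthropic_headers(provider_key: str) -> dict:
    """Rewrite a provider key into Anthropic's expected auth headers,
    mirroring the translation the gateway performs upstream."""
    return {
        "x-api-key": provider_key,
        "anthropic-version": "2023-06-01",
    }
```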
Example: Python (Anthropic SDK)
import anthropic
client = anthropic.Anthropic(
base_url="https://api.curate-me.ai/v1/anthropic",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)
Google Gemini Chat Completions
POST /v1/google/chat/completions
Proxies requests to Google Gemini using an OpenAI-compatible request format. The gateway translates the request from OpenAI chat completions format to Gemini’s generateContent format, proxies it to Google, and translates the response back to OpenAI format.
This means you can use the same OpenAI SDK to call Gemini models through the gateway.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | Google API key (not needed if stored via secrets) |
Content-Type | Yes | application/json |
Request body
Use the same OpenAI chat completions format:
{
"model": "gemini-2.5-flash",
"messages": [
{"role": "user", "content": "Explain quantum computing briefly."}
],
"stream": false,
"max_tokens": 256
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g., gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash) |
messages | array | Yes | Conversation messages in OpenAI format |
stream | boolean | No | Enable SSE streaming (default: false) |
max_tokens | integer | No | Maximum tokens to generate |
Response
The response is translated to OpenAI chat completions format:
{
"id": "gw_a1b2c3d4e5f6g7h8",
"object": "chat.completion",
"created": 1709056800,
"model": "gemini-2.5-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum mechanical phenomena..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 45,
"total_tokens": 53
}
}
Streaming
When stream: true, the gateway translates Gemini’s streaming format to OpenAI SSE chunks, including the data: [DONE] terminator.
Provider auth translation
Google Gemini uses a query parameter for authentication (?key=API_KEY). The gateway handles this automatically — you pass your Google API key the same way as any other provider.
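The rewrite amounts to appending the key as a query parameter on the upstream URL. An illustrative sketch (the upstream base URL shown is an assumption about Gemini’s public REST endpoint, not something the gateway documents):

```python
from urllib.parse import urlencode

def gemini_url(model: str, api_key: str, stream: bool = False) -> str:
    """Build the kind of upstream Gemini URL the gateway would call.
    Illustrative only; the base URL is assumed."""
    base = "https://generativelanguage.googleapis.com/v1beta/models"
    method = "streamGenerateContent" if stream else "generateContent"
    return f"{base}/{model}:{method}?" + urlencode({"key": api_key})
```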
Circuit breaker
The Google proxy has its own circuit breaker. If Google returns repeated failures, the circuit breaker opens and subsequent requests are rejected immediately with 503 Service Unavailable:
{
"error": {
"message": "Google Gemini is temporarily unavailable (circuit breaker open)",
"type": "service_unavailable",
"param": null,
"code": "circuit_breaker_open"
}
}
Example: Python (OpenAI SDK)
from openai import OpenAI
# Use Google Gemini via the OpenAI SDK
client = OpenAI(
base_url="https://api.curate-me.ai/v1/google",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
api_key="your-google-api-key", # passed as provider key
)
response = client.chat.completions.create(
model="gemini-2.5-flash",
messages=[{"role": "user", "content": "Hello from Gemini!"}],
)
print(response.choices[0].message.content)
DeepSeek Chat Completions
POST /v1/deepseek/chat/completions
Proxies requests to DeepSeek’s API. DeepSeek uses an OpenAI-compatible API format, so requests and responses pass through with minimal transformation.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | DeepSeek API key (not needed if stored via secrets) |
Authorization | Conditional | Bearer <deepseek-key> (alternative to X-Provider-Key) |
Content-Type | Yes | application/json |
Request body
{
"model": "deepseek-chat",
"messages": [
{"role": "user", "content": "What is deep learning?"}
],
"stream": false
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID: deepseek-chat, deepseek-reasoner, or deepseek-coder |
messages | array | Yes | Conversation messages in OpenAI format |
stream | boolean | No | Enable SSE streaming (default: false) |
All other OpenAI-compatible parameters are forwarded as-is.
Response
The response is the unmodified DeepSeek response (OpenAI-compatible format):
{
"id": "chatcmpl-xyz789",
"object": "chat.completion",
"created": 1709056800,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Deep learning is a subset of machine learning..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 50,
"total_tokens": 57
}
}
Provider key resolution
The DeepSeek API key is resolved in this order:
1. X-Provider-Key header
2. Authorization: Bearer header (if not a gateway key)
3. Org-scoped secret custody (stored via dashboard)
4. DEEPSEEK_API_KEY environment variable
Catch-all route
In addition to /v1/deepseek/chat/completions, the gateway includes a catch-all route ANY /v1/deepseek/{path} that proxies any DeepSeek API endpoint. This is not included in the public API schema but is available for forward compatibility.
Example: Python (OpenAI SDK)
from openai import OpenAI
# DeepSeek uses OpenAI-compatible format
client = OpenAI(
base_url="https://api.curate-me.ai/v1/deepseek",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
api_key="your-deepseek-key",
)
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": "Hello from DeepSeek!"}],
)
print(response.choices[0].message.content)
List Models
GET /v1/models
Returns a list of all models available through the gateway. Requires gateway authentication but does not require a provider key.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key (requires gateway:read scope) |
Response
{
"object": "list",
"data": [
{
"id": "gpt-4o",
"object": "model",
"created": 0,
"owned_by": "openai",
"description": "GPT-4o (latest)"
},
{
"id": "gpt-4o-mini",
"object": "model",
"created": 0,
"owned_by": "openai",
"description": "GPT-4o Mini (budget)"
},
{
"id": "claude-sonnet-4-20250514",
"object": "model",
"created": 0,
"owned_by": "anthropic",
"description": "Claude Sonnet 4"
},
{
"id": "gemini-2.5-pro",
"object": "model",
"created": 0,
"owned_by": "google",
"description": "Gemini 2.5 Pro"
},
{
"id": "deepseek-chat",
"object": "model",
"created": 0,
"owned_by": "deepseek",
"description": "DeepSeek Chat (V3)"
}
]
}
The response follows the OpenAI model list format for SDK compatibility.
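Because the response follows the OpenAI list format, filtering by provider client-side is straightforward. A sketch over the sample response shape above:

```python
def models_by_provider(model_list: dict, provider: str) -> list:
    """Return the IDs of models owned by the given provider,
    given a response in the OpenAI model-list format."""
    return [m["id"] for m in model_list["data"] if m["owned_by"] == provider]
```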
Supported models
OpenAI
| Model ID | Description |
|---|---|
gpt-4o | GPT-4o (latest) |
gpt-4o-mini | GPT-4o Mini (budget) |
gpt-5.1-chat-latest | GPT-5.1 (latest) |
gpt-5-mini | GPT-5 Mini |
gpt-5-nano | GPT-5 Nano (ultra-budget) |
o1 | o1 (reasoning) |
o1-mini | o1-mini (budget reasoning) |
o3 | o3 (advanced reasoning) |
o3-mini | o3-mini (budget reasoning) |
Anthropic
| Model ID | Description |
|---|---|
claude-opus-4-20250514 | Claude Opus 4 |
claude-sonnet-4-5-20250929 | Claude Sonnet 4.5 |
claude-sonnet-4-20250514 | Claude Sonnet 4 |
claude-haiku-3-5-20241022 | Claude 3.5 Haiku |
Google Gemini
| Model ID | Description |
|---|---|
gemini-2.5-pro | Gemini 2.5 Pro |
gemini-2.5-flash | Gemini 2.5 Flash |
gemini-2.0-flash | Gemini 2.0 Flash |
gemini-1.5-pro | Gemini 1.5 Pro |
gemini-1.5-flash | Gemini 1.5 Flash |
DeepSeek
| Model ID | Description |
|---|---|
deepseek-chat | DeepSeek Chat (V3) |
deepseek-reasoner | DeepSeek Reasoner (R1) |
deepseek-coder | DeepSeek Coder |
Model aliases
The gateway ships with built-in convenience aliases:
| Alias | Resolves to |
|---|---|
gpt-4 | gpt-4o |
gpt-4-turbo | gpt-4o |
claude-3 | claude-sonnet-4-20250514 |
claude-3.5-sonnet | claude-sonnet-4-20250514 |
claude-sonnet | claude-sonnet-4-20250514 |
claude-opus | claude-opus-4-20250514 |
claude-haiku | claude-haiku-3-5-20241022 |
gemini-pro | gemini-2.5-pro |
gemini-flash | gemini-2.5-flash |
deepseek | deepseek-chat |
deepseek-r1 | deepseek-reasoner |
Aliases are resolved before governance checks. Organizations can define custom aliases via POST /gateway/admin/model-aliases.
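Alias resolution behaves like a simple table lookup. An illustrative sketch built from the table above (the pass-through behavior for non-alias names is an assumption):

```python
# Built-in aliases, as listed in the table above.
BUILTIN_ALIASES = {
    "gpt-4": "gpt-4o",
    "gpt-4-turbo": "gpt-4o",
    "claude-3": "claude-sonnet-4-20250514",
    "claude-3.5-sonnet": "claude-sonnet-4-20250514",
    "claude-sonnet": "claude-sonnet-4-20250514",
    "claude-opus": "claude-opus-4-20250514",
    "claude-haiku": "claude-haiku-3-5-20241022",
    "gemini-pro": "gemini-2.5-pro",
    "gemini-flash": "gemini-2.5-flash",
    "deepseek": "deepseek-chat",
    "deepseek-r1": "deepseek-reasoner",
}

def resolve_model(model: str) -> str:
    """Resolve a convenience alias to a concrete model ID;
    non-alias names are assumed to pass through unchanged."""
    return BUILTIN_ALIASES.get(model, model)
```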
Provider auto-detection
When using the generic /v1/chat/completions endpoint, the provider is detected from the model name prefix:
| Prefix | Provider |
|---|---|
gpt-*, o1-*, o3-*, o4-*, chatgpt-*, ft:gpt-* | OpenAI |
claude-* | Anthropic |
gemini-* | |
deepseek-* | DeepSeek |
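Detection can be sketched as a prefix match over the table above (illustrative; returning None for unmatched names is an assumption about the gateway's behavior):

```python
# Prefix-to-provider mapping from the auto-detection table above.
# More specific prefixes are listed first so they match before "gpt-".
PROVIDER_PREFIXES = [
    ("ft:gpt-", "openai"),
    ("chatgpt-", "openai"),
    ("gpt-", "openai"),
    ("o1-", "openai"),
    ("o3-", "openai"),
    ("o4-", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "google"),
    ("deepseek-", "deepseek"),
]

def detect_provider(model: str):
    """Map a model name to its provider by prefix; None if unrecognized."""
    for prefix, provider in PROVIDER_PREFIXES:
        if model.startswith(prefix):
            return provider
    return None
```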
Provider mismatch error
If a model name does not match the endpoint’s expected provider, the gateway returns a 400 error:
{
"error": {
"message": "Model 'gpt-4o' is an openai model but was sent to the google endpoint. Use the correct endpoint.",
"type": "invalid_request_error",
"param": null,
"code": "provider_mismatch"
}
}
Request size limit
All proxy endpoints enforce a maximum request body size of 10 MB. Requests exceeding this limit receive a 413 error:
{
"error": {
"message": "Request body too large. Maximum size is 10485760 bytes (10 MB).",
"type": "invalid_request_error",
"param": null,
"code": "request_too_large"
}
}
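To avoid a 413 on large payloads, you can check the encoded body size client-side before sending. A minimal sketch:

```python
import json

MAX_BODY_BYTES = 10 * 1024 * 1024  # 10 MB, matching the gateway limit

def body_within_limit(payload: dict) -> bool:
    """True if the JSON-encoded request body fits under the 10 MB limit."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_BODY_BYTES
```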