Proxy Endpoints
The gateway proxy endpoints accept requests in the same format as the upstream LLM provider and forward them through the governance chain. Responses are passed through unchanged, including streaming SSE responses.
All proxy endpoints require authentication via X-CM-API-Key and a provider API key. See Authentication.
OpenAI Chat Completions
POST /v1/chat/completions
Proxies requests to OpenAI’s chat completions API. Accepts the same request body as https://api.openai.com/v1/chat/completions.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | OpenAI API key (not needed if stored via secrets) |
Authorization | Conditional | Bearer <openai-key> (alternative to X-Provider-Key) |
Content-Type | Yes | application/json |
Idempotency-Key | No | Idempotency key for retry-safe calls |
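For retry-safe calls, generate an idempotency key client-side and send it alongside the required headers. A minimal sketch (the key values are placeholders):

```python
import uuid

# Headers for a retry-safe proxy call. Reuse the same Idempotency-Key
# when retrying a failed request; generate a fresh one per logical request.
headers = {
    "Content-Type": "application/json",
    "X-CM-API-Key": "cm_sk_xxx",           # gateway key (placeholder)
    "Authorization": "Bearer sk-proj-xxx",  # provider key (placeholder)
    "Idempotency-Key": str(uuid.uuid4()),
}
```

With the OpenAI Python SDK, per-request headers like this can be supplied via the `extra_headers` argument to `client.chat.completions.create(...)`.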
Request body
The request body is passed through to OpenAI. All standard OpenAI parameters are supported:
{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, world!"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 256
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g., gpt-4o, gpt-4o-mini, gpt-5.1-chat-latest, o3-mini) |
messages | array | Yes | Conversation messages in OpenAI format |
stream | boolean | No | Enable SSE streaming (default: false) |
temperature | number | No | Sampling temperature (0-2) |
max_tokens | integer | No | Maximum tokens to generate |
All other OpenAI parameters (top_p, tools, tool_choice, response_format, stream_options, etc.) are forwarded as-is.
Response (non-streaming)
HTTP/1.1 200 OK
Content-Type: application/json
X-CM-Request-ID: gw_a1b2c3d4e5f6g7h8
X-CM-Cost: 0.0042
X-CM-Daily-Cost: 12.35
X-CM-Daily-Budget: 100.00
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 297
X-RateLimit-Reset: 1709056860
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709056800,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21
}
}
The response body is the unmodified OpenAI response. The gateway adds governance metadata via response headers only.
Response (streaming)
When stream: true is set, the response is an SSE stream:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
X-CM-Request-ID: gw_a1b2c3d4e5f6g7h8
X-CM-Cost: 0.0042
X-CM-Daily-Cost: 12.35
X-CM-Daily-Budget: 100.00
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}
data: [DONE]
The gateway automatically injects stream_options: {"include_usage": true} so that token usage is reported in the final chunk for cost recording.
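Client-side, the chunks can be reassembled by parsing each data: line until the [DONE] terminator. A minimal sketch using the chunk format shown above:

```python
import json

def collect_content(sse_lines):
    """Reassemble assistant text from OpenAI-format SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The usage-only final chunk (from include_usage) has an
        # empty choices array; the loop below simply skips it.
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

In practice the OpenAI SDK performs this parsing for you when `stream=True`; the sketch is useful when consuming the SSE stream directly.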
Example: Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://api.curate-me.ai/v1/openai",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
Example: cURL
curl https://api.curate-me.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "X-CM-API-Key: cm_sk_xxx" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}'
Anthropic Messages
POST /v1/messages
Proxies requests to Anthropic’s messages API. Accepts the same request body as https://api.anthropic.com/v1/messages.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | Anthropic API key (not needed if stored via secrets) |
Authorization | Conditional | Bearer <anthropic-key> (alternative to X-Provider-Key) |
Content-Type | Yes | application/json |
Request body
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g., claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-3-5-20241022) |
max_tokens | integer | Yes | Maximum tokens to generate |
messages | array | Yes | Conversation messages in Anthropic format |
stream | boolean | No | Enable SSE streaming (default: false) |
system | string | No | System prompt |
All other Anthropic parameters (temperature, tools, tool_choice, metadata, etc.) are forwarded as-is.
Response
The response is the unmodified Anthropic response:
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you today?"
}
],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 12
}
}
Provider auth translation
The gateway automatically translates your provider key to Anthropic’s expected format:
- Sets the x-api-key header (Anthropic’s auth method)
- Sets anthropic-version: 2023-06-01
You do not need to handle this translation in your application.
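For reference, the translation is equivalent to the following header rewrite (an illustrative sketch, not the gateway’s actual implementation):

```python
def to_anthropic_headers(provider_key: str) -> dict:
    """Rewrite a provider key into Anthropic's expected auth headers,
    mirroring the translation the gateway performs upstream."""
    return {
        "x-api-key": provider_key,
        "anthropic-version": "2023-06-01",
    }
```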
Example: Python (Anthropic SDK)
import anthropic
client = anthropic.Anthropic(
base_url="https://api.curate-me.ai/v1/anthropic",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)
Google Gemini Chat Completions
POST /v1/google/chat/completions
Proxies requests to Google Gemini using an OpenAI-compatible request format. The gateway translates the request from OpenAI chat completions format to Gemini’s generateContent format, proxies it to Google, and translates the response back to OpenAI format.
This means you can use the same OpenAI SDK to call Gemini models through the gateway.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | Google API key (not needed if stored via secrets) |
Content-Type | Yes | application/json |
Request body
Use the same OpenAI chat completions format:
{
"model": "gemini-2.5-flash",
"messages": [
{"role": "user", "content": "Explain quantum computing briefly."}
],
"stream": false,
"max_tokens": 256
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID (e.g., gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash) |
messages | array | Yes | Conversation messages in OpenAI format |
stream | boolean | No | Enable SSE streaming (default: false) |
max_tokens | integer | No | Maximum tokens to generate |
Response
The response is translated to OpenAI chat completions format:
{
"id": "gw_a1b2c3d4e5f6g7h8",
"object": "chat.completion",
"created": 1709056800,
"model": "gemini-2.5-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum mechanical phenomena..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 45,
"total_tokens": 53
}
}
Streaming
When stream: true, the gateway translates Gemini’s streaming format to OpenAI SSE chunks, including the data: [DONE] terminator.
Provider auth translation
Google Gemini uses a query parameter for authentication (?key=API_KEY). The gateway handles this automatically — you pass your Google API key the same way as any other provider.
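The rewrite amounts to appending the key as a query parameter on the upstream URL. An illustrative sketch (the upstream base URL shown is an assumption about Gemini’s public REST endpoint, not something the gateway documents):

```python
from urllib.parse import urlencode

def gemini_url(model: str, api_key: str, stream: bool = False) -> str:
    """Build the kind of upstream Gemini URL the gateway would call.
    Illustrative only; the base URL is assumed."""
    base = "https://generativelanguage.googleapis.com/v1beta/models"
    method = "streamGenerateContent" if stream else "generateContent"
    return f"{base}/{model}:{method}?" + urlencode({"key": api_key})
```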
Circuit breaker
The Google proxy has its own circuit breaker. If Google returns repeated failures, the circuit breaker opens and subsequent requests are rejected immediately with 503 Service Unavailable:
{
"error": {
"message": "Google Gemini is temporarily unavailable (circuit breaker open)",
"type": "service_unavailable",
"param": null,
"code": "circuit_breaker_open"
}
}
Example: Python (OpenAI SDK)
from openai import OpenAI
# Use Google Gemini via the OpenAI SDK
client = OpenAI(
base_url="https://api.curate-me.ai/v1/google",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
api_key="your-google-api-key", # passed as provider key
)
response = client.chat.completions.create(
model="gemini-2.5-flash",
messages=[{"role": "user", "content": "Hello from Gemini!"}],
)
print(response.choices[0].message.content)
DeepSeek Chat Completions
POST /v1/deepseek/chat/completions
Proxies requests to DeepSeek’s API. DeepSeek uses an OpenAI-compatible API format, so requests and responses pass through with minimal transformation.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key |
X-Provider-Key | Conditional | DeepSeek API key (not needed if stored via secrets) |
Authorization | Conditional | Bearer <deepseek-key> (alternative to X-Provider-Key) |
Content-Type | Yes | application/json |
Request body
{
"model": "deepseek-chat",
"messages": [
{"role": "user", "content": "What is deep learning?"}
],
"stream": false
}
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID: deepseek-chat, deepseek-reasoner, or deepseek-coder |
messages | array | Yes | Conversation messages in OpenAI format |
stream | boolean | No | Enable SSE streaming (default: false) |
All other OpenAI-compatible parameters are forwarded as-is.
Response
The response is the unmodified DeepSeek response (OpenAI-compatible format):
{
"id": "chatcmpl-xyz789",
"object": "chat.completion",
"created": 1709056800,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Deep learning is a subset of machine learning..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 50,
"total_tokens": 57
}
}
Provider key resolution
The DeepSeek API key is resolved in this order:
1. X-Provider-Key header
2. Authorization: Bearer header (if not a gateway key)
3. Org-scoped secret custody (stored via dashboard)
4. DEEPSEEK_API_KEY environment variable
Catch-all route
In addition to /v1/deepseek/chat/completions, the gateway includes a catch-all route ANY /v1/deepseek/{path} that proxies any DeepSeek API endpoint. This is not included in the public API schema but is available for forward compatibility.
Example: Python (OpenAI SDK)
from openai import OpenAI
# DeepSeek uses OpenAI-compatible format
client = OpenAI(
base_url="https://api.curate-me.ai/v1/deepseek",
default_headers={"X-CM-API-Key": "cm_sk_xxx"},
api_key="your-deepseek-key",
)
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": "Hello from DeepSeek!"}],
)
print(response.choices[0].message.content)
List Models
GET /v1/models
Returns a list of all models available through the gateway. Requires gateway authentication but does not require a provider key.
Headers
| Header | Required | Description |
|---|---|---|
X-CM-API-Key | Yes | Curate-Me gateway API key (requires gateway:read scope) |
Response
{
"object": "list",
"data": [
{
"id": "gpt-4o",
"object": "model",
"created": 0,
"owned_by": "openai",
"description": "GPT-4o (latest)"
},
{
"id": "gpt-4o-mini",
"object": "model",
"created": 0,
"owned_by": "openai",
"description": "GPT-4o Mini (budget)"
},
{
"id": "claude-sonnet-4-20250514",
"object": "model",
"created": 0,
"owned_by": "anthropic",
"description": "Claude Sonnet 4"
},
{
"id": "gemini-2.5-pro",
"object": "model",
"created": 0,
"owned_by": "google",
"description": "Gemini 2.5 Pro"
},
{
"id": "deepseek-chat",
"object": "model",
"created": 0,
"owned_by": "deepseek",
"description": "DeepSeek Chat (V3)"
}
]
}
The response follows the OpenAI model list format for SDK compatibility.
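Because the response follows the OpenAI list format, filtering by provider client-side is straightforward. A sketch over the sample response shape above:

```python
def models_by_provider(model_list: dict, provider: str) -> list:
    """Return the IDs of models owned by the given provider,
    given a response in the OpenAI model-list format."""
    return [m["id"] for m in model_list["data"] if m["owned_by"] == provider]
```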
Supported models
OpenAI
| Model ID | Description |
|---|---|
gpt-4o | GPT-4o (latest) |
gpt-4o-mini | GPT-4o Mini (budget) |
gpt-5.1-chat-latest | GPT-5.1 (latest) |
gpt-5-mini | GPT-5 Mini |
gpt-5-nano | GPT-5 Nano (ultra-budget) |
o1 | o1 (reasoning) |
o1-mini | o1-mini (budget reasoning) |
o3 | o3 (advanced reasoning) |
o3-mini | o3-mini (budget reasoning) |
Anthropic
| Model ID | Description |
|---|---|
claude-opus-4-20250514 | Claude Opus 4 |
claude-sonnet-4-5-20250929 | Claude Sonnet 4.5 |
claude-sonnet-4-20250514 | Claude Sonnet 4 |
claude-haiku-3-5-20241022 | Claude 3.5 Haiku |
Google Gemini
| Model ID | Description |
|---|---|
gemini-2.5-pro | Gemini 2.5 Pro |
gemini-2.5-flash | Gemini 2.5 Flash |
gemini-2.0-flash | Gemini 2.0 Flash |
gemini-1.5-pro | Gemini 1.5 Pro |
gemini-1.5-flash | Gemini 1.5 Flash |
DeepSeek
| Model ID | Description |
|---|---|
deepseek-chat | DeepSeek Chat (V3) |
deepseek-reasoner | DeepSeek Reasoner (R1) |
deepseek-coder | DeepSeek Coder |
Model aliases
The gateway ships with built-in convenience aliases:
| Alias | Resolves to |
|---|---|
gpt-4 | gpt-4o |
gpt-4-turbo | gpt-4o |
claude-3 | claude-sonnet-4-20250514 |
claude-3.5-sonnet | claude-sonnet-4-20250514 |
claude-sonnet | claude-sonnet-4-20250514 |
claude-opus | claude-opus-4-20250514 |
claude-haiku | claude-haiku-3-5-20241022 |
gemini-pro | gemini-2.5-pro |
gemini-flash | gemini-2.5-flash |
deepseek | deepseek-chat |
deepseek-r1 | deepseek-reasoner |
Aliases are resolved before governance checks. Organizations can define custom aliases via POST /gateway/admin/model-aliases.
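Alias resolution behaves like a simple table lookup. An illustrative sketch built from the table above (the pass-through behavior for non-alias names is an assumption):

```python
# Built-in aliases, as listed in the table above.
BUILTIN_ALIASES = {
    "gpt-4": "gpt-4o",
    "gpt-4-turbo": "gpt-4o",
    "claude-3": "claude-sonnet-4-20250514",
    "claude-3.5-sonnet": "claude-sonnet-4-20250514",
    "claude-sonnet": "claude-sonnet-4-20250514",
    "claude-opus": "claude-opus-4-20250514",
    "claude-haiku": "claude-haiku-3-5-20241022",
    "gemini-pro": "gemini-2.5-pro",
    "gemini-flash": "gemini-2.5-flash",
    "deepseek": "deepseek-chat",
    "deepseek-r1": "deepseek-reasoner",
}

def resolve_model(model: str) -> str:
    """Resolve a convenience alias to a concrete model ID;
    non-alias names are assumed to pass through unchanged."""
    return BUILTIN_ALIASES.get(model, model)
```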
Provider auto-detection
When using the generic /v1/chat/completions endpoint, the provider is detected from the model name prefix:
| Prefix | Provider |
|---|---|
gpt-*, o1-*, o3-*, o4-*, chatgpt-*, ft:gpt-* | OpenAI |
claude-* | Anthropic |
gemini-* | |
deepseek-* | DeepSeek |
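Detection can be sketched as a prefix match over the table above (illustrative; returning None for unmatched names is an assumption about the gateway's behavior):

```python
# Prefix-to-provider mapping from the auto-detection table above.
# More specific prefixes are listed first so they match before "gpt-".
PROVIDER_PREFIXES = [
    ("ft:gpt-", "openai"),
    ("chatgpt-", "openai"),
    ("gpt-", "openai"),
    ("o1-", "openai"),
    ("o3-", "openai"),
    ("o4-", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "google"),
    ("deepseek-", "deepseek"),
]

def detect_provider(model: str):
    """Map a model name to its provider by prefix; None if unrecognized."""
    for prefix, provider in PROVIDER_PREFIXES:
        if model.startswith(prefix):
            return provider
    return None
```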
Provider mismatch error
If a model name does not match the endpoint’s expected provider, the gateway returns a 400 error:
{
"error": {
"message": "Model 'gpt-4o' is an openai model but was sent to the google endpoint. Use the correct endpoint.",
"type": "invalid_request_error",
"param": null,
"code": "provider_mismatch"
}
}
Request size limit
All proxy endpoints enforce a maximum request body size of 10 MB. Requests exceeding this limit receive a 413 error:
{
"error": {
"message": "Request body too large. Maximum size is 10485760 bytes (10 MB).",
"type": "invalid_request_error",
"param": null,
"code": "request_too_large"
}
}
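To avoid a 413 on large payloads, you can check the encoded body size client-side before sending. A minimal sketch:

```python
import json

MAX_BODY_BYTES = 10 * 1024 * 1024  # 10 MB, matching the gateway limit

def body_within_limit(payload: dict) -> bool:
    """True if the JSON-encoded request body fits under the 10 MB limit."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_BODY_BYTES
```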