
Proxy Endpoints

The gateway proxy endpoints accept requests in the same format as the upstream LLM provider and forward them through the governance chain. Responses are passed through unchanged, including streaming SSE responses.

All proxy endpoints require gateway authentication via X-CM-API-Key; the provider proxy endpoints additionally require a provider API key. See Authentication.


OpenAI Chat Completions

POST /v1/chat/completions

Proxies requests to OpenAI’s chat completions API. Accepts the same request body as https://api.openai.com/v1/chat/completions.

Headers

| Header | Required | Description |
| --- | --- | --- |
| X-CM-API-Key | Yes | Curate-Me gateway API key |
| X-Provider-Key | Conditional | OpenAI API key (not needed if stored via secrets) |
| Authorization | Conditional | Bearer &lt;openai-key&gt; (alternative to X-Provider-Key) |
| Content-Type | Yes | application/json |
| Idempotency-Key | No | Idempotency key for retry-safe calls |
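For retry-safe calls, reuse the same Idempotency-Key on every retry of a logical request so the gateway can deduplicate it. A minimal sketch; the accepted key format is not specified here, so using a UUID is an assumption:

```python
import uuid

# Generate one idempotency key per logical request and reuse it on retries.
# UUIDs are assumed to be an acceptable key format.
idempotency_key = str(uuid.uuid4())

headers = {
    "X-CM-API-Key": "cm_sk_xxx",
    "Content-Type": "application/json",
    "Idempotency-Key": idempotency_key,
}
```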

Request body

The request body is passed through to OpenAI. All standard OpenAI parameters are supported:

```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, world!"}
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 256
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., gpt-4o, gpt-4o-mini, gpt-5.1-chat-latest, o3-mini) |
| messages | array | Yes | Conversation messages in OpenAI format |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |

All other OpenAI parameters (top_p, tools, tool_choice, response_format, stream_options, etc.) are forwarded as-is.

Response (non-streaming)

```http
HTTP/1.1 200 OK
Content-Type: application/json
X-CM-Request-ID: gw_a1b2c3d4e5f6g7h8
X-CM-Cost: 0.0042
X-CM-Daily-Cost: 12.35
X-CM-Daily-Budget: 100.00
X-RateLimit-Limit: 300
X-RateLimit-Remaining: 297
X-RateLimit-Reset: 1709056860
```

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709056800,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```

The response body is the unmodified OpenAI response. The gateway adds governance metadata via response headers only.
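Because governance metadata arrives only in headers, a small helper can pull it out of whatever header mapping your HTTP client exposes. This is an illustrative sketch, not part of any SDK; the header names come from the example response above, and budget_remaining is derived, not a real header:

```python
def parse_governance_headers(headers: dict) -> dict:
    """Extract gateway governance metadata from response headers.

    Illustrative helper: header names match the example response above;
    budget_remaining is computed here, not returned by the gateway.
    """
    daily_cost = float(headers["X-CM-Daily-Cost"])
    daily_budget = float(headers["X-CM-Daily-Budget"])
    return {
        "request_id": headers["X-CM-Request-ID"],
        "cost": float(headers["X-CM-Cost"]),
        "daily_cost": daily_cost,
        "budget_remaining": daily_budget - daily_cost,
    }

meta = parse_governance_headers({
    "X-CM-Request-ID": "gw_a1b2c3d4e5f6g7h8",
    "X-CM-Cost": "0.0042",
    "X-CM-Daily-Cost": "12.35",
    "X-CM-Daily-Budget": "100.00",
})
```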

Response (streaming)

When stream: true is set, the response is an SSE stream:

```http
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
X-CM-Request-ID: gw_a1b2c3d4e5f6g7h8
X-CM-Cost: 0.0042
X-CM-Daily-Cost: 12.35
X-CM-Daily-Budget: 100.00
```

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709056800,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: [DONE]
```

The gateway automatically injects stream_options: {"include_usage": true} so that token usage is reported in the final chunk for cost recording.
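If you consume the raw SSE stream yourself, the usage payload can be picked out of the last data chunk that carries it (with include_usage, OpenAI sends usage in a final chunk before the [DONE] terminator). A minimal parsing sketch, assuming the chunk shape shown above:

```python
import json

def usage_from_sse(lines):
    """Return the usage object from an SSE stream, if present.

    With include_usage enabled, the final data chunk before [DONE]
    carries a usage field; earlier chunks carry a null or absent usage.
    """
    usage = None
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

stream = [
    'data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":9,"total_tokens":21}}',
    "data: [DONE]",
]
```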

Example: Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.curate-me.ai/v1/openai",
    default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

Example: cURL

```bash
curl https://api.curate-me.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "X-CM-API-Key: cm_sk_xxx" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Anthropic Messages

POST /v1/messages

Proxies requests to Anthropic’s messages API. Accepts the same request body as https://api.anthropic.com/v1/messages.

Headers

| Header | Required | Description |
| --- | --- | --- |
| X-CM-API-Key | Yes | Curate-Me gateway API key |
| X-Provider-Key | Conditional | Anthropic API key (not needed if stored via secrets) |
| Authorization | Conditional | Bearer &lt;anthropic-key&gt; (alternative to X-Provider-Key) |
| Content-Type | Yes | application/json |

Request body

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Hello, Claude!"}
  ]
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., claude-sonnet-4-20250514, claude-opus-4-20250514, claude-haiku-3-5-20241022) |
| max_tokens | integer | Yes | Maximum tokens to generate |
| messages | array | Yes | Conversation messages in Anthropic format |
| stream | boolean | No | Enable SSE streaming (default: false) |
| system | string | No | System prompt |

All other Anthropic parameters (temperature, tools, tool_choice, metadata, etc.) are forwarded as-is.

Response

The response is the unmodified Anthropic response:

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 12
  }
}
```

Provider auth translation

The gateway automatically translates your provider key to Anthropic’s expected format:

  • Sets the x-api-key header (Anthropic’s auth method)
  • Sets anthropic-version: 2023-06-01

You do not need to handle this translation in your application.

Example: Python (Anthropic SDK)

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.curate-me.ai/v1/anthropic",
    default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)
```

Google Gemini Chat Completions

POST /v1/google/chat/completions

Proxies requests to Google Gemini using an OpenAI-compatible request format. The gateway translates the request from OpenAI chat completions format to Gemini’s generateContent format, proxies it to Google, and translates the response back to OpenAI format.

This means you can use the same OpenAI SDK to call Gemini models through the gateway.

Headers

| Header | Required | Description |
| --- | --- | --- |
| X-CM-API-Key | Yes | Curate-Me gateway API key |
| X-Provider-Key | Conditional | Google API key (not needed if stored via secrets) |
| Content-Type | Yes | application/json |

Request body

Use the same OpenAI chat completions format:

```json
{
  "model": "gemini-2.5-flash",
  "messages": [
    {"role": "user", "content": "Explain quantum computing briefly."}
  ],
  "stream": false,
  "max_tokens": 256
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash) |
| messages | array | Yes | Conversation messages in OpenAI format |
| stream | boolean | No | Enable SSE streaming (default: false) |
| max_tokens | integer | No | Maximum tokens to generate |

Response

The response is translated to OpenAI chat completions format:

```json
{
  "id": "gw_a1b2c3d4e5f6g7h8",
  "object": "chat.completion",
  "created": 1709056800,
  "model": "gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanical phenomena..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 45,
    "total_tokens": 53
  }
}
```

Streaming

When stream: true, the gateway translates Gemini’s streaming format to OpenAI SSE chunks, including the data: [DONE] terminator.

Provider auth translation

Google Gemini uses a query parameter for authentication (?key=API_KEY). The gateway handles this automatically — you pass your Google API key the same way as any other provider.

Circuit breaker

The Google proxy has its own circuit breaker. If Google returns repeated failures, the circuit breaker opens and subsequent requests are rejected immediately with 503 Service Unavailable:

```json
{
  "error": {
    "message": "Google Gemini is temporarily unavailable (circuit breaker open)",
    "type": "service_unavailable",
    "param": null,
    "code": "circuit_breaker_open"
  }
}
```
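When the breaker is open, retrying immediately tends to keep it open; backing off before retrying is the usual remedy. A hedged sketch of one such client-side policy, where send is a hypothetical callable returning a (status_code, body_dict) pair:

```python
import time

def call_with_backoff(send, max_retries=3, base_delay=1.0):
    """Retry with exponential backoff while the circuit breaker is open.

    send is a hypothetical callable returning (status_code, body_dict);
    the retry policy is illustrative, not a gateway requirement.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        code = body.get("error", {}).get("code")
        # Only the circuit-breaker 503 is retried here; other errors surface.
        if not (status == 503 and code == "circuit_breaker_open"):
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * 2 ** attempt)
    return status, body
```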

Example: Python (OpenAI SDK)

```python
from openai import OpenAI

# Use Google Gemini via the OpenAI SDK
client = OpenAI(
    base_url="https://api.curate-me.ai/v1/google",
    default_headers={"X-CM-API-Key": "cm_sk_xxx"},
    api_key="your-google-api-key",  # passed as provider key
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello from Gemini!"}],
)
print(response.choices[0].message.content)
```

DeepSeek Chat Completions

POST /v1/deepseek/chat/completions

Proxies requests to DeepSeek’s API. DeepSeek uses an OpenAI-compatible API format, so requests and responses pass through with minimal transformation.

Headers

| Header | Required | Description |
| --- | --- | --- |
| X-CM-API-Key | Yes | Curate-Me gateway API key |
| X-Provider-Key | Conditional | DeepSeek API key (not needed if stored via secrets) |
| Authorization | Conditional | Bearer &lt;deepseek-key&gt; (alternative to X-Provider-Key) |
| Content-Type | Yes | application/json |

Request body

```json
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "user", "content": "What is deep learning?"}
  ],
  "stream": false
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID: deepseek-chat, deepseek-reasoner, or deepseek-coder |
| messages | array | Yes | Conversation messages in OpenAI format |
| stream | boolean | No | Enable SSE streaming (default: false) |

All other OpenAI-compatible parameters are forwarded as-is.

Response

The response is the unmodified DeepSeek response (OpenAI-compatible format):

```json
{
  "id": "chatcmpl-xyz789",
  "object": "chat.completion",
  "created": 1709056800,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Deep learning is a subset of machine learning..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 50,
    "total_tokens": 57
  }
}
```

Provider key resolution

The DeepSeek API key is resolved in the following order:

  1. X-Provider-Key header
  2. Authorization: Bearer header (if not a gateway key)
  3. Org-scoped secret custody (stored via dashboard)
  4. DEEPSEEK_API_KEY environment variable
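The resolution order above can be mirrored client-side when deciding which credential will actually be used. An illustrative sketch, not the gateway's actual code; the cm_sk_ prefix is assumed to mark gateway keys, per the examples on this page:

```python
import os

def resolve_deepseek_key(headers, stored_secret=None):
    """Illustrative mirror of the documented resolution order."""
    # 1. Explicit provider key header
    if headers.get("X-Provider-Key"):
        return headers["X-Provider-Key"]
    # 2. Authorization bearer token, unless it is a gateway key
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        token = auth[len("Bearer "):]
        if not token.startswith("cm_sk_"):  # assumed gateway-key prefix
            return token
    # 3. Org-scoped secret custody (stored via dashboard)
    if stored_secret:
        return stored_secret
    # 4. Environment variable fallback
    return os.environ.get("DEEPSEEK_API_KEY")
```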

Catch-all route

In addition to /v1/deepseek/chat/completions, the gateway includes a catch-all route ANY /v1/deepseek/{path} that proxies any DeepSeek API endpoint. This is not included in the public API schema but is available for forward compatibility.

Example: Python (OpenAI SDK)

```python
from openai import OpenAI

# DeepSeek uses OpenAI-compatible format
client = OpenAI(
    base_url="https://api.curate-me.ai/v1/deepseek",
    default_headers={"X-CM-API-Key": "cm_sk_xxx"},
    api_key="your-deepseek-key",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello from DeepSeek!"}],
)
print(response.choices[0].message.content)
```

List Models

GET /v1/models

Returns a list of all models available through the gateway. Requires gateway authentication but does not require a provider key.

Headers

| Header | Required | Description |
| --- | --- | --- |
| X-CM-API-Key | Yes | Curate-Me gateway API key (requires gateway:read scope) |

Response

```json
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model", "created": 0, "owned_by": "openai", "description": "GPT-4o (latest)"},
    {"id": "gpt-4o-mini", "object": "model", "created": 0, "owned_by": "openai", "description": "GPT-4o Mini (budget)"},
    {"id": "claude-sonnet-4-20250514", "object": "model", "created": 0, "owned_by": "anthropic", "description": "Claude Sonnet 4"},
    {"id": "gemini-2.5-pro", "object": "model", "created": 0, "owned_by": "google", "description": "Gemini 2.5 Pro"},
    {"id": "deepseek-chat", "object": "model", "created": 0, "owned_by": "deepseek", "description": "DeepSeek Chat (V3)"}
  ]
}
```

The response follows the OpenAI model list format for SDK compatibility.
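Because the payload follows OpenAI's list shape, it can be post-processed like any OpenAI models response, for instance grouped by owned_by. A small sketch over an abridged copy of the example response above:

```python
from collections import defaultdict

# Abridged data array from the example /v1/models response above.
models = [
    {"id": "gpt-4o", "owned_by": "openai"},
    {"id": "gpt-4o-mini", "owned_by": "openai"},
    {"id": "claude-sonnet-4-20250514", "owned_by": "anthropic"},
    {"id": "gemini-2.5-pro", "owned_by": "google"},
    {"id": "deepseek-chat", "owned_by": "deepseek"},
]

# Group model IDs by provider using the owned_by field.
by_provider = defaultdict(list)
for model in models:
    by_provider[model["owned_by"]].append(model["id"])
```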

Supported models

OpenAI

| Model ID | Description |
| --- | --- |
| gpt-4o | GPT-4o (latest) |
| gpt-4o-mini | GPT-4o Mini (budget) |
| gpt-5.1-chat-latest | GPT-5.1 (latest) |
| gpt-5-mini | GPT-5 Mini |
| gpt-5-nano | GPT-5 Nano (ultra-budget) |
| o1 | o1 (reasoning) |
| o1-mini | o1-mini (budget reasoning) |
| o3 | o3 (advanced reasoning) |
| o3-mini | o3-mini (budget reasoning) |

Anthropic

| Model ID | Description |
| --- | --- |
| claude-opus-4-20250514 | Claude Opus 4 |
| claude-sonnet-4-5-20250929 | Claude Sonnet 4.5 |
| claude-sonnet-4-20250514 | Claude Sonnet 4 |
| claude-haiku-3-5-20241022 | Claude 3.5 Haiku |

Google Gemini

| Model ID | Description |
| --- | --- |
| gemini-2.5-pro | Gemini 2.5 Pro |
| gemini-2.5-flash | Gemini 2.5 Flash |
| gemini-2.0-flash | Gemini 2.0 Flash |
| gemini-1.5-pro | Gemini 1.5 Pro |
| gemini-1.5-flash | Gemini 1.5 Flash |

DeepSeek

| Model ID | Description |
| --- | --- |
| deepseek-chat | DeepSeek Chat (V3) |
| deepseek-reasoner | DeepSeek Reasoner (R1) |
| deepseek-coder | DeepSeek Coder |

Model aliases

The gateway ships with built-in convenience aliases:

| Alias | Resolves to |
| --- | --- |
| gpt-4 | gpt-4o |
| gpt-4-turbo | gpt-4o |
| claude-3 | claude-sonnet-4-20250514 |
| claude-3.5-sonnet | claude-sonnet-4-20250514 |
| claude-sonnet | claude-sonnet-4-20250514 |
| claude-opus | claude-opus-4-20250514 |
| claude-haiku | claude-haiku-3-5-20241022 |
| gemini-pro | gemini-2.5-pro |
| gemini-flash | gemini-2.5-flash |
| deepseek | deepseek-chat |
| deepseek-r1 | deepseek-reasoner |

Aliases are resolved before governance checks. Organizations can define custom aliases via POST /gateway/admin/model-aliases.
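Conceptually, alias resolution is a dictionary lookup with passthrough for unrecognized names. An illustrative sketch over a subset of the built-in aliases above (not the gateway's actual implementation):

```python
# Subset of the built-in aliases from the table above.
ALIASES = {
    "gpt-4": "gpt-4o",
    "gpt-4-turbo": "gpt-4o",
    "claude-sonnet": "claude-sonnet-4-20250514",
    "claude-opus": "claude-opus-4-20250514",
    "claude-haiku": "claude-haiku-3-5-20241022",
    "gemini-pro": "gemini-2.5-pro",
    "gemini-flash": "gemini-2.5-flash",
    "deepseek": "deepseek-chat",
    "deepseek-r1": "deepseek-reasoner",
}

def resolve_model(model: str) -> str:
    # Non-alias model IDs pass through unchanged.
    return ALIASES.get(model, model)
```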

Provider auto-detection

When using the generic /v1/chat/completions endpoint, the provider is detected from the model name prefix:

| Prefix | Provider |
| --- | --- |
| gpt-*, o1-*, o3-*, o4-*, chatgpt-*, ft:gpt-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google |
| deepseek-* | DeepSeek |
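Conceptually, the detection is a prefix match over the model name. An illustrative re-implementation of the table above (not the gateway's actual code):

```python
def detect_provider(model: str) -> str:
    """Map a model name to its provider via the documented prefixes."""
    if model.startswith(("gpt-", "o1-", "o3-", "o4-", "chatgpt-", "ft:gpt-")):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("deepseek-"):
        return "deepseek"
    raise ValueError(f"cannot detect provider for model {model!r}")
```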

Provider mismatch error

If a model name does not match the endpoint’s expected provider, the gateway returns a 400 error:

```json
{
  "error": {
    "message": "Model 'gpt-4o' is a openai model but was sent to the google endpoint. Use the correct endpoint.",
    "type": "invalid_request_error",
    "param": null,
    "code": "provider_mismatch"
  }
}
```

Request size limit

All proxy endpoints enforce a maximum request body size of 10 MB. Requests exceeding this limit receive a 413 error:

```json
{
  "error": {
    "message": "Request body too large. Maximum size is 10485760 bytes (10 MB).",
    "type": "invalid_request_error",
    "param": null,
    "code": "request_too_large"
  }
}
```
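A client-side pre-check can avoid a wasted round trip for oversized payloads. A minimal sketch, measuring the serialized JSON body against the documented limit:

```python
import json

MAX_BODY_BYTES = 10 * 1024 * 1024  # 10485760 bytes, per the gateway limit

def body_within_limit(payload: dict) -> bool:
    """Check the serialized request size before sending it to the gateway."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_BODY_BYTES
```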