How to Set a $10/Day Budget Cap for Your AI Agents

Published March 17, 2026

An AI agent without a budget cap is a credit card without a limit. One stuck loop, one unexpectedly large context window, one misconfigured prompt — and your LLM bill can spike from dollars to hundreds of dollars in minutes.

This tutorial shows you how to set a $10/day budget cap using the Curate-Me gateway. The cap applies to all LLM calls from your agents, across all providers (OpenAI, Anthropic, Google, DeepSeek, and 50+ others). Setup takes under five minutes and requires no changes to your agent logic: you only repoint the SDK's base URL and add one header.

How It Works

The Curate-Me gateway sits between your agents and LLM providers. Every request passes through a governance chain — a 5-step policy pipeline that evaluates the request before it reaches the provider:

Rate limiting — requests per minute per org/key
Cost estimation — estimated cost vs. per-request and daily budget
PII scanning — blocks secrets and PII before they reach providers
Model allowlist — enforces which models can be used
HITL gate — flags high-cost requests for human approval

The cost estimation step is where budget enforcement happens. Before proxying a request, the gateway estimates its cost based on the model, input tokens, and expected output tokens. It then checks:

Will this single request exceed the per-request cost limit?
Will this request push the org’s daily spend over the daily budget?

If either check fails, the request is denied with a 429 status code and a clear error message explaining why. A denied request never reaches the LLM provider, so you are not charged for it.
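
The two checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the gateway's actual implementation: the function name check_budget, its parameters, and the per_request_limit_exceeded code are hypothetical (only daily_budget_exceeded appears in the sample error response later in this tutorial).

```python
from typing import Optional


def check_budget(
    estimated_cost: float,
    spent_today: float,
    daily_budget: float,
    per_request_limit: float,
) -> Optional[str]:
    """Return a denial code, or None if the request may proceed.

    Hypothetical sketch of the two pre-request checks; amounts are in USD.
    """
    # Check 1: would this single request exceed the per-request limit?
    if estimated_cost > per_request_limit:
        return "per_request_limit_exceeded"  # assumed code name
    # Check 2: would it push today's spend over the daily budget?
    if spent_today + estimated_cost > daily_budget:
        return "daily_budget_exceeded"
    return None


print(check_budget(0.25, 4.0, 10.0, 1.0))   # None: both checks pass
print(check_budget(1.5, 4.0, 10.0, 1.0))    # per_request_limit_exceeded
print(check_budget(0.5, 9.75, 10.0, 1.0))   # daily_budget_exceeded
```

Running the per-request check first means an oversized request is rejected for the more specific reason, even when the daily budget is also nearly exhausted.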

The cost tracking itself uses Redis for real-time accumulation (sub-millisecond reads) and MongoDB for audit persistence. Every request’s actual cost is recorded after the response completes, including prompt tokens, completion tokens, reasoning tokens, and cache tokens. The daily accumulator resets at midnight UTC.

Prerequisites

A Curate-Me account (sign up free)
Your API key (cm_sk_xxx) from the dashboard
An existing application that makes LLM API calls

Step 1: Point Your SDK at the Gateway

Change your base URL to route through Curate-Me. This is a one-line change in your environment configuration:


# Before (direct to OpenAI):
OPENAI_BASE_URL=https://api.openai.com/v1
 
# After (through Curate-Me gateway):
OPENAI_BASE_URL=https://api.curate-me.ai/v1/openai

Add your Curate-Me API key as a header. Here is how it looks in code:

Python (OpenAI SDK)


from openai import OpenAI
 
client = OpenAI(
    # Your OpenAI key is still needed; the SDK reads OPENAI_API_KEY by default.
    base_url="https://api.curate-me.ai/v1/openai",
    default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)
 
# All calls now flow through the governance chain
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
)

TypeScript (OpenAI SDK)


import OpenAI from 'openai'
 
const client = new OpenAI({
  // Your OpenAI key is still needed; the SDK reads OPENAI_API_KEY by default.
  baseURL: 'https://api.curate-me.ai/v1/openai',
  defaultHeaders: { 'X-CM-API-Key': 'cm_sk_xxx' },
})
 
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this document.' }],
})

Anthropic, Google, DeepSeek

The gateway supports 50+ providers. Swap the provider segment in the URL:


# Anthropic
ANTHROPIC_BASE_URL=https://api.curate-me.ai/v1/anthropic
 
# Google (Gemini)
GOOGLE_BASE_URL=https://api.curate-me.ai/v1/google
 
# DeepSeek
DEEPSEEK_BASE_URL=https://api.curate-me.ai/v1/deepseek

The same X-CM-API-Key header works for all providers. One key, one budget, all providers.
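
If your application talks to several providers, a small helper can keep the URL pattern in one place. This is a sketch only: gateway_base_url and KNOWN_PROVIDERS are hypothetical names, and the set lists just the provider segments shown in this article.

```python
GATEWAY_ROOT = "https://api.curate-me.ai/v1"

# Provider segments shown in this article; the gateway supports more.
KNOWN_PROVIDERS = {"openai", "anthropic", "google", "deepseek"}


def gateway_base_url(provider: str) -> str:
    """Return the gateway base URL for a provider segment (hypothetical helper)."""
    segment = provider.strip().lower()
    if segment not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider segment: {provider!r}")
    return f"{GATEWAY_ROOT}/{segment}"


print(gateway_base_url("anthropic"))  # https://api.curate-me.ai/v1/anthropic
```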

Step 2: Set the Daily Budget

Option A: Dashboard UI

Open the Curate-Me dashboard
Navigate to Settings > Governance Policies
Set Daily Budget to $10.00
Set Per-Request Limit to $1.00 (recommended — prevents any single request from consuming the entire budget)
Click Save

Option B: API / curl


curl -X PUT https://api.curate-me.ai/api/v1/admin/governance/policy \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_budget": 10.00,
    "per_request_limit": 1.00,
    "rate_limit_rpm": 100
  }'

Option C: Python SDK


from curate_me import CurateMe
 
client = CurateMe(api_key="cm_sk_xxx")
 
client.policies.update(
    daily_budget=10.00,
    per_request_limit=1.00,
    rate_limit_rpm=100,
)

Option D: TypeScript SDK


import { CurateMe } from '@curate-me/sdk'
 
const client = new CurateMe({ apiKey: 'cm_sk_xxx' })
 
await client.policies.update({
  dailyBudget: 10.0,
  perRequestLimit: 1.0,
  rateLimitRpm: 100,
})

Step 3: Verify It Works

Make a test request and confirm the governance chain is active:


curl https://api.curate-me.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer your-openai-key" \
  -H "X-CM-API-Key: cm_sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}]
  }'

If the response comes back normally, your requests are flowing through the governance chain. Check the dashboard’s Activity feed — you should see the request logged with its cost, model, and latency.

To test the budget enforcement, you can temporarily set a very low budget (e.g., $0.01) and send a few requests. Once the budget is exhausted, you will receive a 429 response:


{
  "error": {
    "message": "Daily budget exceeded. Spent $0.012 of $0.010 daily budget.",
    "type": "rate_limit_error",
    "code": "daily_budget_exceeded"
  }
}

Set the budget back to $10.00 after testing.
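
Many HTTP clients and SDKs retry 429 responses automatically, which is pointless for a budget denial: the request will keep failing until the budget is raised or the daily accumulator resets. A small check like the following lets an agent tell the two cases apart. It is a sketch that assumes the error body has the shape shown above; the function name is hypothetical.

```python
import json


def is_budget_denial(status_code: int, body: str) -> bool:
    """True if a 429 response is a budget denial rather than ordinary throttling."""
    if status_code != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "daily_budget_exceeded"


sample = json.dumps({
    "error": {
        "message": "Daily budget exceeded. Spent $0.012 of $0.010 daily budget.",
        "type": "rate_limit_error",
        "code": "daily_budget_exceeded",
    }
})
print(is_budget_denial(429, sample))          # True: stop, do not retry
print(is_budget_denial(429, '{"error": {}}')) # False: ordinary 429, backoff may help
```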

Step 4: Configure Alerts (Optional)

The gateway emits a budget.warning event when your org reaches 80% of its daily budget. To receive it as a webhook alert, configure a webhook URL:

Open the dashboard and navigate to Settings > Webhooks
Add a webhook URL (Slack incoming webhook, PagerDuty, email relay, or any HTTP endpoint)
Enable the budget.warning event type

The webhook payload includes your org ID, current daily spend, daily budget, and percentage used. This gives you time to investigate before the budget is fully exhausted.
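
On the receiving side, a handler can turn the payload into a short message for a chat channel or pager. The sketch below assumes the payload is a JSON object with org_id, daily_spend, and daily_budget fields. Those field names are guesses based on the description above, so check a real payload before relying on them.

```python
def format_budget_alert(payload: dict) -> str:
    """Turn a budget.warning payload into a one-line message.

    Field names (org_id, daily_spend, daily_budget) are assumptions.
    """
    spend = float(payload["daily_spend"])
    budget = float(payload["daily_budget"])
    # Recompute the percentage locally so the message is self-consistent.
    percent = 100.0 * spend / budget if budget > 0 else 0.0
    return (
        f"[{payload['org_id']}] ${spend:.2f} of ${budget:.2f} "
        f"daily budget used ({percent:.0f}%)"
    )


example = {"org_id": "org_123", "daily_spend": 8.0, "daily_budget": 10.0}
print(format_budget_alert(example))
# [org_123] $8.00 of $10.00 daily budget used (80%)
```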

What Gets Tracked

The cost recorder tracks more than just input and output tokens. Here is everything that factors into the cost calculation:

Token Type	Description
Prompt tokens	Input tokens in your request
Completion tokens	Output tokens in the response
Reasoning tokens	Tokens used by reasoning models (o1, o3)
Thinking tokens	Extended thinking tokens (Claude)
Cache creation tokens	Tokens written to prompt cache
Cache read tokens	Tokens read from prompt cache (discounted)

All token types are recorded in the audit log with per-token-type costs. The daily accumulator sums the actual cost (not estimated) so your budget enforcement reflects real spend.
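
To make the per-token-type accounting concrete, here is a sketch of how a request's cost could be built up from its usage counts. The prices are invented for illustration and are not real provider rates, and the dictionary keys are hypothetical names rather than the gateway's audit-log fields.

```python
# Hypothetical prices in USD per one million tokens; not real provider rates.
PRICES_PER_MTOK = {
    "prompt": 2.0,
    "completion": 8.0,
    "reasoning": 8.0,
    "thinking": 8.0,
    "cache_creation": 2.5,
    "cache_read": 0.5,  # cache reads are typically discounted
}


def request_cost(usage: dict) -> dict:
    """Return the cost per token type plus a total, in USD."""
    costs = {
        kind: count * PRICES_PER_MTOK[kind] / 1_000_000
        for kind, count in usage.items()
    }
    costs["total"] = sum(costs.values())
    return costs


usage = {"prompt": 12_000, "completion": 1_500, "cache_read": 40_000}
costs = request_cost(usage)
print(f"{costs['total']:.4f}")  # 0.0560
```

In this example the 40,000 cached tokens cost less than the 12,000 uncached prompt tokens, which is why tracking cache reads separately matters for an accurate daily total.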

Budget Hierarchy: Org, Team, and Key Levels

The $10/day budget we set above applies at the org level. For more granular control, you can set budgets at three levels:


Org Budget ($10/day)
  |
  +-- Team A Budget ($5/day)
  |     |
  |     +-- API Key 1 Budget ($2/day)
  |     +-- API Key 2 Budget ($3/day)
  |
  +-- Team B Budget ($5/day)
        |
        +-- API Key 3 Budget ($5/day)

Budgets cascade. A request from API Key 1 checks against the key’s $2 limit, then Team A’s $5 limit, then the org’s $10 limit. If any level is exceeded, the request is denied.
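
The cascade can be pictured as a loop over levels, narrowest first, that stops at the first one the request would push over budget. The following is a minimal sketch under that assumption; the function name and the tuple layout are hypothetical, and the spend figures are made up.

```python
from typing import List, Optional, Tuple

# (level name, spent today in USD, daily budget in USD), narrowest level first.
Level = Tuple[str, float, float]


def first_exceeded_level(estimated_cost: float, levels: List[Level]) -> Optional[str]:
    """Return the first level the request would push over budget, else None."""
    for name, spent, budget in levels:
        if spent + estimated_cost > budget:
            return name
    return None


# Limits from the hierarchy above; the spend figures are made up.
levels = [
    ("API Key 1", 1.5, 2.0),
    ("Team A", 3.0, 5.0),
    ("Org", 6.0, 10.0),
]
print(first_exceeded_level(0.25, levels))  # None: every level has room
print(first_exceeded_level(0.75, levels))  # API Key 1
```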

Set team and key budgets via the dashboard under Settings > Budgets > Hierarchy, or via the API:


# Set a team budget
curl -X PUT https://api.curate-me.ai/api/v1/admin/budgets/team/team_abc \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 5.00}'
 
# Set an API key budget
curl -X PUT https://api.curate-me.ai/api/v1/admin/budgets/key/key_xyz \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 2.00}'

Real-Time Cost Monitoring

Once your budget is set, the dashboard gives you real-time visibility into spend:

Costs page: Daily and monthly spend breakdown by model, provider, and API key
Activity feed: Every request logged with cost, tokens, latency, and governance decisions
WebSocket updates: Real-time cost events pushed to the dashboard as requests complete
Usage API: Programmatic access to cost data for your own dashboards and alerts


# Get today's cost summary
curl https://api.curate-me.ai/api/v1/admin/costs/today \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Response:


{
  "date": "2026-03-17",
  "total_cost": 4.23,
  "total_requests": 847,
  "daily_budget": 10.00,
  "budget_remaining": 5.77,
  "by_model": {
    "gpt-4o": 2.89,
    "claude-sonnet-4-6": 1.12,
    "gpt-4o-mini": 0.22
  }
}
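
If you feed this response into your own dashboards or alerts, a few derived numbers are usually more useful than the raw totals. The sketch below works on a dictionary with the same shape as the sample response above; budget_status is a hypothetical helper, not part of any SDK.

```python
def budget_status(summary: dict) -> dict:
    """Derive percent of budget used and the most expensive model."""
    budget = summary["daily_budget"]
    used = summary["total_cost"] / budget if budget > 0 else 0.0
    by_model = summary["by_model"]
    return {
        "percent_used": round(100 * used, 1),
        "top_model": max(by_model, key=by_model.get) if by_model else None,
    }


sample = {
    "date": "2026-03-17",
    "total_cost": 4.23,
    "total_requests": 847,
    "daily_budget": 10.00,
    "budget_remaining": 5.77,
    "by_model": {"gpt-4o": 2.89, "claude-sonnet-4-6": 1.12, "gpt-4o-mini": 0.22},
}
print(budget_status(sample))  # {'percent_used': 42.3, 'top_model': 'gpt-4o'}
```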

Common Budget Configurations

Here are budget setups for common scenarios:

Scenario	Daily Budget	Per-Request Limit	Rate Limit	Notes
Solo developer, side project	$5/day	$0.50	60 RPM	Tight controls for cost-conscious dev
Small team, production	$25/day	$2.00	200 RPM	Room for normal usage with safety margin
Growth stage, multiple agents	$100/day	$5.00	500 RPM	Higher limits with HITL on expensive requests
Enterprise, compliance-critical	$500/day	$10.00	1000 RPM	High throughput with full audit trail

For all scenarios, we recommend enabling the HITL gate for requests above 50% of your per-request limit. This catches the edge cases — an agent that accidentally sends a 100K-token context window, or a loop that makes the same expensive call repeatedly.
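
With a $1.00 per-request limit, the 50% recommendation gives three outcomes by estimated cost: pass through, hold for approval, or deny. The sketch below shows that decision; the function name, return values, and hitl_fraction parameter are hypothetical and do not reflect the gateway's configuration keys.

```python
def route_request(
    estimated_cost: float,
    per_request_limit: float,
    hitl_fraction: float = 0.5,
) -> str:
    """Classify a request by estimated cost (USD) against the per-request limit."""
    if estimated_cost > per_request_limit:
        return "deny"
    # Above the HITL threshold but within the limit: hold for human approval.
    if estimated_cost > hitl_fraction * per_request_limit:
        return "needs_approval"
    return "allow"


print(route_request(0.30, 1.00))  # allow
print(route_request(0.75, 1.00))  # needs_approval
print(route_request(1.50, 1.00))  # deny
```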

Summary

Setting a budget cap takes four steps:

Point your SDK at https://api.curate-me.ai/v1/{provider}
Set daily_budget: 10.00 in your governance policy
Set per_request_limit: 1.00 to cap individual requests
Optionally configure webhook alerts at 80% spend

Every request is now governed. If your agents hit the budget, they get a clear 429 error instead of an unbounded bill. The audit trail records every request with its actual cost, and the dashboard gives you real-time visibility into spend.

No agent should run without a budget. Five minutes of setup saves you from the $500 surprise on your next invoice.

Set your first budget cap at dashboard.curate-me.ai. Free tier includes 1,000 requests/day with full governance.

Curate-Me is the governance layer for AI agents. Cost caps, PII scanning, rate limiting, HITL approvals, managed runners, and a full audit trail — zero code changes.