How to Set a $10/Day Budget Cap for Your AI Agents
Published March 17, 2026
An AI agent without a budget cap is a credit card without a limit. One stuck loop, one unexpectedly large context window, one misconfigured prompt — and your LLM bill can spike from dollars to hundreds of dollars in minutes.
This tutorial shows you how to set a $10/day budget cap using the Curate-Me gateway. The cap applies to all LLM calls from your agents, across all providers (OpenAI, Anthropic, Google, DeepSeek, and 50+ others). Setup takes under five minutes. No changes to your agent logic are required: you update a base URL and add one header.
How It Works
The Curate-Me gateway sits between your agents and LLM providers. Every request passes through a governance chain — a 5-step policy pipeline that evaluates the request before it reaches the provider:
- Rate limiting — requests per minute per org/key
- Cost estimation — estimated cost vs. per-request and daily budget
- PII scanning — blocks secrets and PII before they reach providers
- Model allowlist — enforces which models can be used
- HITL gate — flags high-cost requests for human approval
The cost estimation step is where budget enforcement happens. Before proxying a request, the gateway estimates its cost based on the model, input tokens, and expected output tokens. It then checks:
- Will this single request exceed the per-request cost limit?
- Will this request push the org’s daily spend over the daily budget?
If either check fails, the request is denied with a 429 status code and a clear error message explaining why. The request never reaches the LLM provider, so you are not charged for it.
The cost tracking itself uses Redis for real-time accumulation (sub-millisecond reads) and MongoDB for audit persistence. Every request’s actual cost is recorded after the response completes, including prompt tokens, completion tokens, reasoning tokens, and cache tokens. The daily accumulator resets at midnight UTC.
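To make the two checks concrete, here is a minimal sketch of the pre-request logic. It is not the gateway's implementation. It assumes an in-memory dictionary in place of Redis, and the names `BudgetPolicy`, `DailyAccumulator`, `check_budget`, and the `per_request_limit_exceeded` reason string are hypothetical. Only `daily_budget_exceeded` appears in the gateway's documented error response.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BudgetPolicy:
    daily_budget: float       # USD per UTC day
    per_request_limit: float  # USD for any single request


@dataclass
class DailyAccumulator:
    """In-memory stand-in for the real-time counter (assumption: not Redis)."""
    spend_by_day: dict = field(default_factory=dict)

    def _key(self, now: datetime) -> str:
        # Keying by UTC date means a new day starts with zero spend,
        # which models the midnight UTC reset.
        return now.astimezone(timezone.utc).strftime("%Y-%m-%d")

    def spent(self, now: datetime) -> float:
        return self.spend_by_day.get(self._key(now), 0.0)

    def record(self, actual_cost: float, now: datetime) -> None:
        key = self._key(now)
        self.spend_by_day[key] = self.spend_by_day.get(key, 0.0) + actual_cost


def check_budget(policy, acc, estimated_cost, now):
    """Return (allowed, reason) for a request with the given estimated cost."""
    if estimated_cost > policy.per_request_limit:
        return False, "per_request_limit_exceeded"  # hypothetical reason code
    if acc.spent(now) + estimated_cost > policy.daily_budget:
        return False, "daily_budget_exceeded"
    return True, "ok"
```

The check runs on the estimated cost before the request is proxied, while `record` is called with the actual cost once the response completes. That is why a day's recorded spend can end slightly above the budget.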
Prerequisites
- A Curate-Me account (sign up free)
- Your API key (cm_sk_xxx) from the dashboard
- An existing application that makes LLM API calls
Step 1: Point Your SDK at the Gateway
Change your base URL to route through Curate-Me. This is a one-line change in your environment configuration:
# Before (direct to OpenAI):
OPENAI_BASE_URL=https://api.openai.com/v1
# After (through Curate-Me gateway):
OPENAI_BASE_URL=https://api.curate-me.ai/v1/openai
Add your Curate-Me API key as a header. Here is how it looks in code:
Python (OpenAI SDK)
from openai import OpenAI
# The OpenAI key itself is still read from the OPENAI_API_KEY environment variable
client = OpenAI(
    base_url="https://api.curate-me.ai/v1/openai",
    default_headers={"X-CM-API-Key": "cm_sk_xxx"},
)
# All calls now flow through the governance chain
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
TypeScript (OpenAI SDK)
import OpenAI from 'openai'
const client = new OpenAI({
  baseURL: 'https://api.curate-me.ai/v1/openai',
  defaultHeaders: { 'X-CM-API-Key': 'cm_sk_xxx' },
})
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this document.' }],
})
Anthropic, Google, DeepSeek
The gateway supports 50+ providers. Swap the provider segment in the URL:
# Anthropic
ANTHROPIC_BASE_URL=https://api.curate-me.ai/v1/anthropic
# Google (Gemini)
GOOGLE_BASE_URL=https://api.curate-me.ai/v1/google
# DeepSeek
DEEPSEEK_BASE_URL=https://api.curate-me.ai/v1/deepseek
The same X-CM-API-Key header works for all providers. One key, one budget, all providers.
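If your application talks to several providers, a small helper keeps the URL pattern in one place. This is a sketch under assumptions: `gateway_base_url` and `gateway_headers` are hypothetical names, and only the `openai`, `anthropic`, `google`, and `deepseek` segments are confirmed above. Check the provider list for the exact segment before using any other.

```python
GATEWAY_ROOT = "https://api.curate-me.ai/v1"


def gateway_base_url(provider: str) -> str:
    """Build the gateway base URL for a provider segment such as 'openai'."""
    slug = provider.strip().lower()
    if not slug or "/" in slug:
        raise ValueError(f"invalid provider segment: {provider!r}")
    return f"{GATEWAY_ROOT}/{slug}"


def gateway_headers(cm_api_key: str) -> dict:
    """The same header is sent regardless of provider."""
    return {"X-CM-API-Key": cm_api_key}
```

You would pass the result as `base_url` and `default_headers` when constructing each provider's client, as in the OpenAI example above.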
Step 2: Set the Daily Budget
Option A: Dashboard UI
- Open the Curate-Me dashboard
- Navigate to Settings > Governance Policies
- Set Daily Budget to $10.00
- Set Per-Request Limit to $1.00 (recommended: prevents any single request from consuming the entire budget)
- Click Save
Option B: API / curl
curl -X PUT https://api.curate-me.ai/api/v1/admin/governance/policy \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_budget": 10.00,
    "per_request_limit": 1.00,
    "rate_limit_rpm": 100
  }'
Option C: Python SDK
from curate_me import CurateMe
client = CurateMe(api_key="cm_sk_xxx")
client.policies.update(
    daily_budget=10.00,
    per_request_limit=1.00,
    rate_limit_rpm=100,
)
Option D: TypeScript SDK
import { CurateMe } from '@curate-me/sdk'
const client = new CurateMe({ apiKey: 'cm_sk_xxx' })
await client.policies.update({
  dailyBudget: 10.0,
  perRequestLimit: 1.0,
  rateLimitRpm: 100,
})
Step 3: Verify It Works
Make a test request and confirm the governance chain is active:
curl https://api.curate-me.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer your-openai-key" \
  -H "X-CM-API-Key: cm_sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}]
  }'
If the response comes back normally, your requests are flowing through the governance chain. Check the dashboard’s Activity feed: you should see the request logged with its cost, model, and latency.
To test the budget enforcement, you can temporarily set a very low budget (e.g., $0.01) and send a few requests. Once the budget is exhausted, you will receive a 429 response:
{
  "error": {
    "message": "Daily budget exceeded. Spent $0.012 of $0.010 daily budget.",
    "type": "rate_limit_error",
    "code": "daily_budget_exceeded"
  }
}
Set the budget back to $10.00 after testing.
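Because the gateway reuses the 429 status code, clients that retry 429s automatically may keep retrying a request that cannot succeed until the daily reset. A minimal sketch of telling the two cases apart follows. The function name `classify_gateway_error` and its return labels are hypothetical, and it relies only on the `error.code` field shown in the response above.

```python
import json


def classify_gateway_error(status_code: int, body: str) -> str:
    """Return 'budget_exhausted', 'retry_later', or 'other'."""
    if status_code != 429:
        return "other"
    try:
        code = json.loads(body).get("error", {}).get("code")
    except (json.JSONDecodeError, AttributeError):
        # Unparseable or unexpected body: treat it as an ordinary 429.
        return "retry_later"
    if code == "daily_budget_exceeded":
        # Retrying will not help until the daily accumulator resets.
        return "budget_exhausted"
    return "retry_later"
```

An agent loop can use the `budget_exhausted` result to stop cleanly or pause until the next UTC day instead of backing off and retrying.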
Step 4: Configure Alerts (Optional)
The gateway raises a budget warning when your org reaches 80% of its daily budget. To receive it, configure a webhook URL:
- Open the dashboard and navigate to Settings > Webhooks
- Add a webhook URL (Slack incoming webhook, PagerDuty, email relay, or any HTTP endpoint)
- Enable the budget.warning event type
The webhook payload includes your org ID, current daily spend, daily budget, and percentage used. This gives you time to investigate before the budget is fully exhausted.
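If you point the webhook at your own endpoint, the handler mostly has to turn the payload into a readable message. The sketch below assumes field names `org_id`, `daily_spend`, `daily_budget`, and `percent_used`. These are guesses based on the description above, not the documented schema, so inspect a real payload before relying on them.

```python
def format_budget_warning(payload: dict) -> str:
    """Turn a budget warning payload (assumed field names) into one line of text."""
    spend = float(payload["daily_spend"])
    budget = float(payload["daily_budget"])
    percent = payload.get("percent_used")
    if percent is None:
        # Fall back to computing the percentage if the field is absent.
        percent = 100.0 * spend / budget if budget else 0.0
    remaining = max(budget - spend, 0.0)
    return (
        f"[{payload['org_id']}] {percent:.0f}% of daily budget used: "
        f"${spend:.2f} of ${budget:.2f}, ${remaining:.2f} remaining"
    )
```

You can forward the returned string to a chat channel or pager from whatever HTTP framework receives the webhook.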
What Gets Tracked
The cost recorder tracks more than just input and output tokens. Here is everything that factors into the cost calculation:
| Token Type | Description |
|---|---|
| Prompt tokens | Input tokens in your request |
| Completion tokens | Output tokens in the response |
| Reasoning tokens | Tokens used by reasoning models (o1, o3) |
| Thinking tokens | Extended thinking tokens (Claude) |
| Cache creation tokens | Tokens written to prompt cache |
| Cache read tokens | Tokens read from prompt cache (discounted) |
All token types are recorded in the audit log with per-token-type costs. The daily accumulator sums the actual cost (not estimated) so your budget enforcement reflects real spend.
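As an illustration of per-token-type costing, here is a small sketch that prices each token type separately and sums the result. The prices are made-up example values per million tokens, not real rates, and `request_cost` is a hypothetical helper. It also assumes the counts are disjoint, whereas some providers report reasoning tokens as part of completion tokens.

```python
from decimal import Decimal

# Hypothetical USD prices per million tokens; real prices vary by model and provider.
EXAMPLE_PRICES = {
    "prompt": Decimal("2.50"),
    "completion": Decimal("10.00"),
    "reasoning": Decimal("10.00"),
    "cache_creation": Decimal("3.125"),
    "cache_read": Decimal("1.25"),
}


def request_cost(usage: dict, prices: dict) -> dict:
    """Return the cost per token type plus a 'total', all as Decimal USD."""
    breakdown = {}
    for token_type, count in usage.items():
        breakdown[token_type] = prices[token_type] * Decimal(count) / Decimal(1_000_000)
    breakdown["total"] = sum(breakdown.values(), Decimal(0))
    return breakdown
```

Using `Decimal` rather than floats avoids rounding drift when thousands of sub-cent costs are added to a daily total.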
Budget Hierarchy: Org, Team, and Key Levels
The $10/day budget we set above applies at the org level. For more granular control, you can set budgets at three levels:
Org Budget ($10/day)
|
+-- Team A Budget ($5/day)
|   |
|   +-- API Key 1 Budget ($2/day)
|   +-- API Key 2 Budget ($3/day)
|
+-- Team B Budget ($5/day)
    |
    +-- API Key 3 Budget ($5/day)
Budgets cascade. A request from API Key 1 checks against the key’s $2 limit, then Team A’s $5 limit, then the org’s $10 limit. If any level is exceeded, the request is denied.
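The cascade can be pictured as a loop over levels, most specific first. This is an illustrative sketch rather than the gateway's code, and `check_cascade` and the level names are hypothetical.

```python
def check_cascade(levels, estimated_cost):
    """Check a request against each budget level in order.

    levels: list of (name, spent_today, daily_budget), key first, org last.
    Returns (allowed, blocking_level); blocking_level is None when allowed.
    """
    for name, spent, budget in levels:
        if spent + estimated_cost > budget:
            return False, name
    return True, None


# Example state for API Key 1 under Team A, using the budgets from the diagram.
levels = [
    ("key_1", 1.50, 2.00),
    ("team_a", 4.80, 5.00),
    ("org", 7.00, 10.00),
]
```

With this state, a $0.30 request passes the key check but is blocked at the team level, even though the org still has room.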
Set team and key budgets via the dashboard under Settings > Budgets > Hierarchy, or via the API:
# Set a team budget
curl -X PUT https://api.curate-me.ai/api/v1/admin/budgets/team/team_abc \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 5.00}'
# Set an API key budget
curl -X PUT https://api.curate-me.ai/api/v1/admin/budgets/key/key_xyz \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget": 2.00}'
Real-Time Cost Monitoring
Once your budget is set, the dashboard gives you real-time visibility into spend:
- Costs page: Daily and monthly spend breakdown by model, provider, and API key
- Activity feed: Every request logged with cost, tokens, latency, and governance decisions
- WebSocket updates: Real-time cost events pushed to the dashboard as requests complete
- Usage API: Programmatic access to cost data for your own dashboards and alerts
# Get today's cost summary
curl https://api.curate-me.ai/api/v1/admin/costs/today \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
Response:
{
  "date": "2026-03-17",
  "total_cost": 4.23,
  "total_requests": 847,
  "daily_budget": 10.00,
  "budget_remaining": 5.77,
  "by_model": {
    "gpt-4o": 2.89,
    "claude-sonnet-4-6": 1.12,
    "gpt-4o-mini": 0.22
  }
}
Common Budget Configurations
Here are budget setups for common scenarios:
| Scenario | Daily Budget | Per-Request Limit | Rate Limit | Notes |
|---|---|---|---|---|
| Solo developer, side project | $5/day | $0.50 | 60 RPM | Tight controls for cost-conscious dev |
| Small team, production | $25/day | $2.00 | 200 RPM | Room for normal usage with safety margin |
| Growth stage, multiple agents | $100/day | $5.00 | 500 RPM | Higher limits with HITL on expensive requests |
| Enterprise, compliance-critical | $500/day | $10.00 | 1000 RPM | High throughput with full audit trail |
For all scenarios, we recommend enabling the HITL gate for requests above 50% of your per-request limit. This catches the edge cases — an agent that accidentally sends a 100K-token context window, or a loop that makes the same expensive call repeatedly.
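The 50% recommendation amounts to a three-way decision on each request's estimated cost. A minimal sketch follows, with `route_request`, its return labels, and the `hitl_fraction` parameter all hypothetical.

```python
def route_request(estimated_cost, per_request_limit, hitl_fraction=0.5):
    """Classify a request as 'allow', 'needs_approval', or 'deny'."""
    if estimated_cost > per_request_limit:
        return "deny"
    if estimated_cost > per_request_limit * hitl_fraction:
        # Above the HITL threshold but within the hard limit: hold for a human.
        return "needs_approval"
    return "allow"
```

With a $1.00 per-request limit, a $0.30 request is allowed, a $0.75 request waits for approval, and a $1.20 request is denied outright.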
Summary
Setting a budget cap takes four steps:
- Point your SDK at https://api.curate-me.ai/v1/{provider}
- Set daily_budget: 10.00 in your governance policy
- Set per_request_limit: 1.00 to cap individual requests
- Optionally configure webhook alerts at 80% spend
Every request is now governed. If your agents hit the budget, they get a clear 429 error instead of an unbounded bill. The audit trail records every request with its actual cost, and the dashboard gives you real-time visibility into spend.
No agent should run without a budget. Five minutes of setup saves you from the $500 surprise on your next invoice.
Set your first budget cap at dashboard.curate-me.ai. Free tier includes 1,000 requests/day with full governance.
Curate-Me is the governance layer for AI agents. Cost caps, PII scanning, rate limiting, HITL approvals, managed runners, and a full audit trail — zero code changes.