How to Set a $10/Day Budget Cap for Your AI Agents

Published March 17, 2026

An AI agent without a budget cap is a credit card without a limit. One stuck loop, one unexpectedly large context window, one misconfigured prompt — and your LLM bill can spike from dollars to hundreds of dollars in minutes.

This tutorial shows you how to set a $10/day budget cap using the Curate-Me gateway. The cap applies to all LLM calls from your agents, across every supported provider (OpenAI, Anthropic, Google, DeepSeek, and others, 50+ in total). Setup takes under five minutes and requires no changes to your agent logic: you change a base URL and add one header.

How It Works

The Curate-Me gateway sits between your agents and LLM providers. Every request passes through a governance chain — a 5-step policy pipeline that evaluates the request before it reaches the provider:

  1. Rate limiting — requests per minute per org/key
  2. Cost estimation — estimated cost vs. per-request and daily budget
  3. PII scanning — blocks secrets and PII before they reach providers
  4. Model allowlist — enforces which models can be used
  5. HITL gate — flags high-cost requests for human approval

The cost estimation step is where budget enforcement happens. Before proxying a request, the gateway estimates its cost based on the model, input tokens, and expected output tokens. It then checks:

  • Will this single request exceed the per-request cost limit?
  • Will this request push the org’s daily spend over the daily budget?

If either check fails, the request is denied with a 429 status code and a clear error message explaining why. The request never reaches the LLM provider, so you are not charged for it.
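The estimate and the two checks can be sketched in a few lines of Python. This is an illustration of the logic as described here, not the gateway's actual implementation: the price table, the `Decision` type, the function names, and the `per_request_limit_exceeded` code are assumptions made for the example (only `daily_budget_exceeded` appears in this post).

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; real provider prices differ and change.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.50, "output": 10.00},
}


@dataclass
class Decision:
    allowed: bool
    status: int
    reason: str


def estimate_cost(model: str, input_tokens: int, expected_output_tokens: int) -> float:
    """Estimate a request's cost in USD from its token counts."""
    price = PRICES_PER_MILLION[model]
    return (
        input_tokens * price["input"] + expected_output_tokens * price["output"]
    ) / 1_000_000


def check_budget(
    estimated_cost: float,
    spent_today: float,
    per_request_limit: float,
    daily_budget: float,
) -> Decision:
    """Apply the per-request check first, then the daily-budget check."""
    if estimated_cost > per_request_limit:
        # Assumed error code, chosen for this sketch.
        return Decision(False, 429, "per_request_limit_exceeded")
    if spent_today + estimated_cost > daily_budget:
        return Decision(False, 429, "daily_budget_exceeded")
    return Decision(True, 200, "ok")
```

With these example prices, a request with 10,000 input tokens and 1,000 expected output tokens is estimated at $0.035. It passes when the org has spent $4.23 of a $10.00 budget and is denied when the org has already spent $9.99.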

The cost tracking itself uses Redis for real-time accumulation (sub-millisecond reads) and MongoDB for audit persistence. Every request’s actual cost is recorded after the response completes, including prompt tokens, completion tokens, reasoning tokens, and cache tokens. The daily accumulator resets at midnight UTC.
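To make the midnight-UTC reset concrete, here is a minimal in-memory sketch of a daily accumulator. It stands in for the Redis counter described above; the key format, class name, and helper names are hypothetical, and a real deployment would more likely use an atomic increment with an expiry than a Python dict.

```python
from datetime import datetime, timedelta, timezone


def daily_key(org_id: str, now: datetime) -> str:
    """Build a per-day key from a timezone-aware datetime (key format is hypothetical)."""
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"cost:{org_id}:{day}"


def seconds_until_utc_midnight(now: datetime) -> int:
    """Seconds left before the next midnight UTC, usable as a key expiry."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now_utc).total_seconds())


class DailyAccumulator:
    """In-memory stand-in for a per-org daily spend counter."""

    def __init__(self) -> None:
        self._totals: dict[str, float] = {}

    def add(self, org_id: str, cost: float, now: datetime) -> float:
        """Record an actual cost and return the running total for that UTC day."""
        key = daily_key(org_id, now)
        self._totals[key] = self._totals.get(key, 0.0) + cost
        return self._totals[key]

    def spent(self, org_id: str, now: datetime) -> float:
        """Return the total recorded for the UTC day containing `now`."""
        return self._totals.get(daily_key(org_id, now), 0.0)
```

Because the key includes the UTC date, spend recorded at 23:59 UTC does not count against the budget one minute later: the next day simply starts from a fresh key.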

Prerequisites

  • A Curate-Me account (sign up free)
  • Your API key (cm_sk_xxx) from the dashboard
  • An existing application that makes LLM API calls

Step 1: Point Your SDK at the Gateway

Change your base URL to route through Curate-Me. This is a one-line change in your environment configuration:

    # Before (direct to OpenAI):
    OPENAI_BASE_URL=https://api.openai.com/v1

    # After (through Curate-Me gateway):
    OPENAI_BASE_URL=https://api.curate-me.ai/v1/openai

Add your Curate-Me API key as a header. Here is how it looks in code:

Python (OpenAI SDK)

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.curate-me.ai/v1/openai",
        default_headers={"X-CM-API-Key": "cm_sk_xxx"},
    )

    # All calls now flow through the governance chain
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this document."}],
    )

TypeScript (OpenAI SDK)

    import OpenAI from 'openai'

    const client = new OpenAI({
      baseURL: 'https://api.curate-me.ai/v1/openai',
      defaultHeaders: { 'X-CM-API-Key': 'cm_sk_xxx' },
    })

    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Summarize this document.' }],
    })

Anthropic, Google, DeepSeek

The gateway supports 50+ providers. Swap the provider segment in the URL:

    # Anthropic
    ANTHROPIC_BASE_URL=https://api.curate-me.ai/v1/anthropic

    # Google (Gemini)
    GOOGLE_BASE_URL=https://api.curate-me.ai/v1/google

    # DeepSeek
    DEEPSEEK_BASE_URL=https://api.curate-me.ai/v1/deepseek

The same X-CM-API-Key header works for all providers. One key, one budget, all providers.
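If you configure several providers in one codebase, a small helper keeps the URL pattern and header in one place. This is a convenience sketch, not part of any Curate-Me SDK: `gateway_base_url` and `gateway_headers` are hypothetical names, and the helper only assumes the `/v1/{provider}` pattern shown above.

```python
GATEWAY_ROOT = "https://api.curate-me.ai/v1"


def gateway_base_url(provider: str) -> str:
    """Return the gateway base URL for a provider segment such as 'openai'."""
    slug = provider.strip().lower()
    if not slug or "/" in slug:
        raise ValueError(f"invalid provider segment: {provider!r}")
    return f"{GATEWAY_ROOT}/{slug}"


def gateway_headers(api_key: str) -> dict[str, str]:
    """Return the header that identifies your org to the gateway."""
    return {"X-CM-API-Key": api_key}
```

You would pass the results as `base_url` and `default_headers` when constructing each provider's client.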

Step 2: Set the Daily Budget

Option A: Dashboard UI

  1. Open the Curate-Me dashboard 
  2. Navigate to Settings > Governance Policies
  3. Set Daily Budget to $10.00
  4. Set Per-Request Limit to $1.00 (recommended — prevents any single request from consuming the entire budget)
  5. Click Save

Option B: API / curl

    curl -X PUT https://api.curate-me.ai/api/v1/admin/governance/policy \
      -H "Authorization: Bearer YOUR_JWT_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "daily_budget": 10.00,
        "per_request_limit": 1.00,
        "rate_limit_rpm": 100
      }'

Option C: Python SDK

    from curate_me import CurateMe

    client = CurateMe(api_key="cm_sk_xxx")

    client.policies.update(
        daily_budget=10.00,
        per_request_limit=1.00,
        rate_limit_rpm=100,
    )

Option D: TypeScript SDK

    import { CurateMe } from '@curate-me/sdk'

    const client = new CurateMe({ apiKey: 'cm_sk_xxx' })

    await client.policies.update({
      dailyBudget: 10.0,
      perRequestLimit: 1.0,
      rateLimitRpm: 100,
    })

Step 3: Verify It Works

Make a test request and confirm the governance chain is active:

    curl https://api.curate-me.ai/v1/openai/chat/completions \
      -H "Authorization: Bearer your-openai-key" \
      -H "X-CM-API-Key: cm_sk_xxx" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}]
      }'

If the response comes back normally, your requests are flowing through the governance chain. Check the dashboard’s Activity feed — you should see the request logged with its cost, model, and latency.

To test the budget enforcement, you can temporarily set a very low budget (e.g., $0.01) and send a few requests. Once the budget is exhausted, you will receive a 429 response:

    {
      "error": {
        "message": "Daily budget exceeded. Spent $0.012 of $0.010 daily budget.",
        "type": "rate_limit_error",
        "code": "daily_budget_exceeded"
      }
    }

Set the budget back to $10.00 after testing.
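On the agent side, it helps to tell a budget denial apart from other 429 responses, because retrying a budget denial will keep failing until the budget is raised or the daily accumulator resets. The sketch below checks the `code` field from the error body shown above. The function names are hypothetical, and treating every other 429 as retryable is an assumption you should adapt to your own retry policy.

```python
import json


def is_daily_budget_exceeded(status_code: int, body: str) -> bool:
    """Return True when a 429 response carries the daily_budget_exceeded code."""
    if status_code != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "daily_budget_exceeded"


def should_retry(status_code: int, body: str) -> bool:
    """Retry other 429s (assumed transient), but never a budget denial."""
    return status_code == 429 and not is_daily_budget_exceeded(status_code, body)
```

An agent loop can call `should_retry` before backing off, and stop cleanly (or notify a human) when it returns False for a 429.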

Step 4: Configure Alerts (Optional)

When your org reaches 80% of its daily budget, the gateway fires a budget.warning alert. To receive it, configure a webhook URL:

  1. Open the dashboard and navigate to Settings > Webhooks
  2. Add a webhook URL (Slack incoming webhook, PagerDuty, email relay, or any HTTP endpoint)
  3. Enable the budget.warning event type

The webhook payload includes your org ID, current daily spend, daily budget, and percentage used. This gives you time to investigate before the budget is fully exhausted.
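If you receive the webhook on your own endpoint, you will probably want to turn the payload into a short human-readable message. The sketch below assumes the payload is a JSON object with the keys `org_id`, `daily_spend`, `daily_budget`, and `percent_used`. Those key names are guesses based on the fields listed above, so check a real payload before relying on them.

```python
def format_budget_alert(payload: dict) -> str:
    """Format a budget.warning payload (assumed key names) as a one-line message."""
    spend = float(payload["daily_spend"])
    budget = float(payload["daily_budget"])
    if "percent_used" in payload:
        percent = float(payload["percent_used"])
    elif budget > 0:
        # Fall back to computing the percentage when the field is absent.
        percent = 100 * spend / budget
    else:
        percent = 0.0
    return (
        f"[{payload['org_id']}] budget warning: "
        f"${spend:.2f} of ${budget:.2f} used ({percent:.0f}%)"
    )
```

The returned string can be forwarded as the text of a chat message or a pager note.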

What Gets Tracked

The cost recorder tracks more than just input and output tokens. Here is everything that factors into the cost calculation:

| Token Type | Description |
|---|---|
| Prompt tokens | Input tokens in your request |
| Completion tokens | Output tokens in the response |
| Reasoning tokens | Tokens used by reasoning models (o1, o3) |
| Thinking tokens | Extended thinking tokens (Claude) |
| Cache creation tokens | Tokens written to prompt cache |
| Cache read tokens | Tokens read from prompt cache (discounted) |

All token types are recorded in the audit log with per-token-type costs. The daily accumulator sums the actual cost (not estimated) so your budget enforcement reflects real spend.
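As a rough illustration of per-token-type costing, the sketch below multiplies each token count by a per-type rate and sums the result. The rates are made-up example values, not real prices, and the sketch assumes the six categories are counted separately. Some providers report reasoning tokens as part of completion tokens, so adjust before using this on real usage data.

```python
# Hypothetical USD prices per 1M tokens for a single model.
EXAMPLE_RATES = {
    "prompt": 3.00,
    "completion": 15.00,
    "reasoning": 15.00,
    "thinking": 15.00,
    "cache_creation": 3.75,
    "cache_read": 0.30,
}


def cost_breakdown(usage: dict[str, int], rates: dict[str, float]) -> dict[str, float]:
    """Return the USD cost per token type; missing types count as zero tokens."""
    return {
        token_type: usage.get(token_type, 0) * rate / 1_000_000
        for token_type, rate in rates.items()
    }


def total_cost(usage: dict[str, int], rates: dict[str, float]) -> float:
    """Sum the per-type costs into the request's total cost."""
    return sum(cost_breakdown(usage, rates).values())
```

For example, `total_cost({"prompt": 1_000_000, "completion": 100_000, "cache_read": 1_000_000}, EXAMPLE_RATES)` adds $3.00, $1.50, and $0.30.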

Budget Hierarchy: Org, Team, and Key Levels

The $10/day budget we set above applies at the org level. For more granular control, you can set budgets at three levels:

    Org Budget ($10/day)
    |
    +-- Team A Budget ($5/day)
    |   |
    |   +-- API Key 1 Budget ($2/day)
    |   +-- API Key 2 Budget ($3/day)
    |
    +-- Team B Budget ($5/day)
        |
        +-- API Key 3 Budget ($5/day)

Budgets cascade. A request from API Key 1 checks against the key’s $2 limit, then Team A’s $5 limit, then the org’s $10 limit. If any level is exceeded, the request is denied.
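The cascade amounts to walking the levels from most specific to least specific and stopping at the first one that would be exceeded. Here is a minimal sketch of that idea. The `BudgetLevel` type and function name are hypothetical, and the spend figures in the usage note are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetLevel:
    name: str
    daily_budget: float
    spent_today: float


def first_exceeded_level(
    levels: list[BudgetLevel], estimated_cost: float
) -> Optional[str]:
    """Check levels in order (key, team, org); return the first that would be exceeded."""
    for level in levels:
        if level.spent_today + estimated_cost > level.daily_budget:
            return level.name
    return None  # Every level has room, so the request may proceed.
```

Suppose API Key 1 has spent $1.95 of $2, Team A $3.00 of $5, and the org $6.00 of $10. A $0.10 request is denied at the key level even though the team and org still have room, while a $0.01 request passes all three checks.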

Set team and key budgets via the dashboard under Settings > Budgets > Hierarchy, or via the API:

    # Set a team budget
    curl -X PUT https://api.curate-me.ai/api/v1/admin/budgets/team/team_abc \
      -H "Authorization: Bearer YOUR_JWT_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"daily_budget": 5.00}'

    # Set an API key budget
    curl -X PUT https://api.curate-me.ai/api/v1/admin/budgets/key/key_xyz \
      -H "Authorization: Bearer YOUR_JWT_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"daily_budget": 2.00}'

Real-Time Cost Monitoring

Once your budget is set, the dashboard gives you real-time visibility into spend:

  • Costs page: Daily and monthly spend breakdown by model, provider, and API key
  • Activity feed: Every request logged with cost, tokens, latency, and governance decisions
  • WebSocket updates: Real-time cost events pushed to the dashboard as requests complete
  • Usage API: Programmatic access to cost data for your own dashboards and alerts

    # Get today's cost summary
    curl https://api.curate-me.ai/api/v1/admin/costs/today \
      -H "Authorization: Bearer YOUR_JWT_TOKEN"

Response:

    {
      "date": "2026-03-17",
      "total_cost": 4.23,
      "total_requests": 847,
      "daily_budget": 10.00,
      "budget_remaining": 5.77,
      "by_model": {
        "gpt-4o": 2.89,
        "claude-sonnet-4-6": 1.12,
        "gpt-4o-mini": 0.22
      }
    }
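If you pull this summary into your own tooling, a few derived numbers are usually more useful than the raw fields. The sketch below works on an already-parsed response with the shape shown above. The function name and the derived field names are hypothetical.

```python
def summarize_costs(summary: dict) -> dict:
    """Derive percent used, average cost per request, and the costliest model."""
    total = summary["total_cost"]
    budget = summary["daily_budget"]
    requests = summary["total_requests"]
    by_model = summary["by_model"]
    return {
        "percent_used": round(100 * total / budget, 1) if budget > 0 else 0.0,
        "avg_cost_per_request": total / requests if requests else 0.0,
        "top_model": max(by_model, key=by_model.get) if by_model else None,
    }
```

For the sample response, this reports 42.3% of the budget used, an average of roughly half a cent per request, and gpt-4o as the largest share of spend.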

Common Budget Configurations

Here are budget setups for common scenarios:

| Scenario | Daily Budget | Per-Request Limit | Rate Limit | Notes |
|---|---|---|---|---|
| Solo developer, side project | $5/day | $0.50 | 60 RPM | Tight controls for cost-conscious dev |
| Small team, production | $25/day | $2.00 | 200 RPM | Room for normal usage with safety margin |
| Growth stage, multiple agents | $100/day | $5.00 | 500 RPM | Higher limits with HITL on expensive requests |
| Enterprise, compliance-critical | $500/day | $10.00 | 1000 RPM | High throughput with full audit trail |

For all scenarios, we recommend enabling the HITL gate for requests above 50% of your per-request limit. This catches the edge cases — an agent that accidentally sends a 100K-token context window, or a loop that makes the same expensive call repeatedly.
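The 50% rule gives every request one of three outcomes, which the sketch below spells out. It illustrates the recommendation rather than the gateway's own code: the function name, the outcome labels, and the choice to treat the threshold as a strict "greater than" are all assumptions.

```python
def route_request(
    estimated_cost: float,
    per_request_limit: float,
    hitl_fraction: float = 0.5,
) -> str:
    """Classify a request as 'allow', 'needs_approval', or 'deny' by estimated cost."""
    if estimated_cost > per_request_limit:
        return "deny"
    if estimated_cost > hitl_fraction * per_request_limit:
        return "needs_approval"
    return "allow"
```

With the small-team configuration ($2.00 per-request limit), a $0.80 request is allowed, a $1.20 request waits for human approval, and a $2.50 request is denied outright.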

Summary

Setting a budget cap takes four steps:

  1. Point your SDK at https://api.curate-me.ai/v1/{provider}
  2. Set daily_budget: 10.00 in your governance policy
  3. Set per_request_limit: 1.00 to cap individual requests
  4. Optionally configure webhook alerts at 80% spend

Every request is now governed. If your agents hit the budget, they get a clear 429 error instead of an unbounded bill. The audit trail records every request with its actual cost, and the dashboard gives you real-time visibility into spend.

No agent should run without a budget. Five minutes of setup saves you from the $500 surprise on your next invoice.


Set your first budget cap at dashboard.curate-me.ai. Free tier includes 1,000 requests/day with full governance.

Curate-Me is the governance layer for AI agents. Cost caps, PII scanning, rate limiting, HITL approvals, managed runners, and a full audit trail — zero code changes.