Architecture
This page covers the technical architecture of the Curate-Me platform, including the Gateway + B2B API design, gateway governance chain, managed runner control plane, streaming, memory, cost tracking, and multi-tenancy.
Gateway + B2B API architecture
The backend runs as two separate FastAPI applications sharing the same codebase:
| API | Entry point | Port | Purpose |
|---|---|---|---|
| Gateway | main_gateway.py | 8002 | AI gateway — reverse proxy with governance chain |
| B2B | main_b2b.py | 8001 | Dashboard admin, agent management, runner console, costs |
Both applications share services and models but have different route registrations and middleware configurations. The Gateway handles LLM proxy traffic with API key auth. The B2B API includes tenant isolation middleware and JWT auth.
```bash
# Start both APIs in separate terminals
poetry run uvicorn src.main_gateway:app --reload --port 8002  # AI Gateway
poetry run uvicorn src.main_b2b:app --reload --port 8001      # B2B Dashboard API
```
Gateway governance chain
The gateway intercepts every LLM request, applies a 15-stage governance policy chain, and proxies allowed traffic to the upstream provider. The chain short-circuits on the first denial.
```
Customer App --> Gateway :8002 --> Governance Chain --> Provider Router --> LLM Provider
                      |                  |                                       |
                Auth (API Key)     Plan + entitlement              Response passthrough (SSE)
                Org Context        Budget + safety                               |
                                   Provider policy              Cost Recorder (Redis + MongoDB)
                                   Human review gates
```
Governance steps
| Step | Module | Behavior on failure |
|---|---|---|
| 1. Plan enforcement | governance.py | HTTP 403 when the account plan blocks the request |
| 2. Body size limit | governance.py | HTTP 413 when the request exceeds configured limits |
| 3. Rate limit | governance.py | HTTP 429 with retry-after guidance |
| 4. Plan entitlement | governance.py | HTTP 403 when the capability is not enabled |
| 5. Reasoning token cap | governance.py | HTTP 400 or 403 when reasoning tokens exceed policy |
| 6. Cost estimate | governance.py | HTTP 403 when projected spend exceeds budget |
| 7. Hierarchical budget | governance.py | HTTP 403 when org, team, key, or user budget is exhausted |
| 8. Runner session budget | governance.py | HTTP 403 when runner session spend is exhausted |
| 9. PII scan | governance.py | HTTP 400 with identified PII types |
| 10. Content safety | governance.py | HTTP 400 or 403 when safety policy blocks content |
| 11. Security scan | governance.py | HTTP 400 or 403 when prompt injection or exfiltration risk is detected |
| 12. AI classifier | governance.py | HTTP 403 when the classifier blocks risky requests |
| 13. Model allowlist | governance.py | HTTP 403 when the model is not permitted |
| 14. Skill allowlist | governance.py | HTTP 403 when the requested skill is not permitted |
| 15. HITL gate | governance.py | HTTP 202 when the request is queued for approval |
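The short-circuit behavior can be sketched as an ordered list of checks, where each check either passes (returns None) or halts the chain with an HTTP status. This is a minimal illustration, not the contents of governance.py: the `Halt` type, the request dict fields, the body limit, and the check functions are all hypothetical, and only three of the fifteen stages are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Halt:
    """Why the chain stopped: the HTTP status to return and a reason."""

    status: int
    reason: str


# A check returns None to let the request continue, or a Halt to stop it.
Check = Callable[[dict], Optional[Halt]]

MAX_BODY_BYTES = 1_000_000  # hypothetical; the real limit is configurable


def body_size_limit(req: dict) -> Optional[Halt]:
    if len(req.get("body", b"")) > MAX_BODY_BYTES:
        return Halt(413, "request body too large")
    return None


def model_allowlist(req: dict) -> Optional[Halt]:
    if req["model"] not in req["allowed_models"]:
        return Halt(403, f"model {req['model']!r} is not permitted")
    return None


def hitl_gate(req: dict) -> Optional[Halt]:
    # 202 is not an error: the request is held until a human approves it.
    if req.get("requires_approval"):
        return Halt(202, "queued for approval")
    return None


def run_chain(req: dict, checks: list[Check]) -> Optional[Halt]:
    """Run checks in order and return the first Halt; None means proxy upstream."""
    for check in checks:
        halt = check(req)
        if halt is not None:
            return halt  # short-circuit: later checks never run
    return None


# Three of the fifteen stages, in the order the table lists them.
CHAIN: list[Check] = [body_size_limit, model_allowlist, hitl_gate]

denied = run_chain({"model": "gpt-4o", "allowed_models": {"gpt-4o-mini"}}, CHAIN)
print(denied)  # Halt(status=403, reason="model 'gpt-4o' is not permitted")
```

Treating HTTP 202 as a halt mirrors the table: a request queued for human review is not proxied, even though it has not been rejected.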
Provider routing
The gateway supports multiple LLM providers through a provider registry. Each organization can configure provider targets with their own API keys or use platform-managed keys.
```
Gateway --> Provider Router --> OpenAI
                            --> Anthropic
                            --> Google (Gemini)
                            --> DeepSeek
```
The model_alias_registry.py module maps model names to provider endpoints, enabling model aliasing (e.g., fast maps to gpt-4o-mini, smart maps to claude-sonnet-4).
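A minimal sketch of alias resolution, assuming a static in-memory table: aliases are looked up first, and any other name is routed by its model-name prefix. `ModelTarget`, `ALIASES`, `PROVIDER_PREFIXES`, and `resolve_model` are hypothetical; only the fast and smart mappings come from the text, and prefix-based provider inference is an illustrative assumption rather than how model_alias_registry.py is known to work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTarget:
    provider: str
    model: str


# Hypothetical alias table; the two entries mirror the documented examples.
ALIASES: dict[str, ModelTarget] = {
    "fast": ModelTarget("openai", "gpt-4o-mini"),
    "smart": ModelTarget("anthropic", "claude-sonnet-4"),
}

# Hypothetical fallback: infer the provider of a concrete model from its prefix.
PROVIDER_PREFIXES: dict[str, str] = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "deepseek-": "deepseek",
}


def resolve_model(name: str) -> ModelTarget:
    """Resolve an alias, or route a concrete model name to its provider."""
    if name in ALIASES:
        return ALIASES[name]
    for prefix, provider in PROVIDER_PREFIXES.items():
        if name.startswith(prefix):
            return ModelTarget(provider, name)
    raise KeyError(f"unknown model or alias: {name}")


print(resolve_model("fast"))  # ModelTarget(provider='openai', model='gpt-4o-mini')
```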
The upstream_resilience.py module handles retry logic with exponential backoff when upstream
providers return transient errors.
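Retry with exponential backoff can be sketched as follows. The `UpstreamError` type, the set of statuses treated as transient, the default delays, and the use of full jitter are assumptions for illustration; upstream_resilience.py may choose differently. Non-transient errors (for example, HTTP 400) are re-raised immediately rather than retried.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Statuses treated as transient in this sketch (an assumption, not the real list).
TRANSIENT_STATUSES = frozenset({429, 500, 502, 503, 504})


class UpstreamError(Exception):
    """Hypothetical error raised when the provider returns a non-2xx status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Await `call()`, retrying transient upstream errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except UpstreamError as exc:
            if exc.status not in TRANSIENT_STATUSES or attempt == max_attempts - 1:
                raise  # permanent error, or retries exhausted
            # The backoff ceiling doubles per attempt (0.5s, 1s, 2s, ...) up to
            # max_delay; "full jitter" sleeps a random fraction of it so that
            # many clients hitting the same outage do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2**attempt)
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable: the loop always returns or raises")
```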
Managed runners
The runner control plane manages the full lifecycle of OpenClaw sandbox containers. It is the most differentiated feature in the platform — competitors like Portkey and Helicone do not offer managed execution environments.
```
Dashboard --> B2B API --> Runner Control Plane --> Provider (E2B / VPS)
                                   |
                          State Machine:
                            provisioning --> ready --> running --> stopped
                          Immutable Audit Trail
                          Security Policies (egress, sandbox levels)
```
Runner architecture
The control plane lives at services/backend/src/services/runner_control_plane/ and contains
65+ modules organized by concern:
| Layer | Modules | Purpose |
|---|---|---|
| Lifecycle | State machine, provisioning, teardown | Container lifecycle management |
| Security | Sandbox levels, egress rules, network phases | Isolation and access control |
| Compute | Resource allocation, idle suspend, snapshots | Infrastructure management |
| CI | Headless CI, auto-fix, worktrees | Continuous integration inside runners |
| Skills | Skill gallery, MCP servers, hooks, subagents | Extensibility and tool profiles |
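The lifecycle layer enforces the state machine from the runner diagram. A minimal sketch, assuming only the four states shown there and strictly forward transitions; `Runner`, `RunnerState`, and `AuditEntry` are hypothetical names, and the real control plane likely has additional states (for example, for idle suspend or failures). Every accepted transition appends an immutable entry to an append-only list, mirroring the immutable audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RunnerState(str, Enum):
    PROVISIONING = "provisioning"
    READY = "ready"
    RUNNING = "running"
    STOPPED = "stopped"


# Only the forward path from the diagram; other states are omitted here.
ALLOWED: dict[RunnerState, set[RunnerState]] = {
    RunnerState.PROVISIONING: {RunnerState.READY},
    RunnerState.READY: {RunnerState.RUNNING},
    RunnerState.RUNNING: {RunnerState.STOPPED},
    RunnerState.STOPPED: set(),  # terminal
}


@dataclass(frozen=True)
class AuditEntry:
    runner_id: str
    from_state: RunnerState
    to_state: RunnerState
    at: datetime


@dataclass
class Runner:
    runner_id: str
    state: RunnerState = RunnerState.PROVISIONING
    # Append-only: entries are frozen and never removed.
    audit: list[AuditEntry] = field(default_factory=list)

    def transition(self, target: RunnerState) -> None:
        """Move to `target` if the state machine allows it, recording the change."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {target.value}")
        entry = AuditEntry(self.runner_id, self.state, target, datetime.now(timezone.utc))
        self.audit.append(entry)
        self.state = target
```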
Runner routes
The gateway exposes runner operations through 20+ route files (gateway_runner_*.py).
The dashboard provides the management UI through 24+ runner pages under /runners/*.
Tool profiles
Each runner is configured with one of three tool profiles that control what the sandbox can access:
| Profile | Access level |
|---|---|
| Minimal | Read-only filesystem, no network, no shell |
| Standard | Read-write filesystem, allowlisted network, restricted shell |
| Full | Full filesystem, full network, unrestricted shell |
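The three profiles can be expressed as immutable configuration objects. This sketch encodes the table rows directly; the enum and field names and the `may_reach` helper are hypothetical, and in a real sandbox enforcement would happen at the container and network layer rather than in application code.

```python
from dataclasses import dataclass
from enum import Enum


class FsAccess(str, Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    FULL = "full"


class NetAccess(str, Enum):
    NONE = "none"
    ALLOWLIST = "allowlist"
    FULL = "full"


class ShellAccess(str, Enum):
    NONE = "none"
    RESTRICTED = "restricted"
    UNRESTRICTED = "unrestricted"


@dataclass(frozen=True)
class ToolProfile:
    name: str
    filesystem: FsAccess
    network: NetAccess
    shell: ShellAccess


# One entry per row of the tool profile table.
PROFILES = {
    "minimal": ToolProfile("minimal", FsAccess.READ_ONLY, NetAccess.NONE, ShellAccess.NONE),
    "standard": ToolProfile(
        "standard", FsAccess.READ_WRITE, NetAccess.ALLOWLIST, ShellAccess.RESTRICTED
    ),
    "full": ToolProfile("full", FsAccess.FULL, NetAccess.FULL, ShellAccess.UNRESTRICTED),
}


def may_reach(profile: ToolProfile, host: str, allowlist: set[str]) -> bool:
    """Decide whether a sandbox with this profile may connect to `host`."""
    if profile.network is NetAccess.NONE:
        return False
    if profile.network is NetAccess.ALLOWLIST:
        return host in allowlist
    return True  # NetAccess.FULL
```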
Streaming
Agent pipelines and gateway proxy responses use Server-Sent Events (SSE) for real-time streaming:
```
FastAPI SSE endpoint --> EventSource (browser) --> React state update
```
Each SSE message contains a serialized AgentEvent with a type discriminator (agent_start,
token, agent_complete, etc.). The frontend processes these events incrementally to render
progressive results.
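A minimal sketch of the wire format and the client-side reduction, assuming each event is sent as a single `data:` line of JSON followed by a blank line (standard SSE message framing). The `agent` and `data` fields and the `text` payload key are hypothetical; only the `type` discriminator and the three event names come from the text. `apply_event` is a Python stand-in for the React state update.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Literal

EventType = Literal["agent_start", "token", "agent_complete"]


@dataclass
class AgentEvent:
    type: EventType  # discriminator the client switches on
    agent: str
    data: dict[str, Any] = field(default_factory=dict)


def to_sse(event: AgentEvent) -> str:
    """Encode one event as an SSE message: a `data:` line plus a blank line."""
    return f"data: {json.dumps(asdict(event))}\n\n"


def parse_sse(text: str) -> list[dict[str, Any]]:
    """Split a buffered SSE stream back into decoded event payloads."""
    events = []
    for message in text.split("\n\n"):
        data = [ln[len("data: "):] for ln in message.splitlines() if ln.startswith("data: ")]
        if data:
            events.append(json.loads("\n".join(data)))
    return events


def apply_event(state: dict[str, str], event: dict[str, Any]) -> dict[str, str]:
    """Reducer: start an agent's buffer, append its tokens, ignore other types."""
    agent = event["agent"]
    if event["type"] == "agent_start":
        return {**state, agent: ""}
    if event["type"] == "token":
        return {**state, agent: state.get(agent, "") + event["data"]["text"]}
    return state  # agent_complete (and unknown types) leave the text as is
```

Returning a new dict from the reducer, rather than mutating it, matches how React state updates are usually written.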
For gateway proxy traffic, the response is streamed directly from the upstream provider through
the gateway using httpx async streaming passthrough. The gateway records token counts and cost
from the streamed response without buffering the full payload.
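The passthrough can be sketched as an async generator that yields each upstream chunk to the client as soon as it arrives while scanning complete lines for a usage object. This assumes OpenAI-style SSE chunks in which one `data:` payload carries a `usage` field; in the gateway the `upstream` iterator would presumably come from httpx's `Response.aiter_bytes()`. Only the trailing partial line is held in memory, which is what avoids buffering the full payload. The `passthrough` function and its `usage_out` parameter are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator


async def passthrough(upstream: AsyncIterator[bytes], usage_out: dict) -> AsyncIterator[bytes]:
    """Forward upstream SSE bytes unchanged while extracting token usage."""
    pending = b""  # the current incomplete line, never the whole response
    async for chunk in upstream:
        yield chunk  # forward first so the client sees no added latency
        pending += chunk
        *lines, pending = pending.split(b"\n")
        for raw in lines:
            line = raw.rstrip(b"\r")
            if not line.startswith(b"data: ") or line == b"data: [DONE]":
                continue
            payload = json.loads(line[len(b"data: "):])
            if isinstance(payload, dict) and payload.get("usage"):
                usage_out.update(payload["usage"])  # e.g. prompt/completion tokens


async def _demo() -> None:
    async def fake_upstream():
        # The usage line is split across two network chunks on purpose.
        yield b'data: {"choices": []}\n\nda'
        yield b'ta: {"usage": {"prompt_tokens": 12, "completion_tokens": 7}}\n\n'
        yield b"data: [DONE]\n\n"

    usage: dict = {}
    async for _ in passthrough(fake_upstream(), usage):
        pass  # a real handler would write each chunk to the client response
    print(usage)  # {'prompt_tokens': 12, 'completion_tokens': 7}


if __name__ == "__main__":
    asyncio.run(_demo())
```

Once the stream ends, `usage_out` holds the counts the cost recorder needs, without the gateway ever reassembling the response body.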
Memory system
The platform uses a three-tier memory architecture for agent personalization:
| Tier | Scope | Lifetime | Purpose |
|---|---|---|---|
| Profile Memory | User-level | Persistent | User preferences, saved configurations, account-level settings |
| Pattern Memory | Cross-session | Medium-term | Recurring behaviors, usage patterns, learned preferences |
| Session Memory | Single session | Ephemeral | Current conversation context, recent interactions |
Memory is managed by the user memory service and injected into agent prompts via the orchestrator. This enables personalization without requiring agents to manage state directly.
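One way the orchestrator could inject memory is to render the three tiers into a text block prepended to the agent prompt. The `MemoryContext` type, the section labels, and the five-turn session window are illustrative assumptions, not the user memory service's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryContext:
    profile: dict[str, str] = field(default_factory=dict)  # persistent, user-level
    patterns: list[str] = field(default_factory=list)  # cross-session, medium-term
    session: list[str] = field(default_factory=list)  # ephemeral, this session only


def build_memory_block(mem: MemoryContext, max_session_turns: int = 5) -> str:
    """Render the three tiers, most durable first, into one text block."""
    sections = []
    if mem.profile:
        prefs = "\n".join(f"- {k}: {v}" for k, v in sorted(mem.profile.items()))
        sections.append(f"User profile:\n{prefs}")
    if mem.patterns:
        sections.append("Observed patterns:\n" + "\n".join(f"- {p}" for p in mem.patterns))
    if mem.session:
        recent = mem.session[-max_session_turns:]  # keep only the latest turns
        sections.append("Current session:\n" + "\n".join(f"- {s}" for s in recent))
    return "\n\n".join(sections)


def inject(prompt: str, mem: MemoryContext) -> str:
    """Prefix memory to the prompt so the agent itself stays stateless."""
    block = build_memory_block(mem)
    return f"{block}\n\n{prompt}" if block else prompt
```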
Cost tracking
Every LLM call is tracked in real time through a two-tier cost recording system:
- Redis accumulator: In-memory running totals for fast budget checks during the governance chain
- MongoDB audit log: Immutable per-request cost records for reporting and compliance
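A minimal sketch of the two-tier write path with in-memory stand-ins: a dict of running totals plays the role of the Redis accumulator (a real deployment would presumably use an atomic increment such as Redis INCRBYFLOAT), and an append-only list plays the role of the MongoDB audit log. The prices are placeholders, not real pricing data, and `CostRecorder` and `CostRecord` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

# Placeholder USD prices per 1M tokens as (input, output); not real pricing.
PRICES = {"model-a": (0.15, 0.60), "model-b": (3.00, 15.00)}


@dataclass(frozen=True)
class CostRecord:
    org_id: str
    api_key_id: str
    agent: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    at: datetime


class CostRecorder:
    def __init__(self) -> None:
        # Stand-in for Redis: running totals keyed by (scope, day).
        self.totals: dict[tuple[str, str], float] = defaultdict(float)
        # Stand-in for MongoDB: immutable per-request records, append-only.
        self.audit_log: list[CostRecord] = []

    def record(
        self,
        org_id: str,
        api_key_id: str,
        agent: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        latency_ms: float,
    ) -> CostRecord:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        rec = CostRecord(org_id, api_key_id, agent, model, input_tokens,
                         output_tokens, cost, latency_ms, datetime.now(timezone.utc))
        day = rec.at.date().isoformat()
        # Fast path: bump every scope that budget checks read (daily only here;
        # a monthly total would add a second key per scope).
        for scope in (f"org:{org_id}", f"key:{api_key_id}", f"agent:{agent}"):
            self.totals[(scope, day)] += cost
        # Durable path: keep the full record for reporting and compliance.
        self.audit_log.append(rec)
        return rec

    def spent_today(self, scope: str) -> float:
        """What a governance-chain budget check would read: one lookup."""
        return self.totals[(scope, datetime.now(timezone.utc).date().isoformat())]
```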
Tracked dimensions:
| Dimension | Description |
|---|---|
| Per-call | Model, token counts (input/output), computed cost, latency |
| Per-agent | Total spend and call count per agent over configurable windows |
| Per-org | Daily and monthly spend totals, budget utilization percentage |
| Per-key | Spend attributed to individual API keys |
Budget alerts trigger notifications when spend exceeds configurable thresholds. The B2B dashboard
surfaces these metrics in the Cost Governance panel, and the health endpoint (/api/v1/health)
reports daily spend.
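Threshold alerts can be made to fire exactly once per threshold by comparing spend before and after each increment. The threshold values and function names below are hypothetical.

```python
# Hypothetical alert thresholds, as fractions of the budget.
THRESHOLDS = (0.5, 0.8, 1.0)


def utilization(spend: float, budget: float) -> float:
    """Budget utilization as a percentage (the per-org metric in the table)."""
    return 100.0 * spend / budget if budget > 0 else 0.0


def crossed_thresholds(prev_spend: float, new_spend: float, budget: float) -> list[float]:
    """Thresholds crossed by this increment, so each alert is sent only once."""
    if budget <= 0:
        return []
    return [t for t in THRESHOLDS if prev_spend < t * budget <= new_spend]
```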
Multi-tenancy
The B2B API uses TenantIsolationMiddleware for organization-based data isolation. Organization
context is resolved from three sources in priority order:
- JWT claims: org_id and org_role fields in the token payload
- X-Org-ID header: passed by the dashboard frontend client
- URL path parameter: extracted from /organizations/{org_id}/... routes
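The priority order above reduces to a first-match lookup. This sketch uses plain mappings as stand-ins for the decoded JWT payload, the request headers, and the path parameters; header lookup assumes lowercase keys (real ASGI header objects are case-insensitive). `resolve_org_id` is hypothetical, and because the text does not say how the middleware handles sources that disagree, the sketch does not model that case.

```python
from typing import Mapping, Optional


def resolve_org_id(
    jwt_claims: Mapping[str, str],
    headers: Mapping[str, str],
    path_params: Mapping[str, str],
) -> Optional[str]:
    """Return the org ID from the highest-priority source that provides one.

    Priority: JWT claim, then X-Org-ID header, then the {org_id} path parameter.
    Empty strings count as missing because `or` skips falsy values.
    """
    return (
        jwt_claims.get("org_id")
        or headers.get("x-org-id")
        or path_params.get("org_id")
    )
```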
The Gateway instead resolves organization context from the X-CM-API-Key header: each API key maps to an org.
In route handlers, tenant context is available on the request state:
```python
# Inside a route handler that receives `request: fastapi.Request`
org_id = request.state.org_id
user_id = request.state.user_id
org_role = request.state.org_role
```
The dashboard frontend (apps/dashboard/lib/api.ts) automatically includes the X-Org-ID header
in all requests when an organization context is active. All database queries in the B2B API are
scoped to the resolved org_id, ensuring complete data isolation between tenants.
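Tenant scoping can be centralized in one helper so that no query reaches the database without an org filter. A minimal sketch, assuming MongoDB-style filter dicts and an org_id field on every tenant-owned document; `scoped_filter` is hypothetical.

```python
from typing import Any, Optional


def scoped_filter(org_id: str, query: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a copy of a MongoDB-style filter pinned to one organization.

    org_id is written last, so a caller-supplied org_id in `query` is
    overwritten and can never widen the query to another tenant.
    """
    if not org_id:
        raise ValueError("org_id is required for tenant-scoped queries")
    return {**(query or {}), "org_id": org_id}
```

With pymongo, for example, a handler might call `collection.find(scoped_filter(request.state.org_id, {"status": "active"}))`.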