Architecture

This page covers the technical architecture of the Curate-Me platform, including the Gateway + B2B API design, gateway governance chain, managed runner control plane, streaming, memory, cost tracking, and multi-tenancy.

Gateway + B2B API architecture

The backend runs as two separate FastAPI applications sharing the same codebase:

API	Entry point	Port	Purpose
Gateway	`main_gateway.py`	8002	AI gateway — reverse proxy with governance chain
B2B	`main_b2b.py`	8001	Dashboard admin, agent management, runner console, costs

Both applications share services and models but have different route registrations and middleware configurations. The Gateway handles LLM proxy traffic with API key auth. The B2B API includes tenant isolation middleware and JWT auth.


# Start both APIs in separate terminals
poetry run uvicorn src.main_gateway:app --reload --port 8002   # AI Gateway
poetry run uvicorn src.main_b2b:app --reload --port 8001       # B2B Dashboard API

Gateway governance chain

The gateway intercepts every LLM request, applies a 15-stage governance policy chain, and proxies allowed traffic to the upstream provider. The chain short-circuits on the first denial.


Customer App --> Gateway :8002 --> Governance Chain --> Provider Router --> LLM Provider
                    |                    |                                      |
               Auth (API Key)      Plan + entitlement       Response passthrough (SSE)
               Org Context         Budget + safety                     |
                                   Provider policy        Cost Recorder (Redis + MongoDB)
                                   Human review gates

Governance steps

Step	Module	Behavior on failure
1. Plan enforcement	`governance.py`	HTTP 403 when the account plan blocks the request
2. Body size limit	`governance.py`	HTTP 413 when the request exceeds configured limits
3. Rate limit	`governance.py`	HTTP 429 with retry-after guidance
4. Plan entitlement	`governance.py`	HTTP 403 when the capability is not enabled
5. Reasoning token cap	`governance.py`	HTTP 400 or 403 when reasoning tokens exceed policy
6. Cost estimate	`governance.py`	HTTP 403 when projected spend exceeds budget
7. Hierarchical budget	`governance.py`	HTTP 403 when org, team, key, or user budget is exhausted
8. Runner session budget	`governance.py`	HTTP 403 when runner session spend is exhausted
9. PII scan	`governance.py`	HTTP 400 with identified PII types
10. Content safety	`governance.py`	HTTP 400 or 403 when safety policy blocks content
11. Security scan	`governance.py`	HTTP 400 or 403 when prompt injection or exfiltration risk is detected
12. AI classifier	`governance.py`	HTTP 403 when the classifier blocks risky requests
13. Model allowlist	`governance.py`	HTTP 403 when the model is not permitted
14. Skill allowlist	`governance.py`	HTTP 403 when the requested skill is not permitted
15. HITL gate	`governance.py`	HTTP 202 when the request is queued for approval
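The short-circuit behavior of the chain can be sketched as an ordered list of checks that stops at the first denial. This is a minimal illustration, not the contents of `governance.py`: the `Denial` type, the `run_chain` helper, the 1,000-character body limit, and the allowlist entries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Denial:
    """Outcome of a failed governance step (hypothetical type)."""
    step: str
    status: int
    detail: str


# A check inspects the request and returns a Denial, or None to allow it.
Check = Callable[[dict], Optional[Denial]]


def body_size_limit(request: dict) -> Optional[Denial]:
    # Assumed limit, for illustration only.
    if len(request.get("body", "")) > 1_000:
        return Denial("body_size_limit", 413, "request body too large")
    return None


def model_allowlist(request: dict) -> Optional[Denial]:
    allowed = {"gpt-4o-mini", "claude-sonnet-4"}  # assumed allowlist
    if request.get("model") not in allowed:
        return Denial("model_allowlist", 403, "model not permitted")
    return None


def run_chain(request: dict, checks: list[Check]) -> Optional[Denial]:
    """Run checks in order and stop at the first denial."""
    for check in checks:
        denial = check(request)
        if denial is not None:
            return denial  # short-circuit: later checks never run
    return None


CHAIN = [body_size_limit, model_allowlist]
```

Because the checks run in order, a request that is both oversized and uses a disallowed model is rejected with 413, not 403.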

Provider routing

The gateway supports multiple LLM providers through a provider registry. Each organization can configure provider targets with their own API keys or use platform-managed keys.


Gateway --> Provider Router --> OpenAI
                            --> Anthropic
                            --> Google (Gemini)
                            --> DeepSeek

The model_alias_registry.py module maps model names to provider endpoints, which enables model aliasing (for example, fast maps to gpt-4o-mini and smart maps to claude-sonnet-4).
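Alias resolution of this kind can be sketched with two lookup tables. The alias entries below come from the examples in the text; the provider table and the `resolve_model` function are hypothetical and are not taken from the actual registry module.

```python
# Alias entries mirror the examples in the text.
MODEL_ALIASES = {
    "fast": "gpt-4o-mini",
    "smart": "claude-sonnet-4",
}

# Hypothetical mapping from concrete model name to provider.
MODEL_PROVIDERS = {
    "gpt-4o-mini": "openai",
    "claude-sonnet-4": "anthropic",
}


def resolve_model(name: str) -> tuple[str, str]:
    """Return (concrete_model, provider) for an alias or a concrete model name."""
    model = MODEL_ALIASES.get(name, name)  # non-aliases pass through unchanged
    try:
        return model, MODEL_PROVIDERS[model]
    except KeyError:
        raise ValueError(f"unknown model: {name}") from None
```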

The upstream_resilience.py module handles retry logic with exponential backoff when upstream providers return transient errors.
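A retry loop with exponential backoff might look like the sketch below. The set of status codes treated as transient, the default attempt count and delays, and the `UpstreamError` and `call_with_backoff` names are assumptions for illustration; the real module may differ.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Which statuses count as transient is an assumption.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


class UpstreamError(Exception):
    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on transient errors, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except UpstreamError as exc:
            last_attempt = attempt == max_attempts - 1
            if exc.status not in TRANSIENT_STATUSES or last_attempt:
                raise  # permanent error, or retries exhausted
            # Delays grow as base, 2x base, 4x base, with up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.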

Managed runners

The runner control plane manages the full lifecycle of OpenClaw sandbox containers. It is the most differentiated feature in the platform — competitors like Portkey and Helicone do not offer managed execution environments.


Dashboard --> B2B API --> Runner Control Plane --> Provider (E2B / VPS)
                               |
                         State Machine:
                           provisioning --> ready --> running --> stopped
                         Immutable Audit Trail
                         Security Policies (egress, sandbox levels)
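
The state machine in the diagram can be sketched as a transition table with an append-only audit list. Only the forward path shown above is modeled; the `Runner` class and its field names are hypothetical, and any additional transitions the real control plane supports are not assumed here.

```python
from enum import Enum


class RunnerState(str, Enum):
    PROVISIONING = "provisioning"
    READY = "ready"
    RUNNING = "running"
    STOPPED = "stopped"


# Forward path from the diagram only.
ALLOWED = {
    RunnerState.PROVISIONING: {RunnerState.READY},
    RunnerState.READY: {RunnerState.RUNNING},
    RunnerState.RUNNING: {RunnerState.STOPPED},
    RunnerState.STOPPED: set(),
}


class Runner:
    def __init__(self) -> None:
        self.state = RunnerState.PROVISIONING
        # Append-only (from, to) records, standing in for the audit trail.
        self.audit_trail: list[tuple[str, str]] = []

    def transition(self, target: RunnerState) -> None:
        """Move to `target`, rejecting transitions the table does not allow."""
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.value} -> {target.value}"
            )
        self.audit_trail.append((self.state.value, target.value))
        self.state = target
```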

Runner architecture

The control plane lives at services/backend/src/services/runner_control_plane/ and contains 65+ modules organized by concern:

Layer	Modules	Purpose
Lifecycle	State machine, provisioning, teardown	Container lifecycle management
Security	Sandbox levels, egress rules, network phases	Isolation and access control
Compute	Resource allocation, idle suspend, snapshots	Infrastructure management
CI	Headless CI, auto-fix, worktrees	Continuous integration inside runners
Skills	Skill gallery, MCP servers, hooks, subagents	Extensibility and tool profiles

Runner routes

Runner operations are exposed through 20+ gateway route files (gateway_runner_*.py). The dashboard provides the management UI through 24+ pages under /runners/*.

Tool profiles

Each runner is configured with one of three tool profiles that control what the sandbox can access:

Profile	Access level
Minimal	Read-only filesystem, no network, no shell
Standard	Read-write filesystem, allowlisted network, restricted shell
Full	Full filesystem, full network, unrestricted shell
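
One way to represent the three profiles in code is a small frozen dataclass per profile. The access levels come from the table above; the `ToolProfile` type, the string values, and the `can_reach` helper are illustrative assumptions rather than the platform's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    filesystem: str  # "read-only", "read-write", or "full"
    network: str     # "none", "allowlisted", or "full"
    shell: str       # "none", "restricted", or "unrestricted"


TOOL_PROFILES = {
    "minimal": ToolProfile("read-only", "none", "none"),
    "standard": ToolProfile("read-write", "allowlisted", "restricted"),
    "full": ToolProfile("full", "full", "unrestricted"),
}


def can_reach(profile: str, host: str, allowlist: frozenset = frozenset()) -> bool:
    """Decide whether a runner with this profile may connect to `host`."""
    network = TOOL_PROFILES[profile].network
    if network == "full":
        return True
    if network == "allowlisted":
        return host in allowlist
    return False  # "none": no outbound network at all
```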

Streaming

Agent pipelines and gateway proxy responses use Server-Sent Events (SSE) for real-time streaming:


FastAPI SSE endpoint --> EventSource (browser) --> React state update

Each SSE message contains a serialized AgentEvent with a type discriminator (agent_start, token, agent_complete, etc.). The frontend processes these events incrementally to render progressive results.
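
As a rough sketch of the wire format, each event can be serialized to JSON and wrapped in an SSE `data:` frame terminated by a blank line. The `type` discriminator comes from the text; the other `AgentEvent` fields and the `to_sse` and `parse_sse` helpers are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentEvent:
    # Only `type` is described in the text; the other fields are assumed.
    type: str
    agent: str
    data: dict = field(default_factory=dict)


def to_sse(event: AgentEvent) -> str:
    """Encode one event as an SSE frame: a `data:` line plus a blank line."""
    # json.dumps emits no raw newlines by default, so one data line suffices.
    return f"data: {json.dumps(asdict(event))}\n\n"


def parse_sse(frame: str) -> AgentEvent:
    """Decode a single-line `data:` frame back into an AgentEvent."""
    payload = frame.strip().removeprefix("data: ")
    return AgentEvent(**json.loads(payload))
```

A consumer would switch on `event.type` to decide whether to append a token, mark an agent as started, or finalize its result.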

For gateway proxy traffic, the response is streamed directly from the upstream provider through the gateway using httpx async streaming passthrough. The gateway records token counts and cost from the streamed response without buffering the full payload.

Memory system

The platform uses a three-tier memory architecture for agent personalization:

Tier	Scope	Lifetime	Purpose
Profile Memory	User-level	Persistent	User preferences, saved configurations, account-level settings
Pattern Memory	Cross-session	Medium-term	Recurring behaviors, usage patterns, learned preferences
Session Memory	Single session	Ephemeral	Current conversation context, recent interactions

Memory is managed by the user memory service and injected into agent prompts via the orchestrator. This enables personalization without requiring agents to manage state directly.
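
Prompt injection of the three tiers could be sketched as follows. The `build_prompt` function, its section labels, and the ordering (most durable tier first) are assumptions; the text does not specify how the orchestrator formats memory.

```python
def build_prompt(
    base_prompt: str,
    profile: dict[str, str],
    patterns: list[str],
    session: list[str],
) -> str:
    """Prepend non-empty memory tiers to the agent's base prompt."""
    sections = []
    if profile:
        lines = [f"- {key}: {value}" for key, value in profile.items()]
        sections.append("User profile:\n" + "\n".join(lines))
    if patterns:
        sections.append("Observed patterns:\n" + "\n".join(f"- {p}" for p in patterns))
    if session:
        sections.append("Recent context:\n" + "\n".join(f"- {s}" for s in session))
    sections.append(base_prompt)
    return "\n\n".join(sections)
```

The agent receives a single prompt string and never reads or writes the memory stores itself.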

Cost tracking

Every LLM call is tracked in real time through a two-tier cost recording system:

Redis accumulator: In-memory running totals for fast budget checks during the governance chain
MongoDB audit log: Immutable per-request cost records for reporting and compliance
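
The two tiers can be illustrated with in-memory stand-ins: a dictionary of running totals in place of Redis and an append-only list in place of MongoDB. The `CostRecorder` class, the `PRICES` table, and the per-million-token prices are placeholders for the example, not real provider pricing or the platform's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

# (input, output) price per million tokens; placeholder values only.
PRICES = {"example-model": (Decimal("1.00"), Decimal("2.00"))}


@dataclass(frozen=True)
class CostRecord:
    org_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost: Decimal


class CostRecorder:
    def __init__(self) -> None:
        self.totals = defaultdict(Decimal)       # stand-in for the Redis accumulator
        self.audit_log: list[CostRecord] = []    # stand-in for the MongoDB audit log

    def record(self, org_id: str, model: str,
               input_tokens: int, output_tokens: int) -> CostRecord:
        in_price, out_price = PRICES[model]
        cost = (in_price * input_tokens + out_price * output_tokens) / Decimal(1_000_000)
        rec = CostRecord(org_id, model, input_tokens, output_tokens, cost)
        self.totals[org_id] += cost   # fast path: read by budget checks
        self.audit_log.append(rec)    # slow path: per-request record
        return rec

    def within_budget(self, org_id: str, budget: Decimal) -> bool:
        return self.totals[org_id] < budget
```

Keeping a running total means a budget check is a single lookup instead of a sum over the audit log.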

Tracked dimensions:

Dimension	Description
Per-call	Model, token counts (input/output), computed cost, latency
Per-agent	Total spend and call count per agent over configurable windows
Per-org	Daily and monthly spend totals, budget utilization percentage
Per-key	Spend attributed to individual API keys

Budget alerts trigger notifications when spend exceeds configurable thresholds. The B2B dashboard surfaces these metrics in the Cost Governance panel, and the health endpoint (/api/v1/health) reports daily spend.
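
A threshold check for budget alerts might be sketched like this. The `crossed_thresholds` function and the default thresholds of 50%, 80%, and 100% are assumptions; the text only says that thresholds are configurable.

```python
def crossed_thresholds(
    previous_spend: float,
    current_spend: float,
    budget: float,
    thresholds: tuple[float, ...] = (0.5, 0.8, 1.0),
) -> list[float]:
    """Return thresholds (as budget fractions) crossed by the latest spend update."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    before = previous_spend / budget
    after = current_spend / budget
    # A threshold fires once: only when this update moves spend past it.
    return [t for t in sorted(thresholds) if before < t <= after]
```

Comparing against the previous total avoids sending the same alert on every request after a threshold has been passed.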

Multi-tenancy

The B2B API uses TenantIsolationMiddleware for organization-based data isolation. Organization context is resolved from three sources in priority order:

1. JWT claims: org_id and org_role fields in the token payload
2. X-Org-ID header: Passed by the dashboard frontend client
3. URL path parameter: Extracted from /organizations/{org_id}/... routes
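
The priority order above can be sketched as a plain function. This is not the code of TenantIsolationMiddleware: the `resolve_org_id` name, its argument shapes, and the path regex are assumptions for illustration.

```python
import re
from typing import Optional

# Matches the org segment in /organizations/{org_id}/... routes.
ORG_PATH = re.compile(r"/organizations/(?P<org_id>[^/]+)")


def resolve_org_id(jwt_claims: dict, headers: dict, path: str) -> Optional[str]:
    """Resolve the organization: JWT claims first, then header, then URL path."""
    if jwt_claims.get("org_id"):
        return jwt_claims["org_id"]
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "x-org-id" and value:
            return value
    match = ORG_PATH.search(path)
    return match.group("org_id") if match else None
```

With this ordering, a token's org_id claim wins even when the header or path names a different organization.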

The Gateway resolves organization context from the X-CM-API-Key header instead: each API key maps to exactly one org.

In route handlers, tenant context is available on the request state:


org_id = request.state.org_id
user_id = request.state.user_id
org_role = request.state.org_role

The dashboard frontend (apps/dashboard/lib/api.ts) automatically includes the X-Org-ID header in all requests when an organization context is active. All database queries in the B2B API are scoped to the resolved org_id, ensuring complete data isolation between tenants.