Architecture

This page covers the technical architecture of the Curate-Me platform, including the Gateway + B2B API design, gateway governance chain, managed runner control plane, streaming, memory, cost tracking, and multi-tenancy.

Gateway + B2B API architecture

The backend runs as two separate FastAPI applications sharing the same codebase:

| API | Entry point | Port | Purpose |
| --- | --- | --- | --- |
| Gateway | main_gateway.py | 8002 | AI gateway: reverse proxy with governance chain |
| B2B | main_b2b.py | 8001 | Dashboard admin, agent management, runner console, costs |

Both applications share services and models but have different route registrations and middleware configurations. The Gateway handles LLM proxy traffic with API key auth. The B2B API includes tenant isolation middleware and JWT auth.

    # Start both APIs in separate terminals
    poetry run uvicorn src.main_gateway:app --reload --port 8002  # AI Gateway
    poetry run uvicorn src.main_b2b:app --reload --port 8001      # B2B Dashboard API

Gateway governance chain

The gateway intercepts every LLM request, applies a 15-stage governance policy chain, and proxies allowed traffic to the upstream provider. The chain short-circuits on the first denial.

    Customer App --> Gateway :8002 --> Governance Chain --> Provider Router --> LLM Provider
                          |                   |                    |
                    Auth (API Key)     Plan + entitlement    Response passthrough (SSE)
                    Org Context        Budget + safety              |
                                       Provider policy       Cost Recorder (Redis + MongoDB)
                                       Human review gates

Governance steps

| Step | Module | Behavior on failure |
| --- | --- | --- |
| 1. Plan enforcement | governance.py | HTTP 403 when the account plan blocks the request |
| 2. Body size limit | governance.py | HTTP 413 when the request exceeds configured limits |
| 3. Rate limit | governance.py | HTTP 429 with retry-after guidance |
| 4. Plan entitlement | governance.py | HTTP 403 when the capability is not enabled |
| 5. Reasoning token cap | governance.py | HTTP 400 or 403 when reasoning tokens exceed policy |
| 6. Cost estimate | governance.py | HTTP 403 when projected spend exceeds budget |
| 7. Hierarchical budget | governance.py | HTTP 403 when org, team, key, or user budget is exhausted |
| 8. Runner session budget | governance.py | HTTP 403 when runner session spend is exhausted |
| 9. PII scan | governance.py | HTTP 400 with identified PII types |
| 10. Content safety | governance.py | HTTP 400 or 403 when safety policy blocks content |
| 11. Security scan | governance.py | HTTP 400 or 403 when prompt injection or exfiltration risk is detected |
| 12. AI classifier | governance.py | HTTP 403 when the classifier blocks risky requests |
| 13. Model allowlist | governance.py | HTTP 403 when the model is not permitted |
| 14. Skill allowlist | governance.py | HTTP 403 when the requested skill is not permitted |
| 15. HITL gate | governance.py | HTTP 202 when the request is queued for approval |
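The short-circuit behavior of the chain can be illustrated with a minimal sketch. This is not the contents of governance.py: the `Denial` type, the `run_chain` function, the two example checks, and their limits are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Denial:
    step: str
    status: int
    detail: str


# A check inspects the request and returns a Denial, or None to let it pass.
Check = Callable[[dict], Optional[Denial]]


def body_size_limit(request: dict) -> Optional[Denial]:
    # Hypothetical limit; a real deployment would read this from configuration.
    if len(request.get("body", "")) > 1_000:
        return Denial("body_size_limit", 413, "request body too large")
    return None


def model_allowlist(request: dict) -> Optional[Denial]:
    allowed = {"gpt-4o-mini", "claude-sonnet-4"}  # illustrative allowlist
    if request.get("model") not in allowed:
        return Denial("model_allowlist", 403, "model not permitted")
    return None


def run_chain(request: dict, checks: list[Check]) -> Optional[Denial]:
    """Run checks in order and stop at the first denial."""
    for check in checks:
        denial = check(request)
        if denial is not None:
            return denial  # short-circuit: later checks never run
    return None


CHAIN = [body_size_limit, model_allowlist]
```

A request that fails both checks is rejected by whichever comes first in the list, so ordering cheap checks ahead of expensive ones keeps denied traffic inexpensive.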

Provider routing

The gateway supports multiple LLM providers through a provider registry. Each organization can configure provider targets with their own API keys or use platform-managed keys.

    Gateway --> Provider Router --> OpenAI
                                --> Anthropic
                                --> Google (Gemini)
                                --> DeepSeek

The model_alias_registry.py maps model names to provider endpoints, enabling model aliasing (e.g., fast maps to gpt-4o-mini, smart maps to claude-sonnet-4).
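As a rough sketch of alias resolution, the lookup might resemble the following. The `Target` type, the `resolve` function, the provider identifiers, and the prefix fallback rules are assumptions for illustration and are not taken from model_alias_registry.py; only the `fast` and `smart` aliases come from the text above.

```python
from typing import NamedTuple


class Target(NamedTuple):
    provider: str
    model: str


# Alias entries mirror the examples in the text; provider names are assumed.
ALIASES = {
    "fast": Target("openai", "gpt-4o-mini"),
    "smart": Target("anthropic", "claude-sonnet-4"),
}

# Hypothetical prefix rules for concrete model names that are not aliases.
PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "deepseek-": "deepseek",
}


def resolve(name: str) -> Target:
    """Map an alias or a concrete model name to a provider target."""
    if name in ALIASES:
        return ALIASES[name]
    for prefix, provider in PREFIXES.items():
        if name.startswith(prefix):
            return Target(provider, name)
    raise KeyError(f"unknown model or alias: {name}")
```

Keeping aliases in one table means a client that requests `fast` does not need to change when the underlying model is swapped.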

The upstream_resilience.py module handles retry logic with exponential backoff when upstream providers return transient errors.
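A minimal sketch of retry with exponential backoff follows. The set of transient status codes, the attempt count, the base delay, and the jitter factor are assumptions, and `call_with_backoff` is a hypothetical name rather than the function exposed by upstream_resilience.py.

```python
import random
import time
from typing import Callable

TRANSIENT_STATUSES = {429, 500, 502, 503, 504}  # assumed set of retryable codes


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-transient status or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status not in TRANSIENT_STATUSES:
            break
        # Delay doubles on each retry: base, 2*base, 4*base, ...
        delay = base_delay * (2 ** (attempt - 1))
        # Up to 10% jitter spreads out retries from concurrent callers.
        sleep(delay + random.uniform(0, delay * 0.1))
        status = send()
    return status
```

The `sleep` parameter is injectable so the function can be exercised in tests without waiting.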

Managed runners

The runner control plane manages the full lifecycle of OpenClaw sandbox containers. It is the platform's most differentiated feature: competing gateways such as Portkey and Helicone do not offer managed execution environments.

    Dashboard --> B2B API --> Runner Control Plane --> Provider (E2B / VPS)
                                      |
                               State Machine: provisioning --> ready --> running --> stopped
                               Immutable Audit Trail
                               Security Policies (egress, sandbox levels)
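The state machine in the diagram can be sketched as a transition table. Only the forward path shown above is modeled; the `Runner` class, the `TRANSITIONS` table, and the treatment of every other transition as illegal are assumptions, and the real control plane may allow more states and edges.

```python
from enum import Enum


class RunnerState(str, Enum):
    PROVISIONING = "provisioning"
    READY = "ready"
    RUNNING = "running"
    STOPPED = "stopped"


# Forward path from the diagram; anything else is rejected in this sketch.
TRANSITIONS = {
    RunnerState.PROVISIONING: {RunnerState.READY},
    RunnerState.READY: {RunnerState.RUNNING},
    RunnerState.RUNNING: {RunnerState.STOPPED},
    RunnerState.STOPPED: set(),
}


class Runner:
    def __init__(self) -> None:
        self.state = RunnerState.PROVISIONING
        # Append-only list standing in for the audit trail.
        self.audit: list[tuple[RunnerState, RunnerState]] = []

    def transition(self, target: RunnerState) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {target.value}"
            )
        self.audit.append((self.state, target))
        self.state = target
```

Recording each transition before updating the state gives a simple history of how a runner reached its current state.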

Runner architecture

The control plane lives at services/backend/src/services/runner_control_plane/ and contains 65+ modules organized by concern:

| Layer | Modules | Purpose |
| --- | --- | --- |
| Lifecycle | State machine, provisioning, teardown | Container lifecycle management |
| Security | Sandbox levels, egress rules, network phases | Isolation and access control |
| Compute | Resource allocation, idle suspend, snapshots | Infrastructure management |
| CI | Headless CI, auto-fix, worktrees | Continuous integration inside runners |
| Skills | Skill gallery, MCP servers, hooks, subagents | Extensibility and tool profiles |

Runner routes

Runner operations on the gateway are spread across 20+ route files (gateway_runner_*.py). The dashboard provides the management UI through 24+ pages under /runners/*.

Tool profiles

Each runner is configured with one of three tool profiles that control what the sandbox can access:

| Profile | Access level |
| --- | --- |
| Minimal | Read-only filesystem, no network, no shell |
| Standard | Read-write filesystem, allowlisted network, restricted shell |
| Full | Full filesystem, full network, unrestricted shell |
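One way to picture the three profiles is as immutable configuration records. The `ToolProfile` fields, the string values, and the `can_reach` helper below are hypothetical; they restate the table rather than describe the platform's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    filesystem: str  # "read-only", "read-write", or "full"
    network: str     # "none", "allowlisted", or "full"
    shell: str       # "none", "restricted", or "unrestricted"


PROFILES = {
    "minimal": ToolProfile("read-only", "none", "none"),
    "standard": ToolProfile("read-write", "allowlisted", "restricted"),
    "full": ToolProfile("full", "full", "unrestricted"),
}


def can_reach(profile_name: str, host: str, allowlist: frozenset[str]) -> bool:
    """Decide whether a sandbox with this profile may connect to host."""
    network = PROFILES[profile_name].network
    if network == "full":
        return True
    if network == "allowlisted":
        return host in allowlist
    return False  # "none": no outbound connections at all
```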

Streaming

Agent pipelines and gateway proxy responses use Server-Sent Events (SSE) for real-time streaming:

FastAPI SSE endpoint --> EventSource (browser) --> React state update

Each SSE message contains a serialized AgentEvent with a type discriminator (agent_start, token, agent_complete, etc.). The frontend processes these events incrementally to render progressive results.
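The wire format can be sketched with the standard library alone. The `type` discriminator comes from the text above; the `data` payload field and the `to_sse` and `parse_sse` helpers are assumptions for illustration, and the real AgentEvent likely carries more fields.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentEvent:
    type: str                                 # e.g. "agent_start", "token"
    data: dict = field(default_factory=dict)  # assumed payload field


def to_sse(event: AgentEvent) -> str:
    """Serialize one event as an SSE message: a data line plus a blank line."""
    return f"data: {json.dumps(asdict(event))}\n\n"


def parse_sse(stream: str) -> list[AgentEvent]:
    """Split a stream on blank lines and decode each data line."""
    events = []
    for frame in stream.split("\n\n"):
        if frame.startswith("data: "):
            events.append(AgentEvent(**json.loads(frame[len("data: "):])))
    return events
```

In the browser, EventSource performs the framing step itself and hands each data payload to the message handler, so the frontend only needs the JSON decode and a switch on `type`.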

For gateway proxy traffic, the response is streamed directly from the upstream provider through the gateway using httpx async streaming passthrough. The gateway records token counts and cost from the streamed response without buffering the full payload.
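The idea of metering a stream without buffering it can be sketched with a plain generator standing in for the httpx async stream. The sketch assumes OpenAI-style `data:` lines in which a late chunk carries a `usage` object with `prompt_tokens` and `completion_tokens`; other providers report usage differently, and `UsageTap` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


class UsageTap:
    """Forwards chunks unchanged while remembering the last usage object seen."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0

    def passthrough(self, lines: Iterable[str]) -> Iterator[str]:
        for line in lines:
            self._inspect(line)
            yield line  # forwarded immediately; nothing is accumulated

    def _inspect(self, line: str) -> None:
        if not line.startswith("data: "):
            return
        try:
            payload = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            return  # e.g. the "[DONE]" sentinel
        usage = payload.get("usage") if isinstance(payload, dict) else None
        if usage:
            self.input_tokens = usage.get("prompt_tokens", 0)
            self.output_tokens = usage.get("completion_tokens", 0)
```

Because only two integers are retained, memory use stays constant regardless of response length.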

Memory system

The platform uses a three-tier memory architecture for agent personalization:

| Tier | Scope | Lifetime | Purpose |
| --- | --- | --- | --- |
| Profile Memory | User-level | Persistent | User preferences, saved configurations, account-level settings |
| Pattern Memory | Cross-session | Medium-term | Recurring behaviors, usage patterns, learned preferences |
| Session Memory | Single session | Ephemeral | Current conversation context, recent interactions |

Memory is managed by the user memory service and injected into agent prompts via the orchestrator. This enables personalization without requiring agents to manage state directly.
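A rough sketch of prompt injection across the three tiers is shown below. The `UserMemory` structure, the bracketed section labels, and the `build_prompt` function are hypothetical; the actual user memory service and orchestrator may format and select memory quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class UserMemory:
    profile: dict[str, str] = field(default_factory=dict)  # persistent
    patterns: list[str] = field(default_factory=list)      # medium-term
    session: list[str] = field(default_factory=list)       # ephemeral


def build_prompt(
    base_prompt: str, memory: UserMemory, max_session_items: int = 5
) -> str:
    """Prepend memory context to an agent prompt, skipping empty tiers."""
    sections = []
    if memory.profile:
        prefs = "; ".join(f"{k}={v}" for k, v in sorted(memory.profile.items()))
        sections.append(f"[profile] {prefs}")
    if memory.patterns:
        sections.append("[patterns] " + "; ".join(memory.patterns))
    if memory.session:
        # Only the most recent items are kept to bound prompt size.
        recent = memory.session[-max_session_items:]
        sections.append("[session] " + " | ".join(recent))
    return "\n".join(sections + [base_prompt])
```

The agent receives a single string and never touches the memory store, which matches the goal of keeping agents stateless.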

Cost tracking

Every LLM call is tracked in real time through a two-tier cost recording system:

  • Redis accumulator: In-memory running totals for fast budget checks during the governance chain
  • MongoDB audit log: Immutable per-request cost records for reporting and compliance
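The two tiers can be sketched with in-process stand-ins: a dictionary in place of the Redis accumulator and a list in place of the MongoDB audit log. The prices, the `CostRecord` fields, and the `CostRecorder` class are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million (input, output) tokens.
PRICES = {"gpt-4o-mini": (0.15, 0.60)}


@dataclass(frozen=True)
class CostRecord:
    org_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


class CostRecorder:
    def __init__(self) -> None:
        self.totals = defaultdict(float)       # stands in for Redis
        self.audit_log: list[CostRecord] = []  # stands in for MongoDB

    def record(
        self, org_id: str, model: str, input_tokens: int, output_tokens: int
    ) -> CostRecord:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        rec = CostRecord(org_id, model, input_tokens, output_tokens, cost)
        self.totals[org_id] += cost  # running total read by budget checks
        self.audit_log.append(rec)   # append-only per-request record
        return rec

    def within_budget(self, org_id: str, budget_usd: float) -> bool:
        return self.totals[org_id] < budget_usd
```

Budget checks read only the running total, so they stay cheap even as the audit log grows.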

Tracked dimensions:

| Dimension | Description |
| --- | --- |
| Per-call | Model, token counts (input/output), computed cost, latency |
| Per-agent | Total spend and call count per agent over configurable windows |
| Per-org | Daily and monthly spend totals, budget utilization percentage |
| Per-key | Spend attributed to individual API keys |

Budget alerts trigger notifications when spend exceeds configurable thresholds. The B2B dashboard surfaces these metrics in the Cost Governance panel, and the health endpoint (/api/v1/health) reports daily spend.
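Threshold alerts can be sketched as a comparison of budget utilization before and after a spend update. The default thresholds of 50%, 80%, and 100% and the `crossed_thresholds` function are assumptions; the platform's thresholds are configurable and its alerting logic is not shown in this page.

```python
def crossed_thresholds(
    previous_spend: float,
    current_spend: float,
    budget: float,
    thresholds: tuple[float, ...] = (0.5, 0.8, 1.0),
) -> list[float]:
    """Return thresholds crossed by the latest spend update.

    Comparing against the previous value means each threshold is reported
    once, at the update that crosses it, rather than on every later call.
    """
    if budget <= 0:
        return []
    before = previous_spend / budget
    after = current_spend / budget
    return [t for t in thresholds if before < t <= after]
```

For example, moving from 40 to 85 against a budget of 100 crosses the 0.5 and 0.8 thresholds in a single update, so both would be reported together.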

Multi-tenancy

The B2B API uses TenantIsolationMiddleware for organization-based data isolation. Organization context is resolved from three sources in priority order:

  1. JWT claims: org_id and org_role fields in the token payload
  2. X-Org-ID header: Passed by the dashboard frontend client
  3. URL path parameter: Extracted from /organizations/{org_id}/... routes
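The priority order above can be sketched as a first-match lookup. This is an illustration, not the body of TenantIsolationMiddleware: `resolve_org_id` is a hypothetical helper, and it takes plain mappings where real middleware would read from the request object (with case-insensitive header access).

```python
from typing import Mapping, Optional


def resolve_org_id(
    jwt_claims: Mapping[str, str],
    headers: Mapping[str, str],
    path_params: Mapping[str, str],
) -> Optional[str]:
    """Return the org ID from the highest-priority source that provides one."""
    candidates = (
        jwt_claims.get("org_id"),   # 1. JWT claims
        headers.get("X-Org-ID"),    # 2. X-Org-ID header
        path_params.get("org_id"),  # 3. /organizations/{org_id}/... path
    )
    for value in candidates:
        if value:
            return value
    return None
```

Putting the signed JWT claim first means a client-supplied header or path segment cannot override the organization the token was issued for.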

The Gateway derives organization context from the X-CM-API-Key, because each API key maps to an org.

In route handlers, tenant context is available on the request state:

    org_id = request.state.org_id
    user_id = request.state.user_id
    org_role = request.state.org_role

The dashboard frontend (apps/dashboard/lib/api.ts) automatically includes the X-Org-ID header in all requests when an organization context is active. All database queries in the B2B API are scoped to the resolved org_id, ensuring complete data isolation between tenants.
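Query scoping can be sketched as a small helper that forces the resolved org_id into every filter. The `scoped` function and the MongoDB-style filter dictionaries are assumptions for illustration; the B2B API's actual data access layer is not shown in this page.

```python
from typing import Optional


def scoped(org_id: str, query: Optional[dict] = None) -> dict:
    """Return a copy of query constrained to one organization."""
    if not org_id:
        raise ValueError("org_id is required for tenant-scoped queries")
    merged = dict(query or {})
    merged["org_id"] = org_id  # overrides any caller-supplied org_id
    return merged
```

Overwriting rather than merging the org_id key means a filter built from user input cannot widen the query to another tenant.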