
Architecture

This page covers the technical architecture of the Curate-Me platform, including the triple API design, gateway governance chain, managed runner control plane, streaming, memory, cost tracking, multi-tenancy, and the ConfigurableAgent pattern.

Triple API architecture

The backend runs as three separate FastAPI applications sharing the same codebase:

| API | Entry point | Port | Purpose |
| --- | --- | --- | --- |
| Gateway | `main_gateway.py` | 8002 | AI gateway — reverse proxy with governance chain |
| B2B | `main_b2b.py` | 8001 | Dashboard admin, agent management, runner console, costs |
| B2C | `main_b2c.py` | 8000 | Consumer features (reference implementation) |

All three applications share agents, services, and models but have different route registrations and middleware configurations. The Gateway handles LLM proxy traffic with API key auth. The B2B API includes tenant isolation middleware and JWT auth. The B2C API includes consumer-facing rate limiting and session auth.

```shell
# Start all three APIs in separate terminals
poetry run uvicorn src.main_gateway:app --reload --port 8002   # AI Gateway
poetry run uvicorn src.main_b2b:app --reload --port 8001       # B2B Dashboard API
poetry run uvicorn src.main_b2c:app --reload --port 8000       # B2C Reference API
```

Gateway governance chain

The gateway intercepts every LLM request, applies a five-step governance policy chain, and proxies to the upstream provider. The chain short-circuits on the first denial.

```
Customer App --> Gateway :8002 --> Governance Chain --> Provider Router --> LLM Provider
                     |                   |                                       |
               Auth (API Key)       Rate Limit                     Response passthrough (SSE)
               Org Context          Cost Estimate                                |
                                    PII Scan                    Cost Recorder (Redis + MongoDB)
                                    Model Allowlist
                                    HITL Gate
```

Governance steps

| Step | Module | Behavior on failure |
| --- | --- | --- |
| 1. Rate limit | `governance.py` | HTTP 429, `retry-after` header |
| 2. Cost estimate | `governance.py` | HTTP 403, budget exceeded message |
| 3. PII scan | `governance.py` | HTTP 400, identified PII types |
| 4. Model allowlist | `governance.py` | HTTP 403, model not permitted |
| 5. HITL gate | `governance.py` | HTTP 202, request queued for approval |
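The short-circuit behavior can be illustrated with a minimal policy chain. The names and signatures below are illustrative, not the actual `governance.py` API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Denial:
    status: int   # HTTP status to return to the caller
    detail: str

# Each policy inspects the request context and returns a Denial or None.
Policy = Callable[[dict], Optional[Denial]]

def rate_limit(ctx: dict) -> Optional[Denial]:
    if ctx.get("requests_this_minute", 0) >= ctx.get("rpm_limit", 60):
        return Denial(429, "rate limit exceeded")
    return None

def cost_estimate(ctx: dict) -> Optional[Denial]:
    projected = ctx.get("spend_today", 0.0) + ctx.get("estimated_cost", 0.0)
    if projected > ctx.get("daily_budget", 100.0):
        return Denial(403, "budget exceeded")
    return None

def run_chain(ctx: dict, policies: list[Policy]) -> Optional[Denial]:
    # Short-circuits on the first policy that denies the request.
    for policy in policies:
        denial = policy(ctx)
        if denial is not None:
            return denial
    return None  # all policies passed; proxy to the provider

verdict = run_chain({"requests_this_minute": 120, "rpm_limit": 60},
                    [rate_limit, cost_estimate])
```

A `None` verdict means the request proceeds to the provider router; any `Denial` is translated into the HTTP responses listed in the table.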

Provider routing

The gateway supports multiple LLM providers through a provider registry. Each organization can configure provider targets with their own API keys or use platform-managed keys.

```
Gateway --> Provider Router --> OpenAI
                            --> Anthropic
                            --> Google (Gemini)
                            --> DeepSeek
```

The `model_alias_registry.py` module maps model names to provider endpoints, enabling model aliasing (e.g., `fast` maps to `gpt-4o-mini`, `smart` maps to `claude-sonnet-4`).
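A minimal sketch of what such an alias table might look like, using the two aliases named above (the actual registry structure may differ):

```python
# Hypothetical alias table: an alias resolves to a (provider, model) target.
MODEL_ALIASES = {
    "fast":  {"provider": "openai",    "model": "gpt-4o-mini"},
    "smart": {"provider": "anthropic", "model": "claude-sonnet-4"},
}

def resolve_model(name: str) -> dict:
    # Unaliased names pass through unchanged to a default provider.
    return MODEL_ALIASES.get(name, {"provider": "openai", "model": name})
```

Aliasing lets organizations change the underlying model without touching client code: callers keep requesting `fast` while the registry entry is repointed.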

The `upstream_resilience.py` module handles retry logic with exponential backoff when upstream providers return transient errors.
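A sketch of retry with exponential backoff and jitter, assuming transient failures are signaled by status codes like 429 and 5xx (the actual `upstream_resilience.py` logic is not shown in this doc and may differ):

```python
import asyncio
import random

TRANSIENT_STATUSES = {429, 500, 502, 503, 504}

async def with_retries(send, max_attempts: int = 4, base_delay: float = 0.5):
    """Retry `send()` on transient upstream errors with exponential backoff."""
    for attempt in range(max_attempts):
        response = await send()
        if response["status"] not in TRANSIENT_STATUSES:
            return response
        if attempt < max_attempts - 1:
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
    return response  # last transient error surfaces to the caller
```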

Managed runners

The runner control plane manages the full lifecycle of OpenClaw sandbox containers. It is the most differentiated feature in the platform — competitors like Portkey and Helicone do not offer managed execution environments.

```
Dashboard --> B2B API --> Runner Control Plane --> Provider (E2B / VPS)
                                  |
                       State Machine: provisioning --> ready --> running --> stopped
                       Immutable Audit Trail
                       Security Policies (egress, sandbox levels)
```

Runner architecture

The control plane lives at `services/backend/src/services/runner_control_plane/` and contains 65+ modules organized by concern:

| Layer | Modules | Purpose |
| --- | --- | --- |
| Lifecycle | State machine, provisioning, teardown | Container lifecycle management |
| Security | Sandbox levels, egress rules, network phases | Isolation and access control |
| Compute | Resource allocation, idle suspend, snapshots | Infrastructure management |
| CI | Headless CI, auto-fix, worktrees | Continuous integration inside runners |
| Skills | Skill gallery, MCP servers, hooks, subagents | Extensibility and tool profiles |
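The lifecycle states from the diagram above (provisioning, ready, running, stopped) can be sketched as an explicit transition table; the real state machine module almost certainly handles more states and failure paths:

```python
# Illustrative transition table for the happy-path runner lifecycle.
TRANSITIONS = {
    "provisioning": {"ready"},
    "ready":        {"running"},
    "running":      {"stopped"},
    "stopped":      set(),       # terminal in this simplified model
}

class InvalidTransition(Exception):
    pass

def advance(state: str, target: str) -> str:
    # Reject any transition not explicitly allowed by the table.
    if target not in TRANSITIONS.get(state, set()):
        raise InvalidTransition(f"{state} -> {target}")
    return target
```

Keeping transitions in a table makes every state change auditable, which pairs naturally with the immutable audit trail shown in the diagram.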

Runner routes

Gateway runner routes (`gateway_runner_*.py`) span 20+ route files exposing runner operations. Dashboard runner pages (24+ pages under `/runners/*`) provide the management UI.

Tool profiles

Each runner is configured with one of three tool profiles that control what the sandbox can access:

| Profile | Access level |
| --- | --- |
| Minimal | Read-only filesystem, no network, no shell |
| Standard | Read-write filesystem, allowlisted network, restricted shell |
| Full | Full filesystem, full network, unrestricted shell |
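The three profiles could be encoded as simple capability records; the field names and values here are hypothetical, not the platform's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolProfile:
    filesystem: str   # "ro" | "rw" | "full"
    network: str      # "none" | "allowlist" | "full"
    shell: str        # "none" | "restricted" | "full"

PROFILES = {
    "minimal":  ToolProfile("ro",   "none",      "none"),
    "standard": ToolProfile("rw",   "allowlist", "restricted"),
    "full":     ToolProfile("full", "full",      "full"),
}
```

Frozen dataclasses make the profiles immutable at runtime, so a sandbox cannot widen its own access after provisioning.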

Streaming

Agent pipelines and gateway proxy responses use Server-Sent Events (SSE) for real-time streaming:

FastAPI SSE endpoint --> EventSource (browser) --> React state update

Each SSE message contains a serialized `AgentEvent` with a `type` discriminator (`agent_start`, `token`, `agent_complete`, etc.). The frontend processes these events incrementally to render progressive results.
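A minimal sketch of this framing, assuming events are JSON objects carried on SSE `data:` lines (the event types follow the list above; field names are illustrative):

```python
import json
from typing import AsyncIterator

async def agent_events() -> AsyncIterator[dict]:
    # A toy event stream shaped like the AgentEvent sequence described above.
    yield {"type": "agent_start", "agent": "demo"}
    for tok in ["Hel", "lo"]:
        yield {"type": "token", "text": tok}
    yield {"type": "agent_complete", "result": "Hello"}

def to_sse(event: dict) -> str:
    # Each SSE message is a `data:` line with the serialized event,
    # terminated by a blank line.
    return f"data: {json.dumps(event)}\n\n"

# In FastAPI, a generator like `(to_sse(e) async for e in agent_events())`
# would back a StreamingResponse with media_type="text/event-stream".
```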

For gateway proxy traffic, the response is streamed directly from the upstream provider through the gateway using `httpx` async streaming passthrough. The gateway records token counts and cost from the streamed response without buffering the full payload.
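The passthrough-plus-accounting idea can be sketched generically: forward each chunk as it arrives, and record usage only once the stream closes. In the gateway the upstream iterator would come from an `httpx` streaming response body; here a stand-in iterator drives the example:

```python
import asyncio
from typing import AsyncIterator, Callable

async def passthrough(upstream: AsyncIterator[bytes],
                      record: Callable[[int], None]) -> AsyncIterator[bytes]:
    # Forward each upstream chunk immediately; accumulate usage on the side.
    total = 0
    async for chunk in upstream:
        total += len(chunk)
        yield chunk          # streamed straight to the client, never buffered
    record(total)            # usage recorded once the upstream stream closes

# Drive it with a stand-in upstream iterator:
async def _demo():
    async def upstream():
        for chunk in (b"ab", b"cde"):
            yield chunk
    recorded: list[int] = []
    chunks = [c async for c in passthrough(upstream(), recorded.append)]
    return chunks, recorded

chunks, recorded = asyncio.run(_demo())
```

The real recorder tracks token counts and cost rather than raw byte lengths, but the shape is the same: accounting happens alongside the stream, not after a full-body read.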

Memory system

The platform uses a three-tier memory architecture for agent personalization:

| Tier | Scope | Lifetime | Purpose |
| --- | --- | --- | --- |
| Profile Memory | User-level | Persistent | User preferences, saved configurations, account-level settings |
| Pattern Memory | Cross-session | Medium-term | Recurring behaviors, usage patterns, learned preferences |
| Session Memory | Single session | Ephemeral | Current conversation context, recent interactions |

Memory is managed by the user memory service and injected into agent prompts via the orchestrator. This enables personalization without requiring agents to manage state directly.
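As an illustration of tier injection, a hypothetical helper folds the three tiers into a prompt preamble (the actual user memory service and orchestrator APIs are not documented here, so all names below are assumptions):

```python
def build_context(profile: dict, patterns: list[str], session: list[str]) -> str:
    # Fold the three memory tiers into a system-prompt preamble.
    parts = []
    if profile:
        parts.append("User profile: " + ", ".join(f"{k}={v}" for k, v in profile.items()))
    if patterns:
        parts.append("Known patterns: " + "; ".join(patterns))
    if session:
        parts.append("Recent context: " + " | ".join(session[-3:]))  # keep it short
    return "\n".join(parts)
```

Because the orchestrator injects this preamble, the agent itself stays stateless and reusable.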

Cost tracking

Every LLM call is tracked in real time through a two-tier cost recording system:

  • Redis accumulator: In-memory running totals for fast budget checks during the governance chain
  • MongoDB audit log: Immutable per-request cost records for reporting and compliance
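A toy version of the two-tier recorder, with in-memory stand-ins for the Redis accumulator and the MongoDB audit log (the real storage calls are not shown in this doc):

```python
from dataclasses import dataclass, field

@dataclass
class CostRecorder:
    running_totals: dict = field(default_factory=dict)  # Redis-like accumulator
    audit_log: list = field(default_factory=list)       # Mongo-like append-only log

    def record(self, org_id: str, model: str,
               input_tokens: int, output_tokens: int, cost: float) -> None:
        # Fast path: bump the org's running total for budget checks.
        self.running_totals[org_id] = self.running_totals.get(org_id, 0.0) + cost
        # Audit path: append an immutable per-request record.
        self.audit_log.append({
            "org_id": org_id, "model": model,
            "input_tokens": input_tokens, "output_tokens": output_tokens,
            "cost": cost,
        })

    def spend(self, org_id: str) -> float:
        return self.running_totals.get(org_id, 0.0)
```

The split matters for latency: the governance chain's budget check reads only the accumulator, while reporting and compliance queries go to the audit log.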

Tracked dimensions:

| Dimension | Description |
| --- | --- |
| Per-call | Model, token counts (input/output), computed cost, latency |
| Per-agent | Total spend and call count per agent over configurable windows |
| Per-org | Daily and monthly spend totals, budget utilization percentage |
| Per-key | Spend attributed to individual API keys |

Budget alerts trigger notifications when spend exceeds configurable thresholds. The B2B dashboard surfaces these metrics in the Cost Governance panel, and the health endpoint (`/api/v1/health`) reports daily spend.

Multi-tenancy

The B2B API uses `TenantIsolationMiddleware` for organization-based data isolation. Organization context is resolved from three sources in priority order:

  1. JWT claims: `org_id` and `org_role` fields in the token payload
  2. `X-Org-ID` header: Passed by the dashboard frontend client
  3. URL path parameter: Extracted from `/organizations/{org_id}/...` routes

The Gateway extracts organization context from the `X-CM-API-Key` header; each API key maps to a single organization.
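The priority order above can be sketched as a simple fallback chain (an illustrative function, not the middleware's actual API):

```python
from typing import Optional

def resolve_org_id(jwt_claims: dict, headers: dict, path_params: dict) -> Optional[str]:
    # Priority order: JWT claims, then X-Org-ID header, then URL path parameter.
    return (
        jwt_claims.get("org_id")
        or headers.get("X-Org-ID")
        or path_params.get("org_id")
    )
```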

In route handlers, tenant context is available on the request state:

```python
org_id = request.state.org_id
user_id = request.state.user_id
org_role = request.state.org_role
```

The dashboard frontend (`apps/dashboard/lib/api.ts`) automatically includes the `X-Org-ID` header in all requests when an organization context is active. All database queries in the B2B API are scoped to the resolved `org_id`, ensuring complete data isolation between tenants.

ConfigurableAgent pattern

All agents extend the `ConfigurableAgent` base class, which separates agent logic from domain-specific configuration:

```python
from typing import AsyncIterator

from src.agents.base import BaseAgent
from src.models.schemas import AgentEvent

class MyAgent(BaseAgent):
    async def execute(self, input_data: dict) -> AsyncIterator[AgentEvent]:
        yield AgentEvent(type="agent_start", agent=self.name)
        result = ...  # Agent logic here
        yield AgentEvent(type="agent_complete", result=result)
```

Configuration is injected at instantiation time:

  • Domain prompts: System and user prompt templates
  • Output schemas: Pydantic models defining the expected response structure
  • Model selection: Which LLM provider and model to use
  • Cost limits: Per-call and per-session budget caps

This pattern makes agents reusable across domains. The same agent code can serve different use cases by swapping its configuration — prompts, schemas, and model selection — without changing the agent implementation.
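For illustration, a hypothetical `AgentConfig` shows how the same agent code could serve two domains by swapping only its configuration (field names are assumptions, not the actual base-class API):

```python
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    system_prompt: str
    model: str                 # could be an alias resolved by the registry
    max_cost_per_call: float
    output_schema: dict = field(default_factory=dict)

# Two domains, one agent implementation:
news_config = AgentConfig(
    system_prompt="Summarize financial news.",
    model="fast",
    max_cost_per_call=0.05,
)
research_config = AgentConfig(
    system_prompt="Produce a deep literature review.",
    model="smart",
    max_cost_per_call=0.50,
)
```

Swapping `news_config` for `research_config` at instantiation changes prompts, model selection, and budget caps without touching the agent's `execute` logic.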