# Architecture
This page covers the technical architecture of the Curate-Me platform, including the triple API design, gateway governance chain, managed runner control plane, streaming, memory, cost tracking, multi-tenancy, and the ConfigurableAgent pattern.
## Triple API architecture
The backend runs as three separate FastAPI applications sharing the same codebase:
| API | Entry point | Port | Purpose |
|---|---|---|---|
| Gateway | `main_gateway.py` | 8002 | AI gateway — reverse proxy with governance chain |
| B2B | `main_b2b.py` | 8001 | Dashboard admin, agent management, runner console, costs |
| B2C | `main_b2c.py` | 8000 | Consumer features (reference implementation) |
All three applications share agents, services, and models but have different route registrations and middleware configurations. The Gateway handles LLM proxy traffic with API key auth. The B2B API includes tenant isolation middleware and JWT auth. The B2C API includes consumer-facing rate limiting and session auth.
```shell
# Start all three APIs in separate terminals
poetry run uvicorn src.main_gateway:app --reload --port 8002   # AI Gateway
poetry run uvicorn src.main_b2b:app --reload --port 8001       # B2B Dashboard API
poetry run uvicorn src.main_b2c:app --reload --port 8000       # B2C Reference API
```

## Gateway governance chain
The gateway intercepts every LLM request, applies a five-step governance policy chain, and proxies to the upstream provider. The chain short-circuits on the first denial.
```text
Customer App --> Gateway :8002 --> Governance Chain --> Provider Router --> LLM Provider
                 |                 |                                        |
                 Auth (API Key)    Rate Limit                               Response passthrough (SSE)
                 Org Context       Cost Estimate                            |
                                   PII Scan                                 Cost Recorder (Redis + MongoDB)
                                   Model Allowlist
                                   HITL Gate
```

### Governance steps
| Step | Module | Behavior on failure |
|---|---|---|
| 1. Rate limit | governance.py | HTTP 429, retry-after header |
| 2. Cost estimate | governance.py | HTTP 403, budget exceeded message |
| 3. PII scan | governance.py | HTTP 400, identified PII types |
| 4. Model allowlist | governance.py | HTTP 403, model not permitted |
| 5. HITL gate | governance.py | HTTP 202, request queued for approval |
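The short-circuit behavior of the chain can be sketched as follows. This is an illustrative reduction, not the actual `governance.py` implementation; the step functions and `Denial` type are hypothetical stand-ins for the real policy checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Denial:
    status: int   # HTTP status returned to the caller
    reason: str

# Hypothetical step signatures: each returns a Denial to short-circuit, or None to pass.
def rate_limit(req: dict) -> Optional[Denial]:
    return Denial(429, "rate limit exceeded") if req.get("over_limit") else None

def model_allowlist(req: dict) -> Optional[Denial]:
    allowed = {"gpt-4o-mini", "claude-sonnet-4"}  # illustrative allowlist
    return None if req["model"] in allowed else Denial(403, "model not permitted")

def run_chain(req: dict, steps: list[Callable[[dict], Optional[Denial]]]) -> Optional[Denial]:
    # The chain stops at the first step that denies the request.
    for step in steps:
        denial = step(req)
        if denial is not None:
            return denial
    return None  # all steps passed; proxy the request to the provider
```

Because the chain stops at the first denial, cheaper checks (rate limit) run before more expensive ones (PII scan), and a denied request never reaches the upstream provider.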
### Provider routing
The gateway supports multiple LLM providers through a provider registry. Each organization can configure provider targets with their own API keys or use platform-managed keys.
```text
Gateway --> Provider Router --> OpenAI
                            --> Anthropic
                            --> Google (Gemini)
                            --> DeepSeek
```

The `model_alias_registry.py` module maps model names to provider endpoints, enabling model aliasing (e.g., `fast` maps to `gpt-4o-mini`, `smart` maps to `claude-sonnet-4`).
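A minimal sketch of what such a registry lookup could look like, assuming a simple alias-to-(provider, model) table; the real `model_alias_registry.py` may differ in structure and in how it detects providers for non-aliased names.

```python
# Illustrative alias table; the real registry may be loaded from config or a database.
MODEL_ALIASES = {
    "fast": ("openai", "gpt-4o-mini"),
    "smart": ("anthropic", "claude-sonnet-4"),
}

def resolve_model(name: str) -> tuple[str, str]:
    """Resolve an alias to (provider, concrete model); pass through concrete names."""
    if name in MODEL_ALIASES:
        return MODEL_ALIASES[name]
    # Non-aliased names are assumed to already be concrete model IDs;
    # provider detection by prefix is a simplification for this sketch.
    provider = "anthropic" if name.startswith("claude") else "openai"
    return provider, name
```

Aliasing lets customers pin requests to `fast` or `smart` while the platform upgrades the underlying models without breaking callers.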
The `upstream_resilience.py` module handles retry logic with exponential backoff when upstream providers return transient errors.
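The retry pattern can be sketched as below. This is a generic exponential-backoff loop under assumed parameters (retryable status codes, attempt count, base delay), not the actual `upstream_resilience.py` code.

```python
import asyncio
import random

# Assumed set of retryable upstream statuses; the real module may differ.
TRANSIENT_STATUSES = {429, 502, 503, 504}

async def with_retries(send, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry an async `send()` on transient errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        status, body = await send()
        if status not in TRANSIENT_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Delay doubles each attempt; jitter avoids synchronized retry storms.
            await asyncio.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
    return status, body  # give up after the final attempt
```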
## Managed runners
The runner control plane manages the full lifecycle of OpenClaw sandbox containers. It is the most differentiated feature in the platform — competitors like Portkey and Helicone do not offer managed execution environments.
```text
Dashboard --> B2B API --> Runner Control Plane --> Provider (E2B / VPS)
                          |
                          State Machine:
                            provisioning --> ready --> running --> stopped
                          Immutable Audit Trail
                          Security Policies (egress, sandbox levels)
```

### Runner architecture
The control plane lives at `services/backend/src/services/runner_control_plane/` and contains 65+ modules organized by concern:
| Layer | Modules | Purpose |
|---|---|---|
| Lifecycle | State machine, provisioning, teardown | Container lifecycle management |
| Security | Sandbox levels, egress rules, network phases | Isolation and access control |
| Compute | Resource allocation, idle suspend, snapshots | Infrastructure management |
| CI | Headless CI, auto-fix, worktrees | Continuous integration inside runners |
| Skills | Skill gallery, MCP servers, hooks, subagents | Extensibility and tool profiles |
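The lifecycle layer's state machine can be sketched as a transition table. The states come from the diagram above; the exact allowed transitions (e.g., whether a stopped runner can restart) are an assumption for illustration.

```python
# Hypothetical transition table for the runner lifecycle shown above.
VALID_TRANSITIONS = {
    "provisioning": {"ready"},
    "ready": {"running"},
    "running": {"stopped"},
    "stopped": set(),  # assumed terminal in this sketch
}

class InvalidTransition(Exception):
    pass

def transition(current: str, target: str) -> str:
    """Validate and apply a runner state transition; reject anything not in the table."""
    if target not in VALID_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target}")
    return target
```

Centralizing transitions in one validated table is what makes an immutable audit trail meaningful: every recorded state change is guaranteed to have been legal.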
### Runner routes
Gateway runner routes (`gateway_runner_*.py`) expose 20+ route files for runner operations. Dashboard runner pages (24+ pages under `/runners/*`) provide the management UI.
### Tool profiles
Each runner is configured with one of three tool profiles that control what the sandbox can access:
| Profile | Access level |
|---|---|
| Minimal | Read-only filesystem, no network, no shell |
| Standard | Read-write filesystem, allowlisted network, restricted shell |
| Full | Full filesystem, full network, unrestricted shell |
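The table above could be encoded as a small capability structure; the field names and string values here are illustrative, not the platform's actual profile schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolProfile:
    filesystem: str   # "read-only" | "read-write" | "full"
    network: str      # "none" | "allowlist" | "full"
    shell: str        # "none" | "restricted" | "unrestricted"

# Illustrative encoding of the three profiles from the table above.
TOOL_PROFILES = {
    "minimal": ToolProfile("read-only", "none", "none"),
    "standard": ToolProfile("read-write", "allowlist", "restricted"),
    "full": ToolProfile("full", "full", "unrestricted"),
}
```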
## Streaming
Agent pipelines and gateway proxy responses use Server-Sent Events (SSE) for real-time streaming:
```text
FastAPI SSE endpoint --> EventSource (browser) --> React state update
```

Each SSE message contains a serialized AgentEvent with a type discriminator (`agent_start`, `token`, `agent_complete`, etc.). The frontend processes these events incrementally to render progressive results.
For gateway proxy traffic, the response is streamed directly from the upstream provider through
the gateway using httpx async streaming passthrough. The gateway records token counts and cost
from the streamed response without buffering the full payload.
## Memory system
The platform uses a three-tier memory architecture for agent personalization:
| Tier | Scope | Lifetime | Purpose |
|---|---|---|---|
| Profile Memory | User-level | Persistent | User preferences, saved configurations, account-level settings |
| Pattern Memory | Cross-session | Medium-term | Recurring behaviors, usage patterns, learned preferences |
| Session Memory | Single session | Ephemeral | Current conversation context, recent interactions |
Memory is managed by the user memory service and injected into agent prompts via the orchestrator. This enables personalization without requiring agents to manage state directly.
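The injection step can be sketched as composing the three tiers into a prompt prefix. The function name and tier representations are assumptions for illustration; the real user memory service and orchestrator APIs may differ.

```python
# Illustrative composition of the three memory tiers into prompt context.
def build_memory_context(profile: dict, patterns: list[str], session: list[str]) -> str:
    parts = []
    if profile:    # Profile Memory: persistent, user-level
        parts.append("User profile: " + "; ".join(f"{k}={v}" for k, v in profile.items()))
    if patterns:   # Pattern Memory: cross-session, medium-term
        parts.append("Observed patterns: " + "; ".join(patterns))
    if session:    # Session Memory: ephemeral; keep only recent turns
        parts.append("Recent context: " + " | ".join(session[-5:]))
    return "\n".join(parts)
```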
## Cost tracking
Every LLM call is tracked in real time through a two-tier cost recording system:
- Redis accumulator: In-memory running totals for fast budget checks during the governance chain
- MongoDB audit log: Immutable per-request cost records for reporting and compliance
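The two-tier recorder can be sketched with in-memory stand-ins for Redis and MongoDB; the class and method names are hypothetical, and the real implementation would use Redis increments and MongoDB inserts rather than a dict and a list.

```python
class CostRecorder:
    """Sketch of the two-tier recorder: fast accumulator plus immutable audit log."""

    def __init__(self):
        self.accumulator = {}   # Redis stand-in: org_id -> running daily total (USD)
        self.audit_log = []     # MongoDB stand-in: append-only per-request records

    def record(self, org_id: str, model: str, input_tokens: int,
               output_tokens: int, cost_usd: float) -> None:
        # Fast path: bump the running total used for budget checks in the chain.
        self.accumulator[org_id] = self.accumulator.get(org_id, 0.0) + cost_usd
        # Durable path: append the full per-call record for reporting and compliance.
        self.audit_log.append({
            "org_id": org_id, "model": model,
            "input_tokens": input_tokens, "output_tokens": output_tokens,
            "cost_usd": cost_usd,
        })

    def over_budget(self, org_id: str, daily_limit_usd: float) -> bool:
        return self.accumulator.get(org_id, 0.0) >= daily_limit_usd
```

The split exists because budget checks sit on the hot path of every gateway request, while reporting queries can tolerate the latency of the durable store.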
Tracked dimensions:
| Dimension | Description |
|---|---|
| Per-call | Model, token counts (input/output), computed cost, latency |
| Per-agent | Total spend and call count per agent over configurable windows |
| Per-org | Daily and monthly spend totals, budget utilization percentage |
| Per-key | Spend attributed to individual API keys |
Budget alerts trigger notifications when spend exceeds configurable thresholds. The B2B dashboard
surfaces these metrics in the Cost Governance panel, and the health endpoint (/api/v1/health)
reports daily spend.
## Multi-tenancy
The B2B API uses TenantIsolationMiddleware for organization-based data isolation. Organization context is resolved from three sources in priority order:

1. JWT claims: `org_id` and `org_role` fields in the token payload
2. X-Org-ID header: Passed by the dashboard frontend client
3. URL path parameter: Extracted from `/organizations/{org_id}/...` routes
The Gateway extracts organization context from the X-CM-API-Key — each API key maps to an org.
In route handlers, tenant context is available on the request state:

```python
org_id = request.state.org_id
user_id = request.state.user_id
org_role = request.state.org_role
```

The dashboard frontend (`apps/dashboard/lib/api.ts`) automatically includes the X-Org-ID header in all requests when an organization context is active. All database queries in the B2B API are scoped to the resolved `org_id`, ensuring complete data isolation between tenants.
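The resolution order can be sketched as a simple fallback chain. The function below is a hypothetical reduction of what TenantIsolationMiddleware does, operating on plain dicts rather than a real request object.

```python
def resolve_org_id(jwt_claims: dict, headers: dict, path_params: dict):
    """Resolve org context from three sources in the priority order described above."""
    if "org_id" in jwt_claims:
        return jwt_claims["org_id"]       # 1. JWT claims win over everything
    if "X-Org-ID" in headers:
        return headers["X-Org-ID"]        # 2. header from the dashboard client
    return path_params.get("org_id")      # 3. /organizations/{org_id}/... routes
```

Putting the JWT first means a caller cannot escalate into another tenant by spoofing a header: the signed token always takes precedence.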
## ConfigurableAgent pattern
All agents extend the ConfigurableAgent base class, which separates agent logic from domain-specific configuration:

```python
from collections.abc import AsyncIterator

from src.agents.base import BaseAgent
from src.models.schemas import AgentEvent

class MyAgent(BaseAgent):
    async def execute(self, input_data: dict) -> AsyncIterator[AgentEvent]:
        yield AgentEvent(type="agent_start", agent=self.name)
        result = ...  # Agent logic here
        yield AgentEvent(type="agent_complete", result=result)
```

Configuration is injected at instantiation time:
- Domain prompts: System and user prompt templates
- Output schemas: Pydantic models defining the expected response structure
- Model selection: Which LLM provider and model to use
- Cost limits: Per-call and per-session budget caps
This pattern makes agents reusable across domains. The same agent code can serve different use cases by swapping its configuration — prompts, schemas, and model selection — without changing the agent implementation.
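A minimal sketch of the injection idea, using a hypothetical config dataclass; the real ConfigurableAgent's field names and constructor signature may differ.

```python
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    system_prompt: str                 # domain prompts
    model: str = "gpt-4o-mini"         # model selection
    cost_limit_usd: float = 0.05       # per-call budget cap
    output_schema: dict = field(default_factory=dict)  # expected response structure

class ConfigurableAgentSketch:
    def __init__(self, name: str, config: AgentConfig):
        self.name = name
        self.config = config  # all domain specifics live here, not in the class body

# The same agent class serves two domains by swapping configuration only:
news_agent = ConfigurableAgentSketch(
    "news", AgentConfig(system_prompt="Summarize today's news."))
research_agent = ConfigurableAgentSketch(
    "research", AgentConfig(system_prompt="Deep-dive research.", model="claude-sonnet-4"))
```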