Reference Architecture: agents that act

A chat agent that only returns text needs little governance. An agent that acts — writes files, sends email, spends money, or reads private data — needs budgets, approvals, an audit trail, privacy controls, and safe execution. Curate-Me provides those as infrastructure, so you can let agents do real work without handing them the keys.

Family Manager — a consumer app built on Curate-Me — is the reference implementation of this layer. It forwards a family’s email to an assistant that drafts calendar events and to-dos, and nothing is written without a human approving it. Every control it relies on is the same control a team running acting agents would use.

This page is honest about maturity. Each capability is marked GA (enforced in production today), Beta (shipping, behind a flag), or Roadmap (planned, not a shipped control). We would rather under-claim.

The layers

Capability	What it does	What it proves for agents that act	Status
Governed gateway egress	Every model call passes through a 15-stage governance chain (including cost estimate, PII scan, content safety, security scan, and model allowlist) and is fail-closed: a governance failure blocks the call rather than letting it pass through.	LLM egress is governed, not trusted.	GA
Human-in-the-loop approvals	Each high-impact action becomes an approval request. Decisions carry a single-use token and an optimistic version check, so no decision is double-applied and no stale decision wins. Approve-then-act: nothing commits without explicit approval.	No action commits without a human; no decision is replayed.	GA
Pre-egress minimization	Configured identifiers (names, emails, phones, addresses) are replaced with stable, reversible tokens before the model call; the mapping never leaves the platform and the model’s structured output is rehydrated afterward.	Identifiers are minimized before any third-party model sees them.	GA
Side-effect receipts	An action is only marked done when the target is read back and confirmed; a failed write surfaces as a loud failure, not a silent success.	Nothing is marked done on the agent’s word alone.	GA
Per-org budgets + cost attribution	Daily / monthly / per-request cost caps, plus per-org, per-key, per-model cost recorded on every call.	Spend is capped and attributed.	GA
Spend-gated approval	A request whose estimated cost crosses a threshold blocks for human approval in the request path (not after the fact).	Expensive runs gate for a human.	GA (gates by cost)
Eval + regression gates	Agent behavior is pinned by regression floors on a deterministic, frozen-clock corpus — a score that drops below a floor blocks the merge.	Behavior is tested, not hoped for.	GA (internal CI)
Retention + deletion	Raw inputs are purged on a schedule; members can delete their account, and a request-driven worker can tear down a full org while preserving legally-required billing/audit records.	Data has a lifecycle and an exit.	GA
Source preview + correction	See exactly what the agent read for a draft, correct it in natural language, and re-derive — without committing the wrong version.	Mistakes are correctable, not silent.	Beta
Managed runners / bring-your-own-machine	Run the agent on a sandboxed managed runner, or connect your own outbound-only machine so execution happens on infrastructure you control.	The action runs where you control it.	Beta
Per-tool approval (approve before a specific file write / shell / export)	Today this is enforced by the approval spine at the action layer (approve-then-act). A per-tool gate expressed as gateway policy is planned.	Fine-grained gates on individual tool calls.	Roadmap
Provider zero-retention routing	Today we send minimized payloads to provider APIs operating under their no-training default terms. Platform-enforced zero-retention routing is planned.	A hard, routed no-train guarantee.	Roadmap
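
The approval row above combines two guards: a single-use token and an optimistic version check. The following is a minimal in-memory sketch of that idea, not Curate-Me's implementation; the `Approval` class, its fields, and both exception names are hypothetical and exist only for illustration.

```python
import secrets
from dataclasses import dataclass, field


class TokenAlreadyUsed(Exception):
    """The decision token was already consumed, or was never issued."""


class StaleDecision(Exception):
    """The approval changed after the reviewer loaded it."""


@dataclass
class Approval:
    # Hypothetical record shape, for illustration only.
    action: str
    version: int = 1
    status: str = "pending"
    live_tokens: set = field(default_factory=set)

    def issue_token(self) -> str:
        """Issue a token that is valid for exactly one decision attempt."""
        token = secrets.token_urlsafe(16)
        self.live_tokens.add(token)
        return token

    def decide(self, token: str, seen_version: int, approve: bool) -> str:
        # Single-use: the token is removed the first time it is presented.
        if token not in self.live_tokens:
            raise TokenAlreadyUsed(token)
        self.live_tokens.discard(token)
        # Optimistic check: reject a decision made against an older version.
        if seen_version != self.version:
            raise StaleDecision(f"saw v{seen_version}, current is v{self.version}")
        self.status = "approved" if approve else "rejected"
        self.version += 1
        return self.status
```

Replaying the same token raises `TokenAlreadyUsed`, and deciding against an outdated version raises `StaleDecision`. This sketch runs in a single process; a real service would presumably need the check and the update to happen atomically in its datastore.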

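Pre-egress minimization, as described in the table, swaps identifiers for stable tokens before the model call and restores them afterward. The sketch below illustrates that round trip under simplifying assumptions: the `Minimizer` class, the token format (`<EMAIL_1>`, `<NAME_2>`), and the email regex are all hypothetical, and a production detector would cover far more identifier types.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Minimizer:
    """Replaces identifiers with stable tokens and keeps the mapping locally."""

    def __init__(self):
        self.token_for = {}  # original value -> token
        self.value_for = {}  # token -> original value (never sent to the model)

    def _token(self, value: str, kind: str) -> str:
        # Stable: the same value always maps to the same token.
        if value not in self.token_for:
            token = f"<{kind}_{len(self.token_for) + 1}>"
            self.token_for[value] = token
            self.value_for[token] = value
        return self.token_for[value]

    def minimize(self, text: str, names=()) -> str:
        """Tokenize emails first, then any configured names."""
        text = EMAIL.sub(lambda m: self._token(m.group(0), "EMAIL"), text)
        for name in names:
            pattern = rf"\b{re.escape(name)}\b"
            text = re.sub(pattern, lambda m, n=name: self._token(n, "NAME"), text)
        return text

    def rehydrate(self, text: str) -> str:
        """Restore original values in the model's output."""
        for token, value in self.value_for.items():
            text = text.replace(token, value)
        return text


m = Minimizer()
source = "Email ana@example.com about Ana's recital; cc ana@example.com."
safe = m.minimize(source, names=["Ana"])
# safe == "Email <EMAIL_1> about <NAME_2>'s recital; cc <EMAIL_1>."
```

Only `safe` would be sent to the provider; `m.rehydrate(...)` is applied to the structured output when it comes back.
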
How a request flows

1. Your app (or agent) points its existing LLM SDK at the gateway URL, with zero code changes.
2. The governance chain runs: cost, PII/minimization, safety, security, model allowlist. Any denial short-circuits; the call is fail-closed.
3. If the action is high-impact or over a cost threshold, it pauses for human approval instead of proceeding automatically.
4. On approval, the call proxies to the upstream provider; the cost is recorded per org/key/model.
5. If the agent performed a side effect, a receipt confirms it by reading the target back.
6. Everything is visible per tenant: traces, costs, approvals, and an audit trail.
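
The short-circuit and fail-closed behavior in the flow above can be sketched as a small stage runner. This is an illustration under stated assumptions, not the gateway's actual chain: the `Request` shape, the stage functions, the allowlist contents, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    model: str
    prompt: str
    estimated_cost_usd: float


# A stage returns None to allow the call, or a reason string to deny it.
Stage = Callable[[Request], Optional[str]]

ALLOWED_MODELS = {"model-a", "model-b"}  # hypothetical allowlist
APPROVAL_THRESHOLD_USD = 1.00  # hypothetical spend gate


def model_allowlist(req: Request) -> Optional[str]:
    return None if req.model in ALLOWED_MODELS else "model not allowed"


def broken_scanner(req: Request) -> Optional[str]:
    # Simulates a governance stage that is itself failing.
    raise RuntimeError("scanner offline")


def run_chain(req: Request, stages: list) -> str:
    for stage in stages:
        try:
            reason = stage(req)
        except Exception as exc:
            # Fail-closed: a stage that errors blocks the call.
            return f"denied: {stage.__name__} failed ({exc})"
        if reason is not None:
            # Short-circuit: the first denial ends the chain.
            return f"denied: {reason}"
    if req.estimated_cost_usd > APPROVAL_THRESHOLD_USD:
        # Over the threshold: hold for a human instead of forwarding.
        return "pending_approval"
    return "forward"
```

With these definitions, a request for an unlisted model is denied, a chain containing `broken_scanner` is denied rather than passed through, and an allowed request estimated above the threshold returns `pending_approval`.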

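A side-effect receipt, as described above, means the write is confirmed by reading the target back rather than by trusting the writer. The sketch below shows the pattern against a toy key-value store; `Receipt`, `ReceiptError`, `write_with_receipt`, and both store classes are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


class ReceiptError(Exception):
    """The read-back did not match what was written."""


@dataclass
class Receipt:
    key: str
    confirmed_value: str


class DictStore:
    """Toy target that stores what it is given."""

    def __init__(self):
        self.data = {}

    def write(self, key: str, value: str) -> None:
        self.data[key] = value

    def read(self, key: str):
        return self.data.get(key)


class DroppingStore(DictStore):
    """Toy target that silently drops writes, to simulate a failed side effect."""

    def write(self, key: str, value: str) -> None:
        pass


def write_with_receipt(store, key: str, value: str) -> Receipt:
    store.write(key, value)
    observed = store.read(key)  # Read the target back before claiming success.
    if observed != value:
        # Loud failure instead of a silent success.
        raise ReceiptError(f"{key!r}: wrote {value!r}, read back {observed!r}")
    return Receipt(key=key, confirmed_value=observed)
```

Calling `write_with_receipt(DictStore(), "event:42", "Recital, Sat 3pm")` returns a `Receipt`, while the same call against `DroppingStore()` raises `ReceiptError`.
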
See the Architecture page for the technical walkthrough of the gateway, governance chain, and runner control plane, and What data leaves your environment for the honest, surface-by-surface data flow.

Not for everyone. If your team only does chat or prompt logging, this is more than you need. The value appears when agents touch repos, tools, spend, or private data.