
Agent Framework

The platform provides a structured framework for building, orchestrating, and evaluating AI agents. This page covers the ConfigurableAgent base class, agent categories, event streaming, versioning, and prompt caching.

How agents work

Every agent in the platform extends the ConfigurableAgent base class (aliased as BaseAgent). Agents are stateless functions that receive input, call an LLM provider, and stream back a sequence of AgentEvent objects. The orchestrator manages execution order, parallelism, and result aggregation.

```python
from src.agents.base import BaseAgent
from src.models.schemas import AgentEvent
from typing import AsyncIterator, Dict

class MyAgent(BaseAgent):
    async def execute(self, input_data: Dict) -> AsyncIterator[AgentEvent]:
        yield AgentEvent(type="agent_start", agent=self.name)
        # Call the configured LLM provider
        result = await self.call_llm(input_data)
        yield AgentEvent(type="agent_complete", result=result)
```

AgentEvent streaming

Agents communicate through a typed event stream. Each event has a type discriminator that the orchestrator and frontend use to track progress:

| Event type | Purpose |
| --- | --- |
| agent_start | Agent has begun execution |
| token | Incremental token from streaming LLM response |
| agent_complete | Agent has finished with a result |
| agent_error | Agent encountered an error |
| cost_update | Real-time cost information for the current call |

Events are streamed over SSE to the frontend, enabling progressive rendering of results without waiting for the full pipeline to complete.
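A minimal sketch of producing and consuming such a stream, using an illustrative stand-in for the real AgentEvent model (the `data` payload shape here is an assumption):

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Dict, List

@dataclass
class AgentEvent:
    # Illustrative stand-in for src.models.schemas.AgentEvent
    type: str
    data: Dict[str, Any] = field(default_factory=dict)

async def toy_agent() -> AsyncIterator[AgentEvent]:
    yield AgentEvent("agent_start", {"agent": "toy"})
    for token in ["Hello", " world"]:
        yield AgentEvent("token", {"content": token})
    yield AgentEvent("agent_complete", {"result": "Hello world"})

async def consume() -> List[str]:
    # A consumer (orchestrator or SSE relay) tracks progress via the
    # type discriminator on each event
    types = []
    async for event in toy_agent():
        types.append(event.type)
    return types

types = asyncio.run(consume())
print(types)  # ['agent_start', 'token', 'token', 'agent_complete']
```

In the real pipeline the consumer is the SSE relay, which forwards each event to the frontend as it arrives.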

Configuration injection

Agents are domain-agnostic by design. All domain-specific behavior is injected through configuration at instantiation time:

| Configuration | Purpose |
| --- | --- |
| Domain prompts | System and user prompt templates that define the agent’s behavior |
| Output schemas | Pydantic models that define the expected response structure |
| Model selection | Which LLM provider and model to use (e.g., gpt-4o, claude-sonnet-4) |
| Cost limits | Per-call and per-session budget caps |
| Memory context | Which memory tiers to inject into the prompt |

This separation means a single agent implementation can serve entirely different use cases by swapping its configuration.
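As a sketch of what that swap looks like in practice, the same agent class below serves two unrelated use cases purely through configuration. The `AgentConfig` container and its field names are hypothetical; they mirror the table above rather than the platform's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentConfig:
    # Hypothetical config container mirroring the table above
    system_prompt: str
    output_schema: str            # name of a Pydantic model in a real deployment
    model: str = "gpt-4o"
    max_cost_per_call_usd: float = 0.50
    memory_tiers: List[str] = field(default_factory=list)

class ExtractionAgent:
    """Domain-agnostic: all behavior comes from the injected config."""
    def __init__(self, name: str, config: AgentConfig):
        self.name = name
        self.config = config

# One implementation, two entirely different use cases
invoice_agent = ExtractionAgent("invoices", AgentConfig(
    system_prompt="Extract line items from invoices.",
    output_schema="InvoiceItems",
))
resume_agent = ExtractionAgent("resumes", AgentConfig(
    system_prompt="Extract work history from resumes.",
    output_schema="WorkHistory",
    model="claude-sonnet-4",
))
print(invoice_agent.config.model, resume_agent.config.model)
```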

Agent categories

Core agents

Built-in agents that handle common tasks in any AI agent pipeline:

| Agent type | Purpose |
| --- | --- |
| Analysis | Process and extract structured information from input data |
| Generation | Produce content, recommendations, or responses |
| Transformation | Convert data between formats or enrich with additional context |
| Validation | Check inputs or outputs against rules or schemas |
| Explanation | Generate natural language summaries of complex results |

Orchestration agents

Meta-agents that coordinate pipeline execution rather than performing domain work:

| Agent | Purpose |
| --- | --- |
| Router | Determines which agents to invoke based on input type and intent classification |
| Aggregator | Merges outputs from parallel agents into a unified response |
| Quality Gate | Validates agent outputs against quality criteria before returning to the caller |
| Fallback Handler | Manages graceful degradation when agents fail or time out |

The orchestrator resolves the dependency graph across agents and runs independent agents in parallel where possible. The Router agent is the entry point for requests that need dynamic agent selection.
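The dependency-resolution loop can be sketched as layered execution with `asyncio.gather`: each pass runs every agent whose dependencies are already satisfied, in parallel. The pipeline shape and agent names here are hypothetical:

```python
import asyncio
from typing import Dict, List, Set

# Hypothetical pipeline: analysis and validation are independent;
# the aggregator depends on both of them.
DEPENDENCIES: Dict[str, Set[str]] = {
    "analysis": set(),
    "validation": set(),
    "aggregator": {"analysis", "validation"},
}

async def run_agent(name: str) -> str:
    await asyncio.sleep(0)          # stands in for an LLM call
    return f"{name}:done"

async def run_pipeline() -> List[List[str]]:
    done: Set[str] = set()
    layers: List[List[str]] = []
    remaining = dict(DEPENDENCIES)
    while remaining:
        # Every agent whose dependencies are satisfied runs in parallel
        ready = [n for n, deps in remaining.items() if deps <= done]
        results = await asyncio.gather(*(run_agent(n) for n in ready))
        layers.append(list(results))
        done.update(ready)
        for n in ready:
            del remaining[n]
    return layers

layers = asyncio.run(run_pipeline())
print(layers)  # [['analysis:done', 'validation:done'], ['aggregator:done']]
```

The first layer runs both independent agents concurrently; the aggregator only starts once both have completed.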

Evaluation agents

Agents used for offline evaluation, benchmarking, and regression detection:

| Agent | Purpose |
| --- | --- |
| LLM Judge | Scores agent outputs against configurable rubrics using an LLM evaluator |
| Regression Detector | Compares agent outputs across versions to identify quality changes |
| Consistency Checker | Verifies output stability across repeated runs with the same input |

Evaluation agents are invoked through the dashboard’s evaluation framework or via the CLI. They produce structured scores and comparison reports that are stored alongside agent version metadata.
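To make the rubric-scoring idea concrete, here is a hedged sketch of how an LLM Judge might map a weighted rubric onto a structured score. The rubric shape is hypothetical, and the evaluator call is stubbed with a plain function where a real judge would prompt an LLM and parse its response:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class RubricCriterion:
    # Hypothetical rubric shape for illustration
    name: str
    weight: float

RUBRIC: List[RubricCriterion] = [
    RubricCriterion("accuracy", 0.6),
    RubricCriterion("clarity", 0.4),
]

def judge(output: str, score_fn: Callable[[str, str], float]) -> Dict[str, float]:
    # score_fn stands in for the LLM evaluator: (criterion, output) -> 0..1
    scores = {c.name: score_fn(c.name, output) for c in RUBRIC}
    scores["weighted_total"] = sum(scores[c.name] * c.weight for c in RUBRIC)
    return scores

# Stubbed evaluator for illustration only
report = judge("The capital of France is Paris.",
               lambda criterion, out: 1.0 if criterion == "accuracy" else 0.8)
print(report)  # weighted_total is 0.6*1.0 + 0.4*0.8 ≈ 0.92
```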

Custom agents

To build a custom agent:

  1. Create a new file in services/backend/src/agents/:

```python
# services/backend/src/agents/my_custom_agent.py
from src.agents.base import BaseAgent
from src.models.schemas import AgentEvent

class MyCustomAgent(BaseAgent):
    async def execute(self, input_data: dict):
        yield AgentEvent(type="agent_start", agent=self.name)
        # Prepare the prompt with injected configuration
        prompt = self.build_prompt(input_data)
        # Call the LLM with streaming, accumulating the full response
        accumulated_response = ""
        async for token in self.stream_llm(prompt):
            accumulated_response += token
            yield AgentEvent(type="token", content=token)
        # Parse and validate the result
        result = self.parse_output(accumulated_response)
        yield AgentEvent(type="agent_complete", result=result)
```

  2. Register the agent in the orchestrator configuration.
  3. Configure domain prompts, output schemas, and model selection.
  4. Write tests in services/backend/tests/agents/.

Custom agents automatically inherit cost tracking, event streaming, error handling, and observability from the base class.
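A test for a custom agent can collect its event stream and assert on the shape. A minimal sketch, using a stand-in agent rather than a real BaseAgent subclass:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentEvent:
    # Stand-in for src.models.schemas.AgentEvent
    type: str
    content: str = ""

class StubAgent:
    # Stand-in for the custom agent under test
    name = "stub"
    async def execute(self, input_data: dict):
        yield AgentEvent("agent_start")
        for tok in ["a", "b"]:
            yield AgentEvent("token", tok)
        yield AgentEvent("agent_complete", "ab")

async def collect(agent, input_data):
    return [e async for e in agent.execute(input_data)]

events = asyncio.run(collect(StubAgent(), {}))
assert events[0].type == "agent_start"
assert events[-1].type == "agent_complete"
assert "".join(e.content for e in events if e.type == "token") == "ab"
```

The same pattern (collect, then assert on event order and accumulated tokens) works for any agent that follows the streaming contract.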

Agent versioning

Agents follow a semantic versioning strategy:

| Version | Meaning |
| --- | --- |
| v1 | Initial production release |
| v2 | Major prompt or model change (may alter output format) |
| v2.2 | Minor prompt tuning (output format unchanged) |
| v3 | Architecture change (new model, new dependencies) |

Version transitions are managed through the dashboard’s A/B testing framework. New versions run in shadow mode alongside the current production version, and promotion happens after evaluation metrics confirm parity or improvement.

Active versions are tracked in the agent configuration and surfaced in the dashboard’s Agent Management panel.

Prompt caching

The platform uses prompt caching to reduce costs on repeated system prompts. When a cache hit occurs, the cached portion of the prompt is billed at a 90% discount by supported providers. Prompt caching is especially effective for:

  • System prompts that remain static across requests
  • Agents that share a common preamble or instruction set
  • Evaluation agents running batched assessments
  • High-throughput pipelines where the same agent configuration handles many requests

Cache hit rates vary by agent but typically range from 70-95% for production traffic.
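The savings follow directly from the discount: cached tokens are billed at 10% of the normal input rate. A quick arithmetic sketch (the per-token price here is illustrative, not the platform's actual rate):

```python
def prompt_cost(total_prompt_tokens: int, cached_tokens: int,
                price_per_mtok: float) -> float:
    # Cached tokens are billed at a 90% discount (10% of the normal rate)
    uncached = total_prompt_tokens - cached_tokens
    return (uncached + 0.1 * cached_tokens) * price_per_mtok / 1_000_000

# Illustrative price: $2.50 per 1M input tokens
full = prompt_cost(10_000, 0, 2.50)        # no cache hit
cached = prompt_cost(10_000, 8_000, 2.50)  # 8k of 10k tokens cached
print(f"{full:.5f} -> {cached:.5f}")  # the 8k-token cache hit makes this call 72% cheaper
```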

Cost per request

LLM costs are tracked per-call and aggregated per-agent. Actual costs depend on the models selected, prompt length, and response length. The gateway governance chain enforces per-request cost limits and per-org daily budgets to prevent runaway spend.
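The governance check can be sketched as a simple pre-call guard. The cap values and exception type below are illustrative; real limits come from the gateway configuration:

```python
class BudgetExceeded(Exception):
    """Raised when a call would breach a cost limit (illustrative)."""

def enforce_budget(call_cost_usd: float, daily_spent_usd: float,
                   per_call_cap_usd: float = 0.50,
                   daily_cap_usd: float = 100.0) -> None:
    # Reject the call before it runs if either limit would be breached
    if call_cost_usd > per_call_cap_usd:
        raise BudgetExceeded(f"per-request limit ${per_call_cap_usd} exceeded")
    if daily_spent_usd + call_cost_usd > daily_cap_usd:
        raise BudgetExceeded("daily org budget exceeded")

enforce_budget(0.02, 10.0)          # within limits: no exception
try:
    enforce_budget(0.75, 10.0)      # over the per-call cap
except BudgetExceeded as exc:
    print(exc)
```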

Cost data is available through:

  • Dashboard: Cost Governance panel with per-agent and per-org breakdowns
  • API: /api/v1/admin/costs endpoints for programmatic access
  • CLI: curate costs summary for command-line cost reports
  • Health endpoint: /api/v1/health reports daily spend totals