# Agent Framework

The platform provides a structured framework for building, orchestrating, and evaluating AI agents. This page covers the `ConfigurableAgent` base class, agent categories, event streaming, versioning, and prompt caching.
## How agents work
Every agent in the platform extends the `ConfigurableAgent` base class (aliased as `BaseAgent`). Agents are stateless functions that receive input, call an LLM provider, and stream back a sequence of `AgentEvent` objects. The orchestrator manages execution order, parallelism, and result aggregation.
```python
from typing import AsyncIterator, Dict

from src.agents.base import BaseAgent
from src.models.schemas import AgentEvent

class MyAgent(BaseAgent):
    async def execute(self, input_data: Dict) -> AsyncIterator[AgentEvent]:
        yield AgentEvent(type="agent_start", agent=self.name)
        # Call the configured LLM provider
        result = await self.call_llm(input_data)
        yield AgentEvent(type="agent_complete", result=result)
```

## AgentEvent streaming
Agents communicate through a typed event stream. Each event has a type discriminator that
the orchestrator and frontend use to track progress:
| Event type | Purpose |
|---|---|
| `agent_start` | Agent has begun execution |
| `token` | Incremental token from streaming LLM response |
| `agent_complete` | Agent has finished with a result |
| `agent_error` | Agent encountered an error |
| `cost_update` | Real-time cost information for the current call |
Events are streamed over SSE to the frontend, enabling progressive rendering of results without waiting for the full pipeline to complete.
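The streaming flow above can be sketched with a minimal stand-in agent and consumer. The `AgentEvent` fields beyond `type` are illustrative assumptions, not the platform's exact schema:

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional

# Minimal sketch of the typed event stream; field names other than
# `type` are assumptions for illustration.
@dataclass
class AgentEvent:
    type: str                      # discriminator: agent_start, token, ...
    agent: Optional[str] = None
    content: Optional[str] = None  # incremental token text
    result: Optional[dict] = None

async def fake_agent() -> AsyncIterator[AgentEvent]:
    yield AgentEvent(type="agent_start", agent="demo")
    for tok in ["Hello", ", ", "world"]:
        yield AgentEvent(type="token", content=tok)
    yield AgentEvent(type="agent_complete", result={"text": "Hello, world"})

async def consume() -> str:
    # A consumer (e.g. the frontend over SSE) renders tokens as they
    # arrive instead of waiting for agent_complete.
    buf = []
    async for event in fake_agent():
        if event.type == "token":
            buf.append(event.content)
    return "".join(buf)

print(asyncio.run(consume()))  # Hello, world
```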
## Configuration injection
Agents are domain-agnostic by design. All domain-specific behavior is injected through configuration at instantiation time:
| Configuration | Purpose |
|---|---|
| Domain prompts | System and user prompt templates that define the agent’s behavior |
| Output schemas | Pydantic models that define the expected response structure |
| Model selection | Which LLM provider and model to use (e.g., gpt-4o, claude-sonnet-4) |
| Cost limits | Per-call and per-session budget caps |
| Memory context | Which memory tiers to inject into the prompt |
This separation means a single agent implementation can serve entirely different use cases by swapping its configuration.
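As a sketch of that separation, the same prompt-building code path can serve two unrelated domains purely through the injected configuration (the `AgentConfig` fields and prompt texts below are hypothetical):

```python
from dataclasses import dataclass

# Hypothetical configuration bundle; the real platform injects prompts,
# output schemas, model selection, cost limits, and memory tiers similarly.
@dataclass
class AgentConfig:
    system_prompt: str
    model: str
    max_cost_usd: float

def build_prompt(config: AgentConfig, user_input: str) -> str:
    # Domain behavior comes entirely from the injected prompt template.
    return f"{config.system_prompt}\n\nUser: {user_input}"

support = AgentConfig("You are a support triage assistant.", "gpt-4o", 0.05)
contracts = AgentConfig("You summarize legal contracts.", "claude-sonnet-4", 0.10)

# Same agent code, entirely different behavior per configuration.
print(build_prompt(support, "My order is late"))
print(build_prompt(contracts, "Review clause 4.2"))
```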
## Agent categories

### Core agents
Built-in agents that handle common tasks in any AI agent pipeline:
| Agent type | Purpose |
|---|---|
| Analysis | Process and extract structured information from input data |
| Generation | Produce content, recommendations, or responses |
| Transformation | Convert data between formats or enrich with additional context |
| Validation | Check inputs or outputs against rules or schemas |
| Explanation | Generate natural language summaries of complex results |
### Orchestration agents
Meta-agents that coordinate pipeline execution rather than performing domain work:
| Agent | Purpose |
|---|---|
| Router | Determines which agents to invoke based on input type and intent classification |
| Aggregator | Merges outputs from parallel agents into a unified response |
| Quality Gate | Validates agent outputs against quality criteria before returning to the caller |
| Fallback Handler | Manages graceful degradation when agents fail or time out |
The orchestrator resolves the dependency graph across agents and runs independent agents in parallel where possible. The Router agent is the entry point for requests that need dynamic agent selection.
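The scheduling idea can be sketched with a toy dependency graph: at each step, every agent whose dependencies are complete runs concurrently. The graph contents and scheduling loop are illustrative, not the orchestrator's actual implementation:

```python
import asyncio

# Toy dependency graph: each agent lists the agents it depends on.
GRAPH = {
    "analysis": [],
    "generation": ["analysis"],
    "validation": ["analysis"],
    "aggregator": ["generation", "validation"],
}

async def run_agent(name: str) -> str:
    await asyncio.sleep(0)  # stand-in for real agent work
    return name

async def run_pipeline() -> list[str]:
    done: list[str] = []
    remaining = dict(GRAPH)
    while remaining:
        # All agents whose dependencies are satisfied run in parallel.
        ready = [n for n, deps in remaining.items()
                 if all(d in done for d in deps)]
        results = await asyncio.gather(*(run_agent(n) for n in ready))
        done.extend(results)
        for n in ready:
            remaining.pop(n)
    return done

order = asyncio.run(run_pipeline())
print(order)  # analysis first; generation and validation together; aggregator last
```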
### Evaluation agents
Agents used for offline evaluation, benchmarking, and regression detection:
| Agent | Purpose |
|---|---|
| LLM Judge | Scores agent outputs against configurable rubrics using an LLM evaluator |
| Regression Detector | Compares agent outputs across versions to identify quality changes |
| Consistency Checker | Verifies output stability across repeated runs with the same input |
Evaluation agents are invoked through the dashboard’s evaluation framework or via the CLI. They produce structured scores and comparison reports that are stored alongside agent version metadata.
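The Consistency Checker's core idea, for example, reduces to scoring agreement across repeated runs. A minimal sketch (the scoring rule is an assumption, not the checker's actual metric):

```python
from collections import Counter

def consistency_score(outputs: list[str]) -> float:
    # Fraction of repeated runs that agree with the modal output;
    # 1.0 means every run produced the same result.
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / len(outputs)

# Four runs of the same input: three agree, one diverges.
print(consistency_score(["yes", "yes", "yes", "no"]))  # 0.75
```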
### Custom agents
To build a custom agent:

1. Create a new file in `services/backend/src/agents/`:

   ```python
   # services/backend/src/agents/my_custom_agent.py
   from src.agents.base import BaseAgent
   from src.models.schemas import AgentEvent

   class MyCustomAgent(BaseAgent):
       async def execute(self, input_data: dict):
           yield AgentEvent(type="agent_start", agent=self.name)

           # Prepare the prompt with injected configuration
           prompt = self.build_prompt(input_data)

           # Call the LLM with streaming, accumulating the full response
           accumulated_response = ""
           async for token in self.stream_llm(prompt):
               accumulated_response += token
               yield AgentEvent(type="token", content=token)

           # Parse and validate the accumulated result
           result = self.parse_output(accumulated_response)
           yield AgentEvent(type="agent_complete", result=result)
   ```

2. Register the agent in the orchestrator configuration.
3. Configure domain prompts, output schemas, and model selection.
4. Write tests in `services/backend/tests/agents/`.
Custom agents automatically inherit cost tracking, event streaming, error handling, and observability from the base class.
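A test for a custom agent typically asserts the event contract: the stream opens with `agent_start` and closes with `agent_complete`. A self-contained sketch using a stub in place of a real agent (real tests would live in `services/backend/tests/agents/` and use the project's fixtures):

```python
import asyncio
from typing import AsyncIterator, Dict

# Hypothetical stub standing in for a custom agent under test.
async def stub_agent(input_data: Dict) -> AsyncIterator[dict]:
    yield {"type": "agent_start"}
    yield {"type": "token", "content": "ok"}
    yield {"type": "agent_complete", "result": {"text": "ok"}}

async def collect_events(agent, payload):
    # Drain the async event stream into a list for assertions.
    return [event async for event in agent(payload)]

events = asyncio.run(collect_events(stub_agent, {}))
# The event contract every agent must honor:
assert events[0]["type"] == "agent_start"
assert events[-1]["type"] == "agent_complete"
```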
## Agent versioning
Agents follow a semantic versioning strategy:
| Version | Meaning |
|---|---|
| v1 | Initial production release |
| v2 | Major prompt or model change (may alter output format) |
| v2.2 | Minor prompt tuning (output format unchanged) |
| v3 | Architecture change (new model, new dependencies) |
Version transitions are managed through the dashboard’s A/B testing framework. New versions run in shadow mode alongside the current production version, and promotion happens after evaluation metrics confirm parity or improvement.
Active versions are tracked in the agent configuration and surfaced in the dashboard’s Agent Management panel.
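The rule in the table above (only a major bump may alter the output format) can be expressed as a small check; the helper names below are hypothetical:

```python
def major_version(version: str) -> int:
    # "v2.2" -> 2, "v3" -> 3
    return int(version.lstrip("v").split(".")[0])

def may_change_output_format(old: str, new: str) -> bool:
    # Per the versioning table: minor bumps keep the output format stable.
    return major_version(new) > major_version(old)

print(may_change_output_format("v2", "v2.2"))  # False: minor prompt tuning
print(may_change_output_format("v2.2", "v3"))  # True: architecture change
```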
## Prompt caching
The platform uses prompt caching to reduce costs on repeated system prompts. When a cache hit occurs, the cached portion of the prompt is billed at a 90% discount by supported providers. Prompt caching is especially effective for:
- System prompts that remain static across requests
- Agents that share a common preamble or instruction set
- Evaluation agents running batched assessments
- High-throughput pipelines where the same agent configuration handles many requests
Cache hit rates vary by agent but typically range from 70-95% for production traffic.
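A back-of-envelope estimate shows why this matters, using the 90% discount on cached tokens stated above; the token counts and per-token price are illustrative assumptions:

```python
def effective_prompt_cost(tokens: int, price_per_token: float,
                          cached_fraction: float, hit_rate: float) -> float:
    cached = tokens * cached_fraction * hit_rate
    uncached = tokens - cached
    # Cache hits are billed at 10% of the normal input-token price.
    return uncached * price_per_token + cached * price_per_token * 0.1

# Assumed numbers: 4,000-token prompt, $3 per million input tokens,
# 75% of the prompt is a static system preamble, 90% cache hit rate.
full = effective_prompt_cost(4000, 3e-6, 0.0, 0.0)
cached = effective_prompt_cost(4000, 3e-6, 0.75, 0.9)
print(f"{1 - cached / full:.1%} of prompt cost saved")
```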
## Cost per request
LLM costs are tracked per-call and aggregated per-agent. Actual costs depend on the models selected, prompt length, and response length. The gateway governance chain enforces per-request cost limits and per-org daily budgets to prevent runaway spend.
Cost data is available through:

- Dashboard: Cost Governance panel with per-agent and per-org breakdowns
- API: `/api/v1/admin/costs` endpoints for programmatic access
- CLI: `curate costs summary` for command-line cost reports
- Health endpoint: `/api/v1/health` reports daily spend totals