Developing an SLA system for AI agents
An SLA (Service Level Agreement) for AI agents is a formal commitment to availability, speed, and quality. Unlike a standard SLA, it also has to cover response quality, a subjective metric that requires special measurement approaches.
Typical SLA metrics for AI agents
| Metric | Standard SLA | AI-specific |
|---|---|---|
| Availability | > 99.5% | Includes LLM provider availability |
| Response time (p95) | < 5 s | Depends on response length and generation speed (tokens/s) |
| Error rate | < 1% | Includes AI errors (hallucinations, refusals) |
| Task completion | N/A | > 95% of tasks completed successfully |
| Quality score | N/A | > 4.0/5.0 as rated by an LLM judge |
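As a rough illustration of how the numeric rows in the table could be derived from raw request logs, the sketch below computes p95 latency, error rate, and task completion rate. The `RequestRecord` schema and the `summarize` function are hypothetical names introduced for this example, and the nearest-rank percentile is one of several common definitions.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One agent request (hypothetical log schema, for illustration only)."""
    latency_ms: float
    ok: bool              # False for transport errors, refusals, flagged hallucinations
    task_completed: bool  # True when the end-to-end task succeeded


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of the given samples."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding surprises
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(records: list[RequestRecord]) -> dict[str, float]:
    """Aggregate one measurement window into SLA metric values."""
    total = len(records)
    if total == 0:
        raise ValueError("no records")
    return {
        "p95_latency_ms": p95([r.latency_ms for r in records]),
        "error_rate": sum(not r.ok for r in records) / total,
        "success_rate": sum(r.task_completed for r in records) / total,
    }
```

The quality score is left out on purpose: it depends on how the LLM judge is prompted and sampled, which this sketch does not attempt to model.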
Real-time SLA monitoring
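```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SLADefinition:
    name: str
    metric: str
    threshold: float
    comparison: str            # "gte" / "lte"
    measurement_window: int    # minutes
    alerting_threshold: float  # share of violations that triggers an alert

SLA_SET = [
    SLADefinition("availability", "uptime_pct", 99.5, "gte", 60, 0.1),
    SLADefinition("p95_latency", "p95_latency_ms", 8000, "lte", 5, 0.05),
    SLADefinition("task_success", "success_rate", 0.95, "gte", 60, 0.1),
    SLADefinition("quality", "avg_quality_score", 4.0, "gte", 1440, 0.05),
]

# Minimal result types, assumed here so that the example is self-contained.
@dataclass
class SLAViolation:
    sla_name: str
    expected: float
    actual: float
    since: datetime

@dataclass
class SLAStatus:
    agent_name: str
    is_healthy: bool
    violations: list[SLAViolation]
    checked_at: datetime

class SLAMonitor:
    def __init__(self, metrics):
        # Assumed store exposing get(agent, metric, window) and get_violation_start(agent, sla_name)
        self.metrics = metrics

    @staticmethod
    def _compare(value: float, threshold: float, comparison: str) -> bool:
        if comparison == "gte":
            return value >= threshold
        if comparison == "lte":
            return value <= threshold
        raise ValueError(f"unknown comparison: {comparison}")

    def check_sla(self, agent_name: str) -> SLAStatus:
        violations = []
        for sla in SLA_SET:
            current_value = self.metrics.get(agent_name, sla.metric, sla.measurement_window)
            is_met = self._compare(current_value, sla.threshold, sla.comparison)
            if not is_met:
                violations.append(SLAViolation(
                    sla_name=sla.name,
                    expected=sla.threshold,
                    actual=current_value,
                    since=self.metrics.get_violation_start(agent_name, sla.name),
                ))
        return SLAStatus(
            agent_name=agent_name,
            is_healthy=len(violations) == 0,
            violations=violations,
            checked_at=datetime.now(timezone.utc),
        )
```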
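The `alerting_threshold` field is defined in `SLADefinition` but never consulted by `check_sla`. One possible reading, and it is only an assumption, is that an alert fires when the share of failed checks among the most recent checks exceeds that threshold. The `ViolationWindow` helper below is a hypothetical name used to sketch that idea.

```python
from collections import deque


class ViolationWindow:
    """Tracks recent pass/fail check results for one SLA (illustrative helper)."""

    def __init__(self, max_checks: int, alerting_threshold: float):
        # Only the latest max_checks results are kept; older ones fall off.
        self.results = deque(maxlen=max_checks)
        self.alerting_threshold = alerting_threshold

    def record(self, is_met: bool) -> None:
        """Store the outcome of one SLA check."""
        self.results.append(is_met)

    def violation_ratio(self) -> float:
        """Share of failed checks in the current window (0.0 when empty)."""
        if not self.results:
            return 0.0
        return sum(not met for met in self.results) / len(self.results)

    def should_alert(self) -> bool:
        """True when the failed share strictly exceeds the threshold."""
        return self.violation_ratio() > self.alerting_threshold
```

Alerting on a ratio rather than on a single failed check keeps one slow request from paging anyone, at the cost of a slightly delayed signal.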
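SLA budgets (error budgets)
The Google SRE approach: an SLA of 99.5% availability leaves a 0.5% error budget, which over a 30-day period is 216 minutes of allowed downtime. The budget can be spent on deployments and experiments. Once it is exhausted, changes are frozen until the budget recovers.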
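```python
from dataclasses import dataclass

@dataclass
class ErrorBudget:  # minimal result type, assumed so that the example is self-contained
    total_budget_minutes: float
    consumed_minutes: float
    remaining_minutes: float
    remaining_pct: float
    is_exhausted: bool
    burn_rate: float

class ErrorBudgetTracker:
    def __init__(self, metrics):
        self.metrics = metrics  # assumed store exposing get_downtime(agent, days=...)

    def calculate(self, agent_name: str, period_days: int = 30) -> ErrorBudget:
        sla_availability = 0.995  # 99.5%
        total_minutes = period_days * 24 * 60
        # Total downtime over the period
        downtime_minutes = self.metrics.get_downtime(agent_name, days=period_days)
        budget_minutes = total_minutes * (1 - sla_availability)  # 216 minutes per 30 days
        consumed_minutes = downtime_minutes
        remaining_minutes = budget_minutes - consumed_minutes
        remaining_pct = remaining_minutes / budget_minutes
        return ErrorBudget(
            total_budget_minutes=budget_minutes,
            consumed_minutes=consumed_minutes,
            remaining_minutes=remaining_minutes,
            remaining_pct=remaining_pct,
            is_exhausted=remaining_pct <= 0,
            # The budget already scales with period_days, so no extra normalization
            # is needed: 1.0 means the budget is consumed exactly over the period.
            burn_rate=consumed_minutes / budget_minutes,
        )
```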
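The rule "when the budget is exhausted, freeze changes" can be turned into a small deployment gate. The sketch below recomputes the remaining budget from downtime minutes and maps it to a policy. The function names and the intermediate "critical fixes only" tier with its 25% cutoff are assumptions made for illustration, not part of the approach described above.

```python
def remaining_budget_pct(
    downtime_minutes: float,
    period_days: int = 30,
    sla_availability: float = 0.995,
) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    total_minutes = period_days * 24 * 60
    budget_minutes = total_minutes * (1 - sla_availability)
    return (budget_minutes - downtime_minutes) / budget_minutes


def deployment_policy(remaining_pct: float, caution_below: float = 0.25) -> str:
    """Map the remaining budget to a change policy.

    caution_below is a hypothetical cutoff chosen for this example.
    """
    if remaining_pct <= 0:
        return "freeze"
    if remaining_pct < caution_below:
        return "critical_fixes_only"
    return "normal"
```

A CI pipeline could call `deployment_policy` before each rollout and block non-urgent releases whenever the result is not `"normal"`.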
Reporting for clients
The monthly SLA report includes actual values versus SLA targets, the time and cause of each violation, an RCA (Root Cause Analysis) for incidents, and an action plan. A public status page for enterprise clients displays incident history and planned maintenance.
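The "actual values versus targets" part of such a report is easy to generate mechanically. The following is a minimal sketch that renders it as a pipe table. `ReportRow` and `render_report` are hypothetical names, and the violation causes, RCA, and action plan are left out because they are written by people rather than computed.

```python
from dataclasses import dataclass


@dataclass
class ReportRow:
    """One metric in the monthly report (illustrative structure)."""
    metric: str
    target: float
    actual: float
    higher_is_better: bool

    @property
    def met(self) -> bool:
        """Whether the actual value satisfies the target."""
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target


def render_report(agent_name: str, month: str, rows: list[ReportRow]) -> str:
    """Render a title line followed by a pipe table of target vs. actual."""
    lines = [
        f"SLA report for {agent_name}, {month}",
        "| Metric | Target | Actual | Status |",
        "|---|---|---|---|",
    ]
    for row in rows:
        status = "met" if row.met else "BREACHED"
        lines.append(f"| {row.metric} | {row.target:g} | {row.actual:g} | {status} |")
    return "\n".join(lines)
```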
Contractual penalties and credits
Enterprise SLAs with financial commitments require automatic credit calculation when the SLA is breached. For example, availability between 99.0% and 99.5% earns a 5% credit, and availability below 99.0% earns a 15% credit. The system calculates the credit and initiates credit notes through the billing system.
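The example tiers above translate into a short lookup. This is a sketch under stated assumptions: the function names are hypothetical, exactly 99.5% is treated as meeting the SLA, exactly 99.0% falls into the 5% tier, and the credit is applied to a flat monthly fee. `Decimal` is used so that the monetary amount is not subject to binary float rounding.

```python
from decimal import Decimal


def credit_pct(availability_pct: Decimal) -> Decimal:
    """Credit percentage for a month, using the example tiers from the text."""
    if availability_pct >= Decimal("99.5"):
        return Decimal("0")   # SLA met, no credit (boundary handling is an assumption)
    if availability_pct >= Decimal("99.0"):
        return Decimal("5")
    return Decimal("15")


def credit_amount(availability_pct: Decimal, monthly_fee: Decimal) -> Decimal:
    """Credit in currency units, rounded to two decimal places."""
    raw = monthly_fee * credit_pct(availability_pct) / Decimal("100")
    return raw.quantize(Decimal("0.01"))
```

In a real contract the boundaries, the fee basis, and any credit cap come from the agreement itself, so these values would normally be loaded from configuration rather than hard-coded.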