Implementing AI-Powered Support Agent Response Generation in Mobile Applications
A support agent is answering the 80th ticket of the day. The text is boilerplate ("we have received your request and are looking into it"), yet it has to be typed out each time or hunted down among templates. AI generation doesn't replace agents; it eliminates the busywork: a draft is ready within a second, and the agent edits and sends it.
But implementing this in the agent's mobile app (rather than a client-facing one) is harder: it needs a fast editor with suggestions, LLM response streaming, and synchronization with the chat history.
Generating with dialog context
The most common mistake is sending only the latest user message to the LLM. Good answers need context: previous tickets, order status, the customer's plan.
A request to OpenAI with context:
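// iOS
import Foundation

// Minimal message type matching how it is used below
struct ChatMessage: Encodable {
    let role: String     // "system", "user", or "assistant"
    let content: String
}

struct ResponseGenerationRequest: Encodable {
    let model = "gpt-4o-mini"
    let stream = true
    let messages: [ChatMessage]
}

func buildMessages(ticket: Ticket, history: [Message], agentKnowledgeBase: String) -> [ChatMessage] {
    var messages = [ChatMessage]()
    // System prompt: role, knowledge base, order status (companyName is defined elsewhere)
    messages.append(ChatMessage(
        role: "system",
        content: """
        You are a support agent for \(companyName). Keep responses short, to the point, no filler.
        Knowledge base:\n\(agentKnowledgeBase)
        Customer order status: \(ticket.orderStatus ?? "no data")
        """
    ))
    // Only the tail of the dialog; msg.role must be "user" or "assistant"
    history.suffix(6).forEach { msg in
        messages.append(ChatMessage(role: msg.role, content: msg.text))
    }
    // Assumes history does not already contain the latest message
    messages.append(ChatMessage(role: "user", content: ticket.latestMessage))
    return messages
}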
suffix(6) takes the last 6 messages rather than the entire history. A long context increases cost and response time; for most tickets, 3–4 messages suffice.
Response streaming: why it matters
Without streaming, the agent waits 2–5 seconds for the full LLM response. With stream: true, the first words appear in 300–500 ms, which is critical for the UX of an agent working on a phone.
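// Parse the SSE stream
func streamResponse(for request: URLRequest) -> AsyncStream<String> {
    AsyncStream { continuation in
        // A completion-handler dataTask delivers the body only once it is complete,
        // so it cannot stream. URLSession.bytes(for:) yields data as it arrives.
        let task = Task {
            // Always close the stream, including on network errors or cancellation
            defer { continuation.finish() }
            do {
                let (bytes, _) = try await URLSession.shared.bytes(for: request)
                for try await line in bytes.lines {
                    guard line.hasPrefix("data: ") else { continue }
                    let payload = line.dropFirst(6)
                    if payload == "[DONE]" { break }  // end-of-stream sentinel
                    // StreamChunk is a Decodable mirroring the streamed chunk JSON
                    guard let json = payload.data(using: .utf8),
                          let chunk = try? JSONDecoder().decode(StreamChunk.self, from: json),
                          let text = chunk.choices.first?.delta.content
                    else { continue }
                    continuation.yield(text)
                }
            } catch {
                // The stream ends with whatever text was received so far
            }
        }
        // Stop the network request when the consumer stops listening
        continuation.onTermination = { _ in task.cancel() }
    }
}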
On Android, use OkHttp with an EventSourceListener from the okhttp-sse library, or parse responseBody.source() line by line.
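For the line-by-line variant, the parsing step might look like the sketch below. It assumes the same stream format as the iOS example (lines prefixed with `data:`, terminated by `data: [DONE]`). `SseEvent`, `parseSseLine`, and `collectPayloads` are hypothetical names, and decoding the JSON payload is left to whichever JSON library the app already uses.

```kotlin
// Hypothetical result type for one line of an SSE stream.
sealed interface SseEvent {
    data class Payload(val json: String) : SseEvent // raw JSON chunk, still to be decoded
    object Done : SseEvent                           // "[DONE]" sentinel: stream finished
    object Ignored : SseEvent                        // blank lines, comments, other fields
}

// Classifies a single line read from the response body.
fun parseSseLine(line: String): SseEvent {
    if (!line.startsWith("data:")) return SseEvent.Ignored
    val data = line.removePrefix("data:").trim()
    return when {
        data.isEmpty() -> SseEvent.Ignored
        data == "[DONE]" -> SseEvent.Done
        else -> SseEvent.Payload(data)
    }
}

// Collects payloads until the sentinel. In the app, each payload would be
// decoded and appended to the draft as soon as it arrives instead.
fun collectPayloads(lines: Sequence<String>): List<String> {
    val payloads = mutableListOf<String>()
    for (line in lines) {
        when (val event = parseSseLine(line)) {
            is SseEvent.Payload -> payloads += event.json
            is SseEvent.Done -> break
            is SseEvent.Ignored -> continue
        }
    }
    return payloads
}
```

With OkHttp, the lines could be pulled from the body's BufferedSource by calling readUtf8Line() until it returns null.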
Draft editor
The generated text is a draft, not the final reply. The UI must have:
- An edit field that opens immediately, pre-filled with the draft, so the agent sees what can be edited
- A "Regenerate" button for a new version on the same topic
- An "Adjust tone" option: more formal / neutral / empathetic, implemented as an additional prompt suffix
- An edit distance counter against the original draft, to track how much agents modify the AI output
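The tone adjustment from the list above can be as small as an enum of prompt suffixes. The sketch below is one possible shape: `Tone`, `withTone`, and the suffix wording are illustrative assumptions, not tuned prompts.

```kotlin
// Hypothetical tone options; the suffix wording is illustrative only.
enum class Tone(val promptSuffix: String) {
    FORMAL("Use a formal, professional tone."),
    NEUTRAL("Use a neutral, matter-of-fact tone."),
    EMPATHETIC("Acknowledge the customer's frustration and use a warm, empathetic tone.")
}

// Appends the tone instruction to the base system prompt before regenerating.
fun withTone(systemPrompt: String, tone: Tone): String =
    systemPrompt.trimEnd() + "\n" + tone.promptSuffix
```

Choosing a tone then amounts to rebuilding the system prompt with `withTone(basePrompt, Tone.EMPATHETIC)` and sending the same dialog history again.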
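// Android Compose
import androidx.compose.foundation.layout.*
import androidx.compose.material3.*
import androidx.compose.runtime.*
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

@Composable
fun ResponseEditor(
    aiDraft: String,
    onSend: (String) -> Unit,
    onRegenerate: () -> Unit
) {
    // Keyed on aiDraft so that a regenerated draft replaces the text in the field
    var editedText by remember(aiDraft) { mutableStateOf(aiDraft) }
    // Recomputed only when the draft or the edited text changes
    val editDistance = remember(editedText, aiDraft) {
        levenshteinDistance(aiDraft, editedText) // custom utility
    }
    Column {
        OutlinedTextField(
            value = editedText,
            onValueChange = { editedText = it },
            modifier = Modifier.fillMaxWidth().heightIn(min = 120.dp)
        )
        Row {
            Text("Edits: $editDistance chars", style = MaterialTheme.typography.labelSmall)
            Spacer(Modifier.weight(1f)) // pushes the buttons to the right edge
            TextButton(onClick = onRegenerate) { Text("Regenerate") }
            Button(onClick = { onSend(editedText) }) { Text("Send") }
        }
    }
}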
The edit distance counter isn't UI decoration. Log it to analytics: if agents edit more than 50% of the text, the model is poorly tuned to the knowledge base.
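The editor calls `levenshteinDistance` as a custom utility without showing it. A minimal sketch of such a utility follows, using the standard dynamic-programming algorithm. `editRatio` is a hypothetical helper that normalizes the distance so it can be compared against the 50% threshold.

```kotlin
// Classic dynamic-programming Levenshtein distance, kept to two rows of memory.
fun levenshteinDistance(a: String, b: String): Int {
    if (a.isEmpty()) return b.length
    if (b.isEmpty()) return a.length
    var previous = IntArray(b.length + 1) { it } // distance from "" to each prefix of b
    var current = IntArray(b.length + 1)
    for (i in 1..a.length) {
        current[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            current[j] = minOf(
                previous[j] + 1,       // deletion
                current[j - 1] + 1,    // insertion
                previous[j - 1] + cost // substitution or match
            )
        }
        val swap = previous
        previous = current
        current = swap
    }
    return previous[b.length]
}

// Share of the draft that was changed, in 0.0..1.0 (hypothetical analytics metric).
fun editRatio(aiDraft: String, sent: String): Double {
    val longest = maxOf(aiDraft.length, sent.length)
    return if (longest == 0) 0.0 else levenshteinDistance(aiDraft, sent).toDouble() / longest
}
```

For example, `levenshteinDistance("kitten", "sitting")` is 3. The cost grows with the product of the two lengths, so for long drafts it may be cheaper to compute the ratio once when the reply is sent rather than on every keystroke.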
Knowledge base and RAG
For product-specific questions, the LLM hallucinates without context. Add RAG (Retrieval-Augmented Generation): before generating, run a vector search over internal docs and insert the relevant chunks into the system prompt.
Backend options: Pinecone, Weaviate, or pgvector (if PostgreSQL is already in the stack). The mobile client doesn't take part in retrieval; it just receives the ready system prompt from the server.
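To make the retrieval step concrete, here is a minimal server-side sketch that does a brute-force cosine-similarity search in memory. It assumes embeddings are already computed by some embedding model. `DocChunk`, `topChunks`, and `buildSystemPrompt` are hypothetical names, and in production a vector database would replace the linear scan.

```kotlin
import kotlin.math.sqrt

// Hypothetical document chunk with a precomputed embedding.
class DocChunk(val text: String, val embedding: FloatArray)

fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Embedding dimensions must match" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    val denominator = sqrt(normA) * sqrt(normB)
    return if (denominator == 0.0) 0.0 else dot / denominator
}

// Brute-force top-k search over all chunks, most similar first.
fun topChunks(query: FloatArray, chunks: List<DocChunk>, k: Int): List<DocChunk> =
    chunks.sortedByDescending { cosineSimilarity(query, it.embedding) }.take(k)

// Inserts the retrieved chunks into the system prompt sent to the LLM.
fun buildSystemPrompt(basePrompt: String, retrieved: List<DocChunk>): String =
    buildString {
        appendLine(basePrompt)
        appendLine("Knowledge base:")
        retrieved.forEach { appendLine("- ${it.text}") }
    }.trimEnd()
```

The server would embed the customer's latest message, call `topChunks` with a small `k`, and return the result of `buildSystemPrompt` to the mobile client.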
Timeline estimates
Basic generation via OpenAI without streaming: 2–3 days. Full editor with streaming, tone adjustment, and edit analytics: 1.5–2 weeks. RAG backend integration: a separate 1–2 weeks.







