AI-Powered Automatic Content Tagging for Mobile Apps
Auto-tagging assigns labels to content (photos, videos, documents, posts) without user input. The implementation depends on the content type: Vision (iOS) or ML Kit (Android) for images, NLP models for text, and key-frame analysis for video. A single app can need all three.
Image Tagging On-Device
The standard combination is VNClassifyImageRequest on iOS and ML Kit's ImageLabeler on Android. Both return labels like "Food", "Sky", "Cat", "Outdoor", each with a confidence score. That is sufficient for most apps.
The problem with built-in labels is that they are English-only and generic. Domain-specific apps (a clothing marketplace, recipes, real estate) need a custom taxonomy.
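A minimal sketch of the iOS side, assuming the built-in classifier is called once per image and its results are filtered by confidence. The helper names selectLabels and builtInLabels, the 0.3 cutoff, and the limit of 5 are illustrative assumptions, not values from the Vision framework.

```swift
#if canImport(Vision)
import Vision
#endif

/// Keeps labels whose confidence clears `minConfidence`, best first, capped at `limit`.
/// (Hypothetical helper; the threshold and limit are assumptions to tune per app.)
func selectLabels(_ scored: [(label: String, confidence: Float)],
                  minConfidence: Float = 0.3,
                  limit: Int = 5) -> [String] {
    return scored
        .filter { $0.confidence >= minConfidence }
        .sorted { $0.confidence > $1.confidence }
        .prefix(limit)
        .map { $0.label }
}

#if canImport(Vision)
/// Runs Apple's built-in image classifier on a single image.
func builtInLabels(for image: CGImage) throws -> [String] {
    let request = VNClassifyImageRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    // Each observation carries a label identifier and a confidence in 0...1.
    let scored = (request.results ?? []).map {
        (label: $0.identifier, confidence: $0.confidence)
    }
    return selectLabels(scored)
}
#endif
```

Keeping the filtering separate from the Vision call makes the threshold easy to unit-test and to reuse for other classifiers.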
Custom Classification via CreateML:
// Training via Create ML (run on a Mac, not on the device)
import CreateML
import Foundation
// Expected layout: /training_data/jacket/, /training_data/shoes/, /training_data/bag/
let trainingData = MLImageClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/training_data")
)
var params = MLImageClassifier.ModelParameters()
params.maxIterations = 25
params.validation = .split(strategy: .automatic) // hold out part of the training data for validation
params.featureExtractor = .scenePrint(revision: 2) // transfer learning from Apple's feature extractor
let model = try MLImageClassifier(trainingData: trainingData, parameters: params)
try model.write(to: URL(fileURLWithPath: "/model.mlmodel"), metadata: nil)
With 20–50 examples per category, training takes 15–30 minutes on a MacBook Pro M2 and produces a .mlmodel file ready for deployment. Core ML Model Deployment (an Apple API) lets you ship an updated model without releasing an app update.
Text Content Tagging
Text posts, descriptions, and comments call for NLP classification. There are two options:
On-device via a Create ML text classifier:
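import NaturalLanguage
// textClassifierModel: the MLModel of a trained Create ML text classifier
let classifier = try NLModel(mlModel: textClassifierModel)
let labels = classifier.predictedLabelHypotheses(
    for: "A great pasta recipe with tomato sauce and basil",
    maximumCount: 3
)
// Illustrative result (probabilities over all labels, so they sum to at most 1):
// ["Food": 0.58, "Recipe": 0.31, "Italian": 0.07]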
The model is 1–5 MB for 50–100 categories and runs fully offline.
Server-side via OpenAI / Claude: for complex taxonomies or when high accuracy is needed. Prompt: "Assign 3–5 tags from this list to the following text and reply with a JSON array: [tag list] [text]". Parse the JSON response on the client. Latency is 0.5–2 seconds, which suits user-generated content but not real-time feeds.
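The prompt-and-parse step can be sketched as follows. This assumes the model is asked to reply with a bare JSON array of strings; the function names makeTaggingPrompt and parseTags are hypothetical, and the network call to the provider is left out on purpose.

```swift
import Foundation

/// Builds the tagging prompt. Asking for a bare JSON array is an assumption
/// that keeps parsing simple.
func makeTaggingPrompt(text: String, allowedTags: [String]) -> String {
    let list = allowedTags.joined(separator: ", ")
    return """
    Assign 3-5 tags from this list to the following text: \(list).
    Respond with a JSON array of strings only.

    Text: \(text)
    """
}

/// Decodes the reply and drops anything that is not in the allowed list.
func parseTags(from reply: String, allowedTags: [String], limit: Int = 5) -> [String] {
    guard let data = reply.data(using: .utf8),
          let decoded = try? JSONDecoder().decode([String].self, from: data) else {
        return [] // malformed reply: treat as "no tags" instead of crashing
    }
    let allowed = Set(allowedTags)
    var seen = Set<String>()
    // Preserve order, remove duplicates and tags the model invented.
    let valid = decoded.filter { allowed.contains($0) && seen.insert($0).inserted }
    return Array(valid.prefix(limit))
}
```

Validating against the allowed list matters because a language model may return tags outside the taxonomy even when told not to.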
Hierarchical Tags
A flat tag list is not enough; tags need a hierarchy such as "Food → Italian Cuisine → Pasta". Implement it as a tag tree:
// Android: TagTree
data class Tag(
    val id: String,
    val name: String,
    val parentId: String?,
    val synonyms: List<String> = emptyList()
)
// When tagging: assign "Pasta" and automatically add its ancestors
fun expandWithParents(tagId: String, tagTree: Map<String, Tag>): Set<String> {
    val result = mutableSetOf(tagId)
    var current = tagTree[tagId]?.parentId?.let { tagTree[it] }
    // add() returns false for an id already seen, which stops the walk if the tree contains a cycle
    while (current != null && result.add(current.id)) {
        current = current.parentId?.let { tagTree[it] }
    }
    return result
}
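The synonyms field in the Tag class above is not used by expandWithParents. One possible use, sketched here in Swift with a mirrored Tag struct, is to normalize raw classifier labels to canonical tag ids before expanding parents. The helpers makeSynonymIndex and canonicalTagId are hypothetical, and case-insensitive matching is an assumption.

```swift
import Foundation

// Swift mirror of the Tag model, for illustration only.
struct Tag {
    let id: String
    let name: String
    let parentId: String?
    var synonyms: [String] = []
}

/// Builds a lowercase lookup from every name and synonym to its tag id.
func makeSynonymIndex(_ tags: [Tag]) -> [String: String] {
    var index: [String: String] = [:]
    for tag in tags {
        for key in [tag.name] + tag.synonyms {
            index[key.lowercased()] = tag.id
        }
    }
    return index
}

/// Maps a raw label to a canonical tag id, or nil when the label is unknown.
func canonicalTagId(for rawLabel: String, in index: [String: String]) -> String? {
    return index[rawLabel.trimmingCharacters(in: .whitespaces).lowercased()]
}
```

Labels that resolve to nil can be logged and reviewed later as candidates for new tags or synonyms.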
User Tags vs Auto Tags
Auto tags are a utility for search and filtering; user tags are visible and editable. Keep them separate in the database:
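-- PostgreSQL syntax (an inline ENUM(...) column is MySQL-only, while UUID is a PostgreSQL type)
CREATE TABLE content_tags (
    content_id UUID NOT NULL,
    tag_id UUID NOT NULL,
    source TEXT NOT NULL CHECK (source IN ('auto', 'user', 'admin')),
    confidence FLOAT, -- for auto tags: model probability; NULL otherwise
    created_at TIMESTAMP NOT NULL DEFAULT now(),
    PRIMARY KEY (content_id, tag_id) -- assumes one row per content/tag pair
);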
Show only tags with source = 'user' in the UI and use auto tags only for search and filters. A user can "accept" an auto tag, which changes its source to 'user'.
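On the client, the same separation might look like the sketch below. The types TagSource and ContentTag and the functions visibleTags and accept are hypothetical names, and clearing the confidence on acceptance is an assumption rather than something the schema requires.

```swift
enum TagSource: String { case auto, user, admin }

struct ContentTag: Equatable {
    let tagId: String
    var source: TagSource
    var confidence: Double? // only meaningful for auto tags
}

/// Tags shown in the UI: user-sourced only.
func visibleTags(_ tags: [ContentTag]) -> [ContentTag] {
    return tags.filter { $0.source == .user }
}

/// "Accepting" an auto tag promotes it to a user tag.
/// Assumption: the model confidence is dropped once a person confirms the tag.
func accept(tagId: String, in tags: [ContentTag]) -> [ContentTag] {
    return tags.map { tag -> ContentTag in
        guard tag.tagId == tagId, tag.source == .auto else { return tag }
        return ContentTag(tagId: tag.tagId, source: .user, confidence: nil)
    }
}
```

Search and filter code would query all sources, while any view that renders tags goes through visibleTags.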
Video Tagging
Tag video by key frames: grab a frame every 2–5 seconds, classify it as an image, and aggregate the tags:
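import AVFoundation
// classifyImage(_:) is the app's own async image classifier (for example, a Vision wrapper)
func tagVideo(at url: URL) async throws -> Set<String> {
    let asset = AVURLAsset(url: url)
    let duration = try await asset.load(.duration).seconds // async property loading, iOS 15+
    let generator = AVAssetImageGenerator(asset: asset)
    generator.maximumSize = CGSize(width: 224, height: 224) // small frames are enough for classification
    generator.appliesPreferredTrackTransform = true // respect the video's rotation
    var allTags = Set<String>()
    var time = 0.0
    while time < duration {
        let cmTime = CMTime(seconds: time, preferredTimescale: 600)
        let cgImage = try generator.copyCGImage(at: cmTime, actualTime: nil)
        let frameTags = try await classifyImage(cgImage)
        allTags.formUnion(frameTags)
        time += 3.0 // every 3 seconds
    }
    return allTags
}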
For long videos (10+ minutes), run tagging in a background task (BGProcessingTask on iOS) or on the backend.
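The loop above unions the labels of every frame, so a tag seen in a single frame ends up on the whole video. A possible refinement, sketched here under the assumption that per-frame tag sets are collected first, is to keep only tags that appear in a minimum share of frames. The function name aggregateTags and the 0.2 default are illustrative.

```swift
/// Keeps tags that appear in at least `minShare` of the sampled frames.
func aggregateTags(perFrame: [Set<String>], minShare: Double = 0.2) -> Set<String> {
    guard !perFrame.isEmpty else { return [] }
    var counts: [String: Int] = [:]
    for frameTags in perFrame {
        for tag in frameTags {
            counts[tag, default: 0] += 1
        }
    }
    // Minimum number of frames a tag must appear in to be kept.
    let needed = Double(perFrame.count) * minShare
    return Set(counts.filter { Double($0.value) >= needed }.keys)
}
```

With five sampled frames and minShare set to 0.4, a tag has to show up in at least two frames to survive, which filters out one-off misclassifications.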
Timelines
On-device image tagging with ready-made models takes 3–5 days. A custom taxonomy with domain training, text and video tagging, and hierarchical tags takes 2–4 weeks. Cost is estimated individually for each project.







