AI-Powered Video Content Moderation for Mobile Apps
A user uploads a video, and you have seconds to decide whether to show it to others. Manual review doesn't scale, and if you queue the video for 10 minutes and publish it later, you lose the user. The task is automatic video content classification right at upload or before publishing, with a minimal false positive rate.
Common Problem Areas
Real-Time vs Post-Processing Moderation
The most common architectural mistake is trying to process video frame by frame on the client. CoreML on an iPhone 14 Pro handles MobileNet v3 at 30 fps on short clips, but it drains the battery and heats the device. Android is similar when running MediaPipe inside a CameraX ImageAnalysis.Analyzer: processing every frame at 1080p causes ImageProxy backlogs and crashes with java.lang.IllegalStateException: Image is already closed.
The right approach for video is selective rather than frame-by-frame analysis: sample every N frames or only key scenes via AVAssetImageGenerator (iOS) / MediaMetadataRetriever.getFrameAtTime() (Android). For most moderation tasks, one frame per second is enough.
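A minimal sketch of the sampling step: compute the timestamps to grab instead of decoding every frame. The function name and its defaults are illustrative assumptions, not part of any platform API. On Android, each value (in microseconds) could be passed to MediaMetadataRetriever.getFrameAtTime(timeUs, MediaMetadataRetriever.OPTION_CLOSEST_SYNC); on iOS the same offsets would become CMTime values for AVAssetImageGenerator.

```kotlin
/**
 * Hypothetical helper: timestamps (microseconds) at which to extract frames,
 * sampling at a fixed rate instead of decoding every frame.
 * durationMs is the clip length in milliseconds; framesPerSecond is the sampling rate.
 */
fun sampleTimestampsUs(durationMs: Long, framesPerSecond: Double = 1.0): List<Long> {
    require(durationMs >= 0) { "durationMs must be non-negative" }
    require(framesPerSecond > 0) { "framesPerSecond must be positive" }
    // Step between samples; at least 1 us so extreme rates cannot loop forever.
    val stepUs = (1_000_000 / framesPerSecond).toLong().coerceAtLeast(1L)
    val durationUs = durationMs * 1_000
    val result = mutableListOf<Long>()
    var t = 0L
    while (t < durationUs) {
        result += t
        t += stepUs
    }
    return result
}
```

OPTION_CLOSEST_SYNC snaps to the nearest keyframe, which is cheaper to decode; OPTION_CLOSEST returns the exact frame at a higher decoding cost. For moderation, keyframe precision is usually acceptable.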
Server-Side Moderation via Video Intelligence API
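For apps with UGC video, use this scheme: the client uploads the video to Cloud Storage, the upload triggers a Cloud Function, and the function calls the Google Video Intelligence API with the EXPLICIT_CONTENT_DETECTION and OBJECT_TRACKING features. The API takes its input as a gs:// URI (or inline bytes), so S3 fits the Rekognition path below rather than this one. The response is JSON: explicit-content results arrive as per-frame time offsets, each with a pornographyLikelihood rating from VERY_UNLIKELY to VERY_LIKELY, while object tracking returns per-object confidence scores with timestamps.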
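```kotlin
import java.util.UUID

// Android: upload the clip to Cloud Storage, then hand its storage path to the backend.
// storageRef, localUri, moderationApi, publishVideo(), rejectWithReason(),
// sendToHumanReview() and showUploadError() are app-level members assumed to exist.
val uploadRef = storageRef.child("uploads/${UUID.randomUUID()}.mp4")
uploadRef.putFile(localUri)
    .addOnSuccessListener {
        // Video Intelligence reads gs:// URIs, not HTTPS download URLs, so send the
        // object path and let the backend build the gs:// URI from it.
        moderationApi.submitVideo(uploadRef.path, onComplete = { result ->
            when (result.verdict) {
                ModerationVerdict.SAFE -> publishVideo()
                ModerationVerdict.UNSAFE -> rejectWithReason(result.reason)
                ModerationVerdict.REVIEW -> sendToHumanReview()
            }
        })
    }
    .addOnFailureListener { e -> showUploadError(e) }
```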
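On the backend, the per-frame likelihoods have to be collapsed into the three verdicts the client switches on. The sketch below assumes a hypothetical policy (any VERY_LIKELY frame rejects, LIKELY/POSSIBLE/unspecified frames go to human review, an empty result also goes to review); the enum mirrors the API's Likelihood values, while ExplicitFrame, ModerationResult and aggregateExplicitFrames are illustrative names, not Google client-library types.

```kotlin
import java.util.Locale

// Mirrors the Likelihood values Video Intelligence reports per explicit-content frame.
enum class Likelihood { LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY }

enum class ModerationVerdict { SAFE, UNSAFE, REVIEW }

// One entry of the explicit annotation's frames list: time offset + pornographyLikelihood.
data class ExplicitFrame(val timeOffsetSec: Double, val likelihood: Likelihood)

data class ModerationResult(val verdict: ModerationVerdict, val reason: String? = null)

private fun offsets(frames: List<ExplicitFrame>): String =
    frames.joinToString { String.format(Locale.ROOT, "%.1fs", it.timeOffsetSec) }

/** Hypothetical policy that turns per-frame likelihoods into one verdict. */
fun aggregateExplicitFrames(frames: List<ExplicitFrame>): ModerationResult {
    // No analyzed frames usually means a processing problem, so do not auto-approve.
    if (frames.isEmpty()) return ModerationResult(ModerationVerdict.REVIEW, "no frames analyzed")

    val veryLikely = frames.filter { it.likelihood == Likelihood.VERY_LIKELY }
    if (veryLikely.isNotEmpty()) {
        return ModerationResult(ModerationVerdict.UNSAFE, "explicit content at ${offsets(veryLikely)}")
    }
    // Borderline or unrated frames are escalated to a human instead of being published.
    val uncertain = frames.filter {
        it.likelihood == Likelihood.LIKELY ||
            it.likelihood == Likelihood.POSSIBLE ||
            it.likelihood == Likelihood.LIKELIHOOD_UNSPECIFIED
    }
    if (uncertain.isNotEmpty()) {
        return ModerationResult(ModerationVerdict.REVIEW, "needs review at ${offsets(uncertain)}")
    }
    return ModerationResult(ModerationVerdict.SAFE)
}
```

Routing the middle band to REVIEW rather than UNSAFE is what keeps the false positive rate down: the model only auto-rejects when it is very confident.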
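AWS Rekognition Video is an alternative with a similar asynchronous API: call StartContentModeration on a video in S3, then poll GetContentModeration until the job finishes (or pass an SNS topic in NotificationChannel to be notified instead). For synchronous scenarios (short reels up to 30 sec), apply Rekognition Image (DetectModerationLabels) to extracted frames; each response arrives in 200–400 ms.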
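The polling side can be sketched independently of the AWS SDK. Here fetchStatus is a hypothetical stand-in for a GetContentModeration call with the JobId returned by StartContentModeration; the enum mirrors the JobStatus values that call reports. Delays double up to a cap so a long job does not hammer the API, and sleep is injectable so the loop can be tested without waiting.

```kotlin
// Mirrors the JobStatus values returned by Rekognition's GetContentModeration.
enum class JobStatus { IN_PROGRESS, SUCCEEDED, FAILED }

class PollTimeoutException(message: String) : RuntimeException(message)

/**
 * Polls until the job leaves IN_PROGRESS, doubling the delay up to maxDelayMs.
 * fetchStatus is a hypothetical wrapper around GetContentModeration(JobId).
 * Returns the final status, or throws after maxAttempts polls.
 */
fun pollUntilDone(
    fetchStatus: () -> JobStatus,
    initialDelayMs: Long = 1_000,
    maxDelayMs: Long = 16_000,
    maxAttempts: Int = 30,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
): JobStatus {
    var delayMs = initialDelayMs
    repeat(maxAttempts) {
        val status = fetchStatus()
        if (status != JobStatus.IN_PROGRESS) return status  // SUCCEEDED or FAILED
        sleep(delayMs)
        delayMs = (delayMs * 2).coerceAtMost(maxDelayMs)
    }
    throw PollTimeoutException("job still IN_PROGRESS after $maxAttempts polls")
}
```

In production the SNS notification route avoids polling entirely; a poller like this is mainly useful for short jobs or as a fallback when a notification is lost.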
On-Device Pre-Filtering
Before sending to the server, run the first and last frames through a local CoreML / TFLite model. This catches obvious NSFW content on the client and saves traffic. Models like NudeNet Lite in TFLite format are about 14 MB, with roughly 92% accuracy on NSFW datasets. False positives on medical content are a separate story and require whitelist logic at the app-category level.
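A platform-neutral sketch of the client-side decision. nsfwScore is a hypothetical wrapper around the local CoreML/TFLite classifier that returns a probability in [0, 1] for one decoded frame; the 0.9 threshold and the "medical" whitelist entry are illustrative assumptions. The local model only blocks clear-cut cases, and whitelisted categories always go to server moderation so medical content is never rejected on-device.

```kotlin
enum class PreFilterDecision { BLOCK_LOCALLY, UPLOAD_FOR_MODERATION }

/**
 * Scores only the first and last frames and blocks the upload if either is clearly NSFW.
 * F is whatever frame type the platform uses (Bitmap, CVPixelBuffer wrapper, ...).
 */
fun <F> preFilter(
    firstFrame: F,
    lastFrame: F,
    nsfwScore: (F) -> Float,
    appCategory: String,
    whitelistedCategories: Set<String> = setOf("medical"),
    blockThreshold: Float = 0.9f,
): PreFilterDecision {
    // Whitelisted categories skip local blocking; the server and human review decide.
    if (appCategory in whitelistedCategories) return PreFilterDecision.UPLOAD_FOR_MODERATION
    val worst = maxOf(nsfwScore(firstFrame), nsfwScore(lastFrame))
    return if (worst >= blockThreshold) PreFilterDecision.BLOCK_LOCALLY
    else PreFilterDecision.UPLOAD_FOR_MODERATION
}
```

Anything below the threshold is still uploaded and moderated server-side, so the pre-filter only ever saves work; it never acts as the final approval.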
How We Build Solutions
The stack depends on latency and budget requirements. For startups with low traffic, the Google Video Intelligence API is the simplest option: you pay only for processed minutes and have no infrastructure to maintain. For high-load platforms, run your own inference service based on CLIP or a custom ONNX model behind a reverse proxy, with a cache of hashes of already-checked videos (perceptual hashing via pHash prevents re-moderating the same clip).
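The hash cache can be illustrated with a DCT-based pHash variant on a single frame: downscale to a 32x32 grayscale thumbnail, keep the 8x8 lowest-frequency DCT coefficients, drop the DC term, and set one bit per coefficient above the median. Near-duplicate frames then differ in only a few bits (Hamming distance). This is a sketch, not the reference pHash library; a video-level fingerprint could be the list of hashes of the one-per-second frames already sampled for moderation, and the maxDistance of 6 is an assumed tuning value.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.sqrt

private const val N = 32   // input: 32x32 grayscale thumbnail
private const val K = 8    // keep the top-left 8x8 low-frequency DCT block

// cosTable[k][x] = cos((2x + 1) * k * PI / (2N)), shared by rows and columns.
private val cosTable = Array(K) { k -> DoubleArray(N) { x -> cos((2 * x + 1) * k * PI / (2 * N)) } }

// Orthonormal DCT-II scale factors.
private fun alpha(k: Int): Double = if (k == 0) sqrt(1.0 / N) else sqrt(2.0 / N)

/** DCT-based perceptual hash of a 32x32 grayscale frame (pixels 0..255); uses 63 bits. */
fun pHash(gray: Array<IntArray>): Long {
    require(gray.size == N && gray.all { it.size == N }) { "expected a ${N}x$N thumbnail" }
    val coeffs = DoubleArray(K * K)
    for (v in 0 until K) for (u in 0 until K) {
        var sum = 0.0
        for (y in 0 until N) for (x in 0 until N) {
            sum += gray[y][x] * cosTable[u][x] * cosTable[v][y]
        }
        coeffs[v * K + u] = alpha(u) * alpha(v) * sum
    }
    // Drop the DC term (overall brightness) and compare each AC term with the median.
    val ac = coeffs.copyOfRange(1, coeffs.size)
    val median = ac.sorted()[ac.size / 2]
    var hash = 0L
    for (i in ac.indices) if (ac[i] > median) hash = hash or (1L shl i)
    return hash
}

fun hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a xor b)

enum class CachedVerdict { SAFE, UNSAFE }

/** Remembers verdicts for already-moderated frames; near-duplicates reuse the closest verdict. */
class ModerationCache(private val maxDistance: Int = 6) {
    private val entries = mutableListOf<Pair<Long, CachedVerdict>>()

    fun put(hash: Long, verdict: CachedVerdict) {
        entries += hash to verdict
    }

    // Linear scan for clarity; a BK-tree or multi-index hashing scales better.
    fun lookup(hash: Long): CachedVerdict? =
        entries.filter { hamming(it.first, hash) <= maxDistance }
            .minByOrNull { hamming(it.first, hash) }?.second
}
```

Dropping the DC coefficient makes the hash insensitive to uniform brightness shifts, which is a common difference between re-uploads of the same clip.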
On the client (iOS/Android/Flutter), implement:
- an upload progress bar via URLSession.uploadTask (iOS) / okhttp3.MultipartBody wrapped in a RequestBody that reports bytes written (Android)
- a pending state for the video in the feed ("under review")
- a push notification about the result via FCM/APNs
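These three client pieces fit naturally into one small state machine per feed item. The sketch below is a hypothetical model (all type names are illustrative): upload callbacks drive the progress bar, finishing the upload moves the item to "under review", and the FCM/APNs push carries the verdict. Keeping the transition function pure makes it easy to unit-test and to share across platforms.

```kotlin
/** Client-side lifecycle of an uploaded video as it appears in the feed. */
sealed class FeedVideoState {
    data class Uploading(val progress: Float) : FeedVideoState()   // drives the progress bar
    object UnderReview : FeedVideoState()                           // "under review" badge
    object Published : FeedVideoState()
    data class Rejected(val reason: String) : FeedVideoState()
    data class Failed(val error: String) : FeedVideoState()         // upload error, retryable
}

/** Events from the uploader callbacks and from the verdict push notification. */
sealed class FeedEvent {
    data class UploadProgress(val bytesSent: Long, val totalBytes: Long) : FeedEvent()
    object UploadFinished : FeedEvent()
    data class UploadFailed(val error: String) : FeedEvent()
    data class VerdictPush(val approved: Boolean, val reason: String? = null) : FeedEvent()
}

/** Pure transition function; events that make no sense in the current state are ignored. */
fun reduce(state: FeedVideoState, event: FeedEvent): FeedVideoState = when {
    state is FeedVideoState.Uploading && event is FeedEvent.UploadProgress -> {
        val fraction =
            if (event.totalBytes > 0) event.bytesSent.toFloat() / event.totalBytes else 0f
        FeedVideoState.Uploading(fraction.coerceIn(0f, 1f))
    }
    state is FeedVideoState.Uploading && event is FeedEvent.UploadFinished ->
        FeedVideoState.UnderReview
    state is FeedVideoState.Uploading && event is FeedEvent.UploadFailed ->
        FeedVideoState.Failed(event.error)
    state is FeedVideoState.UnderReview && event is FeedEvent.VerdictPush ->
        if (event.approved) FeedVideoState.Published
        else FeedVideoState.Rejected(event.reason ?: "rejected by moderation")
    else -> state
}
```

Ignoring out-of-order events (for example, a push that arrives before the upload callback completes) keeps the feed from flickering between states.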
Live streaming is a separate case: the batch Video Intelligence API doesn't fit because of its latency. Instead, ingest the stream via WebRTC and, on the server, analyze HLS segments every 2–4 seconds with a speed-optimized model (MobileViT-S in TorchScript).
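The server loop re-reads the live media playlist roughly once per target duration and hands only the new segments to the model. A minimal sketch of the playlist-diffing step, assuming a standard HLS media playlist; it understands only #EXT-X-MEDIA-SEQUENCE and #EXTINF, and HlsSegment / newSegments are illustrative names rather than a library API.

```kotlin
data class HlsSegment(val sequence: Long, val durationSec: Double, val uri: String)

/**
 * Minimal parser for a live HLS media playlist: returns segments newer than lastSeenSequence.
 * #EXT-X-MEDIA-SEQUENCE gives the sequence number of the first segment; other tags are skipped.
 */
fun newSegments(playlist: String, lastSeenSequence: Long): List<HlsSegment> {
    var sequence = 0L
    var pendingDuration: Double? = null
    val segments = mutableListOf<HlsSegment>()
    for (raw in playlist.lineSequence()) {
        val line = raw.trim()
        if (line.startsWith("#EXT-X-MEDIA-SEQUENCE:")) {
            sequence = line.substringAfter(':').trim().toLong()
        } else if (line.startsWith("#EXTINF:")) {
            // "#EXTINF:<duration>,<optional title>"
            pendingDuration = line.substringAfter(':').substringBefore(',').trim().toDouble()
        } else if (line.isNotEmpty() && !line.startsWith("#")) {
            val d = pendingDuration ?: continue  // URI without a preceding #EXTINF
            segments += HlsSegment(sequence++, d, line)
            pendingDuration = null
        }
    }
    return segments.filter { it.sequence > lastSeenSequence }
}
```

Tracking the last analyzed sequence number, rather than segment URIs, keeps the loop correct when old segments slide out of the live window.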
Process
Requirement audit: content type (UGC, Stories, live), acceptable publication delay, compliance requirements (GDPR, COPPA).
Stack selection: on-device pre-filter + cloud moderation vs fully server-side.
Development: upload SDK integration, webhook/polling for results, status UI.
Testing on an edge-case dataset: multilingual subtitles in the frame, medical content, animation.
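Since the goal is a minimal false positive rate, it helps to report that rate per edge-case category rather than as one global number. A sketch under the assumption that each test clip is labeled with a category and a ground-truth unsafe/safe flag; the type and function names are hypothetical.

```kotlin
/** One labeled clip from the edge-case set: ground truth vs what the pipeline decided. */
data class LabeledResult(val category: String, val actuallyUnsafe: Boolean, val flagged: Boolean)

data class CategoryStats(val falsePositiveRate: Double, val recall: Double, val clips: Int)

/**
 * False positive rate = safe clips flagged / all safe clips;
 * recall = unsafe clips flagged / all unsafe clips.
 * NaN when a category has no clips of that kind.
 */
fun statsByCategory(results: List<LabeledResult>): Map<String, CategoryStats> =
    results.groupBy { it.category }.mapValues { (_, rs) ->
        val safe = rs.filter { !it.actuallyUnsafe }
        val unsafe = rs.filter { it.actuallyUnsafe }
        CategoryStats(
            falsePositiveRate =
                if (safe.isEmpty()) Double.NaN else safe.count { it.flagged }.toDouble() / safe.size,
            recall =
                if (unsafe.isEmpty()) Double.NaN else unsafe.count { it.flagged }.toDouble() / unsafe.size,
            clips = rs.size,
        )
    }
```

A high false positive rate concentrated in one category (medical, for instance) points to a whitelist or threshold change for that category rather than retraining the whole model.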
Timeline Guidance
Integrating Google Video Intelligence or AWS Rekognition Video takes 3–5 days. Adding an on-device pre-filter on CoreML/TFLite takes 2–3 more days. A full solution with live streaming and a human review system takes 3–4 weeks.