AI video quality enhancement in mobile app

BLACKSPARC.TECH is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.

Development and support of all types of mobile applications:

Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, exchanges, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Showing 1 of 1All 1735 services
AI video quality enhancement in mobile app
Complex
~2-4 weeks
Frequently Asked Questions

Our competencies:

Development stages

Latest works

  • image_mobile-applications_feedme_467_0.webp
    Development of a mobile application for FEEDME
    792
  • image_mobile-applications_xoomer_471_0.webp
    Development of a mobile application for XOOMER
    671
  • image_mobile-applications_rhl_428_0.webp
    Development of a mobile application for RHL
    1097
  • image_mobile-applications_zippy_411_0.webp
    Development of a mobile application for ZIPPY
    969
  • image_mobile-applications_affhome_429_0.webp
    Development of a mobile application for Affhome
    914
  • image_mobile-applications_flavors_409_0.webp
    Development of a mobile application for the FLAVORS company
    495

AI Video Quality Enhancement in Mobile Apps

Video quality enhancement is fundamentally different from photo enhancement. Photos can be processed over seconds: user will wait. Video requires either real-time (record/stream) or acceptable post-processing speed (roughly: 1-minute clip in 2–3 minutes). This dictates model choice and solution architecture.

Two Modes—Different Architectures

Post-processing recorded video—consume video file frame-by-frame via decoder, process with ML model, encode back. Speed matters, real-time not required.

Real-time enhancement during recording or playback—strict 33 ms per frame at 30 fps. Complex models won't fit. Need compromise quality.

Most client tasks are the first: user recorded old video or uploaded file, clicks "enhance".

Post-Processing on iOS: AVAssetReader + Metal + Core ML

// Read video as sequence of CVPixelBuffer
let asset = AVAsset(url: videoURL)
let reader = try AVAssetReader(asset: asset)
let output = AVAssetReaderTrackOutput(
    track: asset.tracks(withMediaType: .video).first!,
    outputSettings: [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange]
)
reader.add(output)
reader.startReading()

// Parallel writer for result
let writer = try AVAssetWriter(outputURL: resultURL, fileType: .mp4)
let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, ...)

Per frame: convert YUV → RGB (Metal shader), pass through Core ML model, convert RGB → YUV, write to AVAssetWriter. Conversion via Metal critical—CPU-based is 15–20 ms per frame just for color space.

Which model? For video denoising—RVRT or BasicVSR++ (temporal super resolution, accounting for neighboring frames). But they require frame batches and don't come for mobile originally. Compromise—per-frame models like Real-ESRGAN (ignore temporal consistency, simpler to deploy) or lightweight temporal models with 3–5 frame window.

Temporal flickering—main problem of per-frame approach: neighboring frames processed independently, details at boundaries "flicker". Solution—temporal consistency loss during fine-tuning or post-processing: averaging activations between neighboring frames with weight 0.1–0.2.

Android: MediaCodec + TFLite

// Use MediaCodec for decoding
val decoder = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME)!!)
decoder.configure(format, null, null, 0)

// Surface for decoding directly into OpenGL texture
val surface = Surface(surfaceTexture)
decoder.configure(format, surface, null, 0)
decoder.start()

On Android, decode to Surface → process via OpenGL/Vulkan compute → back via MediaCodec encoder. TFLite GPU Delegate works with OpenGL textures directly via setExternalContext(), eliminating one memory copy per frame.

For Full HD (1920×1080), tiled slicing 256×256 gives ~48 tiles. At 15 ms/tile inference—720 ms per frame. One-minute video (1800 frames) takes ~22 minutes. Acceptable for editor, not for quick-enhance. For quick-enhance—light model (2–4 MB TFLite), no tiling, downscale to 540p for inference.

Real-Time Denoising During Recording

If real-time needed—work with reduced resolution. Models trained on 360p/480p input deliver fast. On iPhone 14 Pro—ESRGAN x2 on 480p input ~28 ms via ANE. Android Snapdragon 8 Gen 2—comparable via GPU Delegate.

For CameraX with custom processing—ImageAnalysis use case with setBackpressureStrategy(STRATEGY_KEEP_ONLY_LATEST): don't queue if processing lags.

Don't Break Audio

Video pipeline processes video track only. Audio copied via AVAssetReaderTrackOutput / MediaExtractor unmodified. Typical mistake—forget to sync PTS (presentation timestamps) between audio and processed video track. AI processing doesn't change frame duration; copy PTS from original one-to-one.

Process

Analyze target scenarios (post-process vs real-time), pick model with measurements on target devices, implement decoding/encoding via AVFoundation/MediaCodec, ML pipeline with/without tiling, test edge cases—HDR video, non-standard resolutions, rotation.

Timeline Estimates

One-platform post-processing with per-frame model takes 3–5 weeks. Both platforms with temporal consistency and real-time mode requires 8–14 weeks.