What face detection models do you use?

We use SCRFD (best speed/accuracy trade-off), RetinaFace (with keypoint detection), and YOLOv8 (flexible for custom tasks). The choice depends on latency and accuracy requirements.

How do you handle small face detection?

For small faces, we apply image tiling with overlapping slices (SAHI). This allows detecting faces as small as 16×16 pixels while maintaining high accuracy.

How long does it take to develop a face detector?

Timelines depend on complexity: about 1 week for standard detection with an off-the-shelf model; 2–3 weeks for custom conditions (masks, specific lighting); 3–5 weeks for small-face detection and pipeline optimization.

Do I need to annotate data for training?

If your conditions differ from standard ones (camera angles, occlusions, masks), additional annotation may be required. We help organize data collection and annotation for your scenario.

What hardware is required for the detector to run?

For high-throughput systems (up to 100 FPS), a GPU (T4 or higher) is recommended. For embedded solutions, we use optimized models that run on CPU with latency of 8–45 ms, depending on the model.

Building a Face Detection System for Production

Face detection is the first step in any face pipeline, and a critical one: the task is to find all faces in an image and return bounding boxes with confidence scores. It looks simple, but real-world conditions (small faces at a distance, profile angles, partial occlusions, poor lighting, masks) turn it into a non-trivial engineering challenge. In production, a face missed by the detector is lost to every downstream stage, such as alignment and recognition.

For example, in stadium video surveillance, faces of 20×20 pixels are the norm. Without specialized optimization, such faces are missed in 40% of frames. We develop detectors that run in production with latency as low as 4 ms on GPU while maintaining high accuracy even on complex scenes.

Why Standard Detectors Often Fail

Most open-source detectors are trained on general-purpose datasets such as WiderFace and are not tuned to a particular deployment. In practice, surveillance camera angles, outdoor conditions, masks, and glasses can reduce accuracy to 60–70%. We address this by fine-tuning on target data with augmentations that simulate real conditions: rotations, shadows, blur. For example, adding synthetic masks during fine-tuning improves AP on the MAFA dataset from 65% to 89%.

How We Solve Face Detection

We use three main approaches depending on requirements.

SCRFD (Sample and Computation Redistribution for Face Detection): currently the best speed/quality trade-off. SCRFD-10GF reaches 95.2% AP on the WiderFace Easy subset and runs roughly 2x faster than RetinaFace-R50 at comparable accuracy. More details can be found in the InsightFace repository.

RetinaFace: a classic detector with landmark detection (5 points: two eyes, nose, two mouth corners). We use it for alignment before face recognition.

YOLOv8 fine-tuned on WiderFace: a versatile option for custom requirements.

from insightface.app import FaceAnalysis
import cv2

# InsightFace with only the detection module loaded;
# the detector returns boxes, scores and 5 keypoints per face
app = FaceAnalysis(allowed_modules=['detection'])
app.prepare(ctx_id=0, det_size=(640, 640))  # ctx_id=0: first GPU; use -1 for CPU

def detect_faces(image_path: str) -> list[dict]:
    img = cv2.imread(image_path)  # BGR array, or None if the file cannot be read
    if img is None:
        raise FileNotFoundError(f'Cannot read image: {image_path}')
    faces = app.get(img)

    results = []
    for face in faces:
        results.append({
            'bbox': face.bbox.astype(int).tolist(),     # [x1, y1, x2, y2]
            'confidence': float(face.det_score),
            'landmarks': face.kps.astype(int).tolist()  # 5 keypoints as [x, y]
        })
    return results
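
The five landmarks are what alignment before recognition relies on. As a simplified illustration only (recognition pipelines usually fit a full similarity transform to all five points), the sketch below estimates the in-plane tilt of a face from the two eye landmarks. It assumes the keypoints are ordered with the two eyes first, image-left eye before image-right eye; the function name `eye_roll_degrees` is hypothetical.

```python
import math

def eye_roll_degrees(landmarks: list[list[int]]) -> float:
    """Return the tilt of the line between the eyes, in degrees (0 = level).

    Assumes landmarks[0] and landmarks[1] are the two eyes (an assumption
    about keypoint order). Image y grows downward, so a positive value
    means the image-right eye sits lower than the image-left eye.
    """
    (lx, ly), (rx, ry) = landmarks[0], landmarks[1]
    return math.degrees(math.atan2(ry - ly, rx - lx))


# Made-up keypoints for illustration: level eyes, then a tilted face.
level = [[30, 40], [70, 40], [50, 60], [35, 80], [65, 80]]
tilted = [[30, 40], [70, 80], [40, 75], [15, 85], [45, 110]]
print(eye_roll_degrees(level), eye_roll_degrees(tilted))
```

Rotating the face crop by the negative of this angle levels the eyes, which is the simplest form of alignment.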

Small Face Detection

Standard detectors miss faces smaller than 16×16 pixels. For surveillance cameras that cover large distances, we use two techniques:

Image tiling: split the image into overlapping tiles, detect on each, merge results via NMS
SAHI (Slicing Aided Hyper Inference): automatic tiling with merge. Library available on GitHub.

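from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Depending on the SAHI version, model_type may need to be 'ultralytics'
model = AutoDetectionModel.from_pretrained(
    model_type='yolov8',
    model_path='face_detector.pt',   # fine-tuned face detector weights
    confidence_threshold=0.3
)

# Run the detector on overlapping 512x512 slices and merge the results
result = get_sliced_prediction(
    image='crowd.jpg',
    detection_model=model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2
)

# Merged detections in full-image coordinates
for pred in result.object_prediction_list:
    print(pred.bbox.to_xyxy(), pred.score.value)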
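
To make the tiling-and-merge idea concrete, here is a minimal plain-Python sketch of the two steps: computing overlapping tile offsets and merging per-tile detections with greedy NMS. It is not SAHI's actual implementation, which offers its own merge strategies; the helper names `tile_origins`, `iou`, and `merge_tile_detections` are hypothetical, and the 0.5 IoU threshold is an assumed default.

```python
def tile_origins(size: int, tile: int, overlap: float) -> list[int]:
    """Start offsets of overlapping tiles that cover the range [0, size)."""
    if size <= tile:
        return [0]
    step = max(1, int(tile * (1 - overlap)))
    origins = list(range(0, size - tile, step))
    origins.append(size - tile)  # last tile sits flush with the image edge
    return origins


def iou(a, b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tile_detections(per_tile, iou_threshold: float = 0.5):
    """Merge detections from all tiles into full-image coordinates.

    per_tile: list of (x_offset, y_offset, [(x1, y1, x2, y2, score), ...]),
    where boxes are expressed in tile coordinates.
    """
    # Shift every box from tile coordinates to full-image coordinates.
    boxes = [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, score)
             for ox, oy, dets in per_tile
             for x1, y1, x2, y2, score in dets]
    # Greedy NMS: keep the best-scoring box, drop duplicates that overlap it.
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```

For a 1000-pixel-wide image with 512-pixel tiles and 20% overlap, `tile_origins(1000, 512, 0.2)` yields offsets 0, 409, and 488, so a face lying on one tile border is fully contained in the neighbouring tile.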

Performance on Different Hardware

Detector	WiderFace Easy AP	CPU latency	GPU latency (T4)
SCRFD-500MF	90.5%	8 ms	1.5 ms
SCRFD-10GF	95.2%	45 ms	4 ms
RetinaFace-R50	94.9%	90 ms	7 ms
YOLOv8n (WiderFace)	93.1%	12 ms	2 ms
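
Latency figures like these depend heavily on input size, batch size, and runtime, so it is worth re-measuring on the target hardware. The sketch below is one possible timing harness, not the procedure used to produce the table: `measure_latency_ms` is a hypothetical helper, and `run_once` stands for any callable that runs a single inference. For GPU inference, `run_once` should block until the result is actually available, since GPU work is typically launched asynchronously.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(run_once: Callable[[], object],
                       warmup: int = 10, runs: int = 100) -> dict:
    """Time repeated calls of run_once and summarize the latency in milliseconds."""
    # Warm-up calls are excluded: first calls often include lazy initialization.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    mean_ms = statistics.mean(samples)
    return {
        'median_ms': statistics.median(samples),
        # Nearest-rank approximation of the 95th percentile.
        'p95_ms': samples[min(len(samples) - 1, int(0.95 * len(samples)))],
        'fps': 1000.0 / mean_ms if mean_ms > 0 else float('inf'),
    }


# Stand-in workload; in practice this would be a call into the detector.
stats = measure_latency_ms(lambda: sum(range(10_000)), warmup=3, runs=50)
print(stats)
```

Reporting the median together with a high percentile shows both the typical latency and the tail that matters for real-time video.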

How to choose a detector for your project?

If latency is critical (e.g., real-time video), the best choice is SCRFD-500MF on GPU. If maximum accuracy is needed, go with SCRFD-10GF. For embedded systems without GPU, YOLOv8n optimized via ONNX Runtime with INT8 quantization works well.
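
One way to make this choice explicit is to pick the most accurate detector that fits a latency budget. The sketch below encodes the figures from the performance table in this section; the function `pick_detector` and the selection rule are an illustrative simplification that ignores other factors such as the need for landmarks or memory limits.

```python
from typing import Optional

# Figures copied from the performance table in this section.
DETECTORS = [
    {'name': 'SCRFD-500MF', 'ap': 90.5, 'cpu_ms': 8.0, 'gpu_ms': 1.5},
    {'name': 'SCRFD-10GF', 'ap': 95.2, 'cpu_ms': 45.0, 'gpu_ms': 4.0},
    {'name': 'RetinaFace-R50', 'ap': 94.9, 'cpu_ms': 90.0, 'gpu_ms': 7.0},
    {'name': 'YOLOv8n (WiderFace)', 'ap': 93.1, 'cpu_ms': 12.0, 'gpu_ms': 2.0},
]


def pick_detector(max_latency_ms: float, has_gpu: bool) -> Optional[str]:
    """Return the highest-AP detector within the latency budget, or None."""
    key = 'gpu_ms' if has_gpu else 'cpu_ms'
    candidates = [d for d in DETECTORS if d[key] <= max_latency_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d['ap'])['name']


print(pick_detector(max_latency_ms=10, has_gpu=True))
print(pick_detector(max_latency_ms=15, has_gpu=False))
```

With a 10 ms budget on GPU every detector in the table qualifies, so the rule falls back to accuracy alone; with a 15 ms budget on CPU only the two lightest models remain.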

How to Fine-Tune a Model for Masked Face Detection?

The pandemic created a separate class of tasks: detecting faces with medical masks. The MAFA dataset contains 35,806 annotated masked faces. Fine-tuning a standard detector on MAFA plus WiderFace improves AP on masked faces from 65% to 89%. The fine-tuning process includes:

Collecting or generating synthetic data with masks
Augmentation: rotations, lighting changes, blur
Fine-tuning a pre-trained model on the mixed dataset
Validation on a separate test set

This ensures stable operation with masks, glasses, and other occlusions.
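
The data side of these steps can be sketched as follows: hold out a masked-face test set first, then mix the remaining masked samples with general face images for training. This is a minimal sketch under stated assumptions, not our exact procedure: the function `build_splits`, the 50% masked share, and the 20% test ratio are all illustrative choices.

```python
import random


def build_splits(general_items, masked_items,
                 masked_fraction: float = 0.5,
                 test_ratio: float = 0.2,
                 seed: int = 0):
    """Return (train, test) lists of (path, source) pairs.

    masked_fraction is the approximate share of masked samples in training
    and must be in (0, 1]. The test set contains masked samples only.
    """
    if not 0 < masked_fraction <= 1:
        raise ValueError('masked_fraction must be in (0, 1]')
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    masked, general = list(masked_items), list(general_items)
    rng.shuffle(masked)
    rng.shuffle(general)

    # Hold out test samples first so they never leak into training.
    n_test = max(1, int(len(masked) * test_ratio))
    test = [(path, 'masked') for path in masked[:n_test]]
    masked_train = masked[n_test:]

    # Add general faces until masked samples make up about masked_fraction.
    n_general = int(len(masked_train) * (1 - masked_fraction) / masked_fraction)
    train = [(path, 'masked') for path in masked_train]
    train += [(path, 'general') for path in general[:n_general]]
    rng.shuffle(train)
    return train, test


# Placeholder file names for illustration.
general = [f'wider_{i}.jpg' for i in range(100)]
masked = [f'mafa_{i}.jpg' for i in range(50)]
train, test = build_splits(general, masked)
print(len(train), len(test))
```

Keeping general faces in the mix is meant to preserve accuracy on unmasked faces while the detector adapts to masks.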

What's Included in Our Face Detection Service

We provide a turnkey solution, including:

Analysis of your conditions and preparation of synthetic/real data
Selection and fine-tuning of the detector (SCRFD/RetinaFace/YOLOv8)
Latency and memory footprint optimization (INT8 quantization, ONNX Runtime)
Integration into your pipeline (REST API, gRPC, RTSP)
Documentation and training for your team
Support during operation

With 5 years of experience in computer vision, we have completed over 30 face detection and recognition projects. We process up to 100 FPS on a single GPU. Results are guaranteed: if accuracy does not meet the agreed targets, we refine the model at no extra cost.

Development Timelines

Task	Timeline
Detection in standard conditions with an off-the-shelf model	1 week
Custom conditions (masks, cameras, lighting)	2–3 weeks
Small face detection, pipeline optimization	3–5 weeks

Request a demo version of the detector for your data and get a preliminary estimate within 1 day. Contact us to discuss your case.