Face Recognition System Development
Face recognition is the task of identifying a person from a face image by matching it against a database of known faces. The complete pipeline is face detection → alignment → embedding extraction → nearest-neighbor search in the database. Each stage affects overall accuracy. Errors made early, such as a missed detection or a badly aligned crop, cannot be corrected by later stages.
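InsightFace performs alignment internally. The sketch below illustrates what that step does. It estimates a similarity transform (rotation, uniform scale, translation) that maps five detected landmarks onto a fixed 112×112 template.

- The template coordinates are the ones used by InsightFace's ArcFace preprocessing.
- The least-squares fit follows Umeyama's method.
- The function and constant names are hypothetical.

```python
import numpy as np

# Five-point 112x112 template used by InsightFace's ArcFace preprocessing:
# two eyes, nose tip, two mouth corners (image left-to-right order)
ARCFACE_TEMPLATE = np.array(
    [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
     [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float64)


def similarity_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares similarity transform (Umeyama) mapping src points onto dst.

    Returns a 2x3 matrix M with dst ~= M[:, :2] @ p + M[:, 2] for each point p.
    """
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / n                      # 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # forbid reflections
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt                        # optimal rotation
    scale = (S * d).sum() / ((src_c ** 2).sum() / n)
    t = mu_d - scale * R @ mu_s
    return np.hstack([scale * R, t[:, None]])


def align_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """2x3 matrix that warps a face with these 5 landmarks onto the template."""
    return similarity_transform(landmarks.astype(np.float64), ARCFACE_TEMPLATE)
```

The returned 2×3 matrix has the form that `cv2.warpAffine(image, M, (112, 112))` accepts. That call produces the aligned crop that is fed to the embedding model.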
Complete Pipeline
```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis


class FaceRecognitionSystem:
    def __init__(self, db_path: str, threshold: float = 0.5):
        # InsightFace combines detection + alignment + embedding in one call
        self.app = FaceAnalysis(
            providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
        )
        # ctx_id=0 selects the first GPU; det_size is the detector input size
        self.app.prepare(ctx_id=0, det_size=(640, 640))
        self.threshold = threshold
        self.face_db = self._load_database(db_path)

    def _load_database(self, db_path: str) -> dict:
        # Assumed storage format (not specified here): an .npz file with
        # 'embeddings' (N, 512) float32, 'ids' (N,) and 'names' (N,)
        data = np.load(db_path, allow_pickle=False)
        return {
            'embeddings': data['embeddings'].astype(np.float32),
            'ids': data['ids'],
            'names': data['names'],
        }

    def identify(self, image: np.ndarray) -> list[dict]:
        # image: BGR array, e.g. from cv2.imread(path)
        faces = self.app.get(image)
        results = []
        for face in faces:
            embedding = face.embedding  # 512-dim ArcFace embedding
            match = self._search_database(embedding)
            results.append({
                'bbox': face.bbox.astype(int).tolist(),
                'person_id': match['id'] if match else None,
                'person_name': match['name'] if match else 'Unknown',
                'similarity': match['similarity'] if match else 0.0,
                # _search_database already applies the threshold
                'verified': match is not None,
            })
        return results

    def _search_database(self, query_emb: np.ndarray) -> dict | None:
        db = self.face_db['embeddings']
        if len(db) == 0:
            return None
        # Cosine similarity between the query and every enrolled embedding
        similarities = db @ query_emb / (
            np.linalg.norm(db, axis=1) * np.linalg.norm(query_emb)
        )
        best_idx = int(np.argmax(similarities))
        best_sim = float(similarities[best_idx])
        if best_sim < self.threshold:
            return None
        return {
            'id': self.face_db['ids'][best_idx],
            'name': self.face_db['names'][best_idx],
            'similarity': best_sim,
        }
```
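The class above needs InsightFace and a downloaded model. This sketch isolates the matching logic so it can be tried with plain NumPy on synthetic embeddings. It also shows 1:1 verification as the special case of comparing against a single enrolled embedding.

- The helper names are hypothetical.
- The 0.5 default mirrors the class's threshold.

```python
import numpy as np


def cosine_scores(db_embeddings: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query (D,) and every row of db (N, D)."""
    db = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return db @ q


def identify_1n(db_embeddings, names, query, threshold=0.5):
    """1:N identification: (name, score) of the best match, or None if below threshold."""
    scores = cosine_scores(db_embeddings, query)
    best = int(np.argmax(scores))
    return (names[best], float(scores[best])) if scores[best] >= threshold else None


def verify_1to1(enrolled, probe, threshold=0.5):
    """1:1 verification: does the probe match the single claimed identity?"""
    score = float(cosine_scores(enrolled[None, :], probe)[0])
    return score >= threshold, score
```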
Embedding Extraction Models
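- ArcFace (InsightFace): the de facto standard. LFW accuracy: 99.83%; IJB-C TAR@FAR=1e-4: 96.5%. Embedding size: 512 dimensions.
- FaceNet (Google): an earlier model that is still popular. LFW: 99.65%. Embedding size: 128 or 512 dimensions, depending on the implementation.
- MagFace: an ArcFace extension whose margin adapts to the embedding's magnitude, which also acts as a face quality indicator. IJB-C: 97.1%.
- For edge devices: MobileFaceNet, a model with under 1M parameters (about 4 MB) that runs on mobile hardware. LFW: 99.5%.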
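The "margin" in ArcFace and MagFace refers to the training loss. ArcFace adds an angular margin m to the angle between an embedding and its own class center, then scales all cosines by s. This forces embeddings of the same person into tighter clusters.

The NumPy sketch below computes those logits under a few assumptions:

- It uses the commonly cited defaults s=64 and m=0.5.
- It omits the extra handling real implementations add for θ + m > π.
- The function names are hypothetical.

```python
import numpy as np


def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin logits: s*cos(theta_y + m) for the true class, s*cos(theta_j) otherwise.

    embeddings: (B, D), weights: (C, D) class centers, labels: (B,) class ids.
    Simplification: no special handling when theta_y + m exceeds pi.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)        # (B, C) cosine to each class center
    rows = np.arange(len(labels))
    logits = cos.copy()
    # Penalize only the ground-truth class by enlarging its angle
    logits[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + m)
    return s * logits


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy, computed with a max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With m=0 this reduces to a normalized softmax. With m>0 the loss stays higher until same-class embeddings sit closer to their center than the margin requires.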
Face Database Scalability
For small databases (fewer than about 10k faces), brute-force cosine similarity is effectively instantaneous, since it is a single matrix-vector product. Large databases need approximate nearest neighbor (ANN) search:
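```python
import faiss
import numpy as np


class FaceDatabase:
    def __init__(self, dimension: int = 512, nlist: int = 100):
        # IVF index: vectors are bucketed into nlist clusters and a query scans
        # only nprobe of them. FAISS guidelines suggest nlist of roughly
        # 4*sqrt(N) to 16*sqrt(N), so million-scale databases need far more than 100.
        self.quantizer = faiss.IndexFlatIP(dimension)
        # METRIC_INNER_PRODUCT is required: IndexIVFFlat defaults to L2 distance
        self.index = faiss.IndexIVFFlat(
            self.quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT
        )
        self.index.nprobe = 10  # more lists probed = higher recall, slower search

    def add_faces(self, embeddings: np.ndarray):
        # FAISS needs contiguous float32; np.array copies, so the caller's data is untouched
        x = np.array(embeddings, dtype=np.float32, order='C')
        faiss.normalize_L2(x)  # unit vectors: inner product == cosine similarity
        if not self.index.is_trained:
            # k-means on this first batch; FAISS warns below ~39 * nlist vectors
            self.index.train(x)
        self.index.add(x)

    def search(self, query: np.ndarray, k: int = 5):
        q = np.array(query, dtype=np.float32).reshape(1, -1)
        faiss.normalize_L2(q)
        similarities, indices = self.index.search(q, k)
        # indices are insertion positions; -1 marks missing results
        return similarities[0], indices[0]
```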
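FAISS IVFFlat can search 1M faces in under 1 ms on a CPU, provided nlist is sized for the database and nprobe is kept small. Latency depends on both parameters and on the hardware, so benchmark on the target machine.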
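The toy inverted-file index below shows why `nprobe` trades quality for speed. It is an illustrative NumPy reimplementation, not FAISS's code, and the class name is hypothetical. It works as follows:

- Spherical k-means assigns every vector to a centroid.
- A query scans only the vectors in the `nprobe` closest lists.
- With `nprobe == nlist`, it degenerates to exact brute-force search.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


class TinyIVF:
    """Toy IVF over unit vectors (inner product == cosine similarity)."""

    def __init__(self, nlist: int, nprobe: int, n_iter: int = 10, seed: int = 0):
        self.nlist, self.nprobe, self.n_iter = nlist, nprobe, n_iter
        self.rng = np.random.default_rng(seed)

    def train(self, x: np.ndarray):
        x = normalize(x)
        # Spherical k-means, initialized from random training points
        self.centroids = x[self.rng.choice(len(x), self.nlist, replace=False)]
        for _ in range(self.n_iter):
            assign = np.argmax(x @ self.centroids.T, axis=1)
            for c in range(self.nlist):
                members = x[assign == c]
                if len(members):               # empty clusters keep their centroid
                    self.centroids[c] = members.mean(axis=0)
            self.centroids = normalize(self.centroids)

    def add(self, x: np.ndarray):
        # Single batch for simplicity: replaces any previously added data
        self.data = normalize(x)
        assign = np.argmax(self.data @ self.centroids.T, axis=1)
        # Inverted lists: ids of the vectors assigned to each centroid
        self.lists = [np.flatnonzero(assign == c) for c in range(self.nlist)]

    def search(self, q: np.ndarray, k: int):
        q = q / np.linalg.norm(q)
        # Only the nprobe lists whose centroids are most similar to the query
        probe = np.argsort(-(self.centroids @ q))[: self.nprobe]
        cand = np.concatenate([self.lists[c] for c in probe])
        scores = self.data[cand] @ q
        order = np.argsort(-scores)[:k]
        return scores[order], cand[order]
```

Raising `nprobe` scans more candidate vectors and recovers more of the exact neighbors. Lowering it scans fewer vectors, at the risk of missing a true match stored in an unprobed list.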
Handling Image Quality
Real-world systems must cope with blurry, partially occluded, and poorly lit faces. Common countermeasures:
- Face quality score: before enrolling a face in the database and before identification, evaluate crop quality (BRISQUE or a face-specific model such as FaceQnet) and reject low-quality images.
- Anti-spoofing: protection against printed photos and screen replays, e.g. MiniFASNet or CDCN. Face anti-spoofing (FAS) must be a mandatory component of any production system.
- 3D liveness detection: via an IR camera or depth sensor (an approach similar to Apple's Face ID).
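A quality gate sits before enrollment and identification. The sketch below substitutes simple heuristics for BRISQUE or FaceQnet:

- a minimum face size;
- the detector's confidence score;
- mean brightness;
- the variance of the Laplacian as a blur measure.

All threshold values and function names are hypothetical and would need tuning on real data.

```python
import numpy as np


def sharpness(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry crop."""
    g = gray.astype(np.float64)              # avoid uint8 overflow
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def passes_quality_gate(face_gray: np.ndarray, det_score: float,
                        min_size: int = 80, min_det_score: float = 0.6,
                        brightness_range=(40, 220), min_sharpness: float = 100.0):
    """Return (accepted, reason). Thresholds are illustrative placeholders."""
    h, w = face_gray.shape
    if min(h, w) < min_size:
        return False, "face too small"
    if det_score < min_det_score:
        return False, "low detector confidence"
    mean = float(face_gray.mean())
    if not brightness_range[0] <= mean <= brightness_range[1]:
        return False, "poor exposure"
    if sharpness(face_gray) < min_sharpness:
        return False, "too blurry"
    return True, "ok"
```

Such heuristics are cheap, but they do not capture pose or occlusion. That gap is why learned quality models are preferred in production.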
Legal and Ethical Aspects
A face recognition system requires legal compliance: the GDPR in the EU and national biometric data laws. Biometric data is a special category of personal data. Mandatory measures include:
- explicit informed consent;
- encryption of the embedding database;
- access logging;
- support for the right to erasure (deletion).
Development Stages
- Requirements audit: 1:1 verification or 1:N identification, database scale, target hardware.
- Collection of a test dataset under real conditions (lighting, angles, cameras).
- Selection and tuning of the embedding model and anti-spoofing.
- Database construction and similarity threshold tuning.
- Integration, load testing, and FAR/FRR monitoring.
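Threshold tuning and FAR/FRR monitoring both come down to counting scores on labeled pairs. This minimal sketch assumes you already have similarity scores from the test dataset for two kinds of pairs:

- genuine pairs (same person);
- impostor pairs (different people).

The helper names are hypothetical, and a pair is accepted when its score is at least the threshold.

```python
import numpy as np


def far_frr(genuine: np.ndarray, impostor: np.ndarray, threshold: float):
    """FAR = share of impostor pairs accepted; FRR = share of genuine pairs rejected."""
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr


def threshold_for_far(genuine: np.ndarray, impostor: np.ndarray, target_far: float) -> float:
    """Lowest observed threshold whose FAR is <= target_far (hence the lowest FRR)."""
    # FAR is non-increasing in the threshold; +inf guarantees a valid candidate
    candidates = np.unique(np.concatenate([genuine, impostor, [np.inf]]))
    fars = np.array([np.mean(impostor >= t) for t in candidates])  # O(n^2) sketch
    return float(candidates[fars <= target_far].min())
```

Choosing the lowest threshold that meets the FAR target minimizes FRR on that data. For large score sets, a sort-based sweep would replace the quadratic scan.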
| System Scale | Timeline |
|---|---|
| Verification (1:1), up to 1000 users | 3–4 weeks |
| Identification 1:N, up to 100k faces | 5–8 weeks |
| Enterprise system, 1M+ faces, multi-camera | 10–16 weeks |







