Development of an AI system for detecting anomalies in a telecom network
A telecom network generates millions of metrics per minute. Traditional static thresholds—80% CPU, packet loss > 1%—fail to capture subtle anomalies such as slow drift, correlated degradations across multiple KPIs, and atypical traffic patterns. ML detection operates without preset thresholds, adapting to the normal behavior of each element.
Multivariate anomaly at the network level
Why static thresholds are not sufficient:
- Normal router CPU at peak time = 75% (not anomaly)
- CPU 50% at 3am on Saturday = anomaly (possible attack or memory leak)
- Simultaneous degradation of 5 KPIs on one element = an anomaly, even though each KPI taken individually remains within its normal range
Context-dependent thresholds:
import pandas as pd
import numpy as np
from prophet import Prophet
class ContextualAnomalyDetector:
    """Per-KPI contextual anomaly detector based on a Prophet forecast.

    Instead of a static threshold, the expected value and its 99%
    prediction interval are modelled per time-of-day / day-of-week,
    so the same KPI value can be normal at peak hours and anomalous
    at 3am on a weekend.
    """

    def __init__(self, kpi_name: str):
        self.kpi_name = kpi_name
        # 99% interval width keeps false positives low for routine
        # seasonal fluctuations of the KPI.
        self.prophet_model = Prophet(
            daily_seasonality=True,
            weekly_seasonality=True,
            interval_width=0.99,
        )
        self.fitted = False

    def fit(self, historical_data: pd.DataFrame) -> None:
        """Fit the seasonal baseline.

        historical_data: DataFrame with columns ds (datetime) and
        y (KPI value). At least 4 weeks of history are required for
        the weekly seasonality to be estimated correctly.
        """
        self.prophet_model.fit(historical_data)
        self.fitted = True

    def detect(self, current_value: float, current_time: pd.Timestamp) -> dict:
        """Score a single observation against the forecast interval.

        Raises RuntimeError if called before fit().
        """
        if not self.fitted:
            # Fail fast with a clear message instead of an opaque
            # Prophet error from predicting with an unfitted model.
            raise RuntimeError(
                f"ContextualAnomalyDetector({self.kpi_name}) is not fitted"
            )
        future = pd.DataFrame({'ds': [current_time]})
        forecast = self.prophet_model.predict(future)
        yhat = forecast['yhat'].values[0]
        yhat_lower = forecast['yhat_lower'].values[0]
        yhat_upper = forecast['yhat_upper'].values[0]
        is_anomaly = current_value < yhat_lower or current_value > yhat_upper
        # Relative deviation from the forecast; epsilon guards yhat == 0.
        deviation = (current_value - yhat) / (abs(yhat) + 1e-9)
        return {
            'kpi': self.kpi_name,
            'value': current_value,
            'expected': yhat,
            'bounds': (yhat_lower, yhat_upper),
            'anomaly': is_anomaly,
            'relative_deviation': deviation,
        }
Multivariate Anomaly Detection
Isolation Forest on a node metric vector:
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
class NetworkElementAnomalyDetector:
    """Multivariate anomaly detector for a single network element.

    Each network element gets its own Isolation Forest model.
    Training: 30 days of normal operation.
    Inference: every 5 minutes on the current KPI vector.
    """

    def __init__(self, element_id: str, contamination=0.01):
        self.element_id = element_id
        self.scaler = StandardScaler()
        self.model = IsolationForest(
            contamination=contamination,
            n_estimators=100,
            random_state=42,
        )
        # Calibrated in fit(); None signals "not yet trained".
        self.threshold = None

    def fit(self, normal_kpi_matrix: np.ndarray) -> None:
        """Train on normal data only.

        normal_kpi_matrix: (N_samples x N_kpis)
        """
        X = self.scaler.fit_transform(normal_kpi_matrix)
        self.model.fit(X)
        # Calibrate the decision threshold on the normal data:
        # 1st percentile of the scores ~ 1% false-positive rate.
        scores = self.model.score_samples(X)
        self.threshold = np.percentile(scores, 1)

    def score(self, kpi_vector: np.ndarray) -> dict:
        """Score one KPI vector; raises RuntimeError before fit()."""
        if self.threshold is None:
            raise RuntimeError(
                f"NetworkElementAnomalyDetector({self.element_id}) is not fitted"
            )
        X = self.scaler.transform([kpi_vector])
        raw_score = self.model.score_samples(X)[0]
        anomaly_score = -raw_score  # higher = more anomalous
        return {
            'element_id': self.element_id,
            'anomaly_score': float(anomaly_score),
            'is_anomaly': raw_score < self.threshold,
            'severity': self._score_to_severity(anomaly_score),
        }

    def _score_to_severity(self, score):
        # Map the (negated) isolation score onto NOC severity levels.
        if score > 0.7:
            return 'critical'
        if score > 0.5:
            return 'major'
        if score > 0.3:
            return 'minor'
        return 'normal'
Traffic Pattern Anomaly
Detection of atypical traffic (DDoS, BGP hijack):
def detect_traffic_anomaly(traffic_matrix: pd.DataFrame,
                           baseline_stats: dict) -> list:
    """Detect traffic anomalies in one 5-minute NetFlow/IPFIX window.

    traffic_matrix: per-flow records (src_ip x dst_ip x bytes over
        5 minutes) with at least the columns src_ip, protocol, bytes.
    baseline_stats: learned normal-behavior statistics; requires
        'total_bytes_mean' and 'total_bytes_std', optionally
        'known_src_ips' (set) and 'new_ip_threshold' (int).

    Covered anomaly classes: volumetric (DDoS), structural
    (new-source flood) and protocol-mix (ICMP/UDP flood).
    Returns a list of anomaly dicts (empty if nothing is detected).
    """
    anomalies = []
    # 1. Volumetric anomaly: sharp growth of total traffic vs baseline.
    current_total = traffic_matrix['bytes'].sum()
    baseline_total = baseline_stats['total_bytes_mean']
    baseline_std = baseline_stats['total_bytes_std']
    # Epsilon guards against a zero-variance baseline.
    volume_z_score = (current_total - baseline_total) / (baseline_std + 1e-9)
    if volume_z_score > 5:
        anomalies.append({
            'type': 'volumetric_spike',
            'severity': 'critical',
            'z_score': volume_z_score,
            'possible_cause': 'DDoS attack or flash crowd'
        })
    # 2. New sources: IPs that never appeared in the baseline window.
    current_sources = set(traffic_matrix['src_ip'].unique())
    known_sources = baseline_stats.get('known_src_ips', set())
    new_sources = current_sources - known_sources
    if len(new_sources) > baseline_stats.get('new_ip_threshold', 1000):
        anomalies.append({
            'type': 'new_source_flood',
            'severity': 'major',
            'new_ips_count': len(new_sources)
        })
    # 3. Protocol-mix anomaly: ICMP or UDP dominating the traffic.
    # Guard: with zero total bytes the ratios are undefined (0/0),
    # so the check is skipped instead of comparing against NaN/inf.
    if current_total > 0:
        protocol_ratios = (
            traffic_matrix.groupby('protocol')['bytes'].sum() / current_total
        )
        for proto in ('ICMP', 'UDP'):
            if protocol_ratios.get(proto, 0) > 0.5:
                anomalies.append({
                    'type': f'{proto}_flood',
                    'severity': 'major',
                    'ratio': protocol_ratios[proto]
                })
    return anomalies
BGP and routing
BGP anomaly detection:
def analyze_bgp_events(bgp_updates: pd.DataFrame, baseline_prefix_count: int,
                       known_origins=None) -> dict:
    """Analyze a batch of BGP updates for routing anomalies.

    bgp_updates: DataFrame with at least 'prefix' and 'origin_as' columns.
    baseline_prefix_count: size of the normal routing table (kept for
        interface compatibility; reserved for future leak detection).
    known_origins: mapping prefix -> expected origin AS. Defaults to an
        empty mapping, in which case hijack detection reports nothing
        (the original placeholder behavior).

    Detected classes:
    - BGP hijack: a known prefix suddenly announced by a new origin AS.
    - Route flap: frequent updates of the same prefix = unstable link.
    (BGP leak - routes from one provider re-advertised to another -
    requires AS-relationship data and is out of scope here.)
    """
    if known_origins is None:
        known_origins = {}
    # Route flapping: more than 10 updates of one prefix in this batch.
    prefix_update_counts = bgp_updates.groupby('prefix').size()
    flapping_prefixes = prefix_update_counts[prefix_update_counts > 10].index.tolist()
    # Hijack candidates: announced origin AS differs from the expected one.
    hijack_candidates = []
    for _, row in bgp_updates.iterrows():
        expected_as = known_origins.get(row['prefix'])
        if expected_as is not None and row['origin_as'] != expected_as:
            hijack_candidates.append({
                'prefix': row['prefix'],
                'expected_as': expected_as,
                'detected_as': row['origin_as']
            })
    return {
        'flapping_prefixes': flapping_prefixes,
        'hijack_candidates': hijack_candidates,
        'route_instability': len(flapping_prefixes) > 5
    }
Alert Correlation and Noise Suppression
Anomaly Correlation Graph: a router uplink failure produces hundreds of downstream anomalies. Algorithm: build a dependency graph from the CMDB topology, trace each anomaly upstream to its root-cause element, and group all downstream anomalies into a single incident.
Timeframe: Prophet contextual anomaly + Isolation Forest on nodes + traffic anomaly — 3-4 weeks. BGP anomaly, alert correlation graph, automatic RCA, NOC integration — 2-3 months.







