
Statistical Arbitrage Algorithm Development

Statistical arbitrage (stat arb) trades temporary deviations from historically stable statistical relationships between assets. Unlike pure arbitrage, which locks in a risk-free profit, stat arb carries risk: the deviation may widen further before it reverts, or the relationship may break down entirely. Bearing that risk is what creates the profit opportunity.

Foundation: Cointegration and Mean-Reversion

Cointegration is a long-run statistical relationship between two time series. Unlike correlation, which measures how their changes move together, cointegration means that some linear combination of the two series is stationary. Simply put: the assets may drift apart, but over the long run they return to each other.
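
To make the distinction concrete, here is a minimal simulated sketch. It uses synthetic data rather than market prices, and the factor 2.0 and the noise scale are arbitrary assumptions. Both series wander without bound, yet the combination y - 2x stays in a narrow band:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# x is a random walk, so it is non-stationary
x = np.cumsum(rng.normal(size=n))
# y follows the same stochastic trend plus stationary noise
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# The cointegrating combination removes the shared trend
spread = y - 2.0 * x

print("std of y:     ", y.std())
print("std of spread:", spread.std())
```

On real prices the coefficient is unknown and has to be estimated, which is what the test and the hedge-ratio regression in the following sections do.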

Engle-Granger cointegration test:

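from statsmodels.tsa.stattools import coint

def find_cointegrated_pairs(prices_dict, p_threshold=0.05):
    # prices_dict maps symbol -> price series; all series must be
    # aligned on the same timestamps and free of NaN values.
    symbols = list(prices_dict.keys())
    pairs = []

    for i, sym1 in enumerate(symbols):
        for sym2 in symbols[i+1:]:
            # Null hypothesis: no cointegration; a small p-value rejects it
            score, p_value, _ = coint(
                prices_dict[sym1],
                prices_dict[sym2]
            )
            if p_value < p_threshold:
                pairs.append((sym1, sym2, p_value))

    # Strongest evidence of cointegration first
    return sorted(pairs, key=lambda x: x[2])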

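Candidate pairs to test in crypto: BTC/ETH, BTC-SPOT/BTC-PERP, similar Layer-1 tokens, and ETH against its liquid staking derivatives such as stETH (LDO, by contrast, is Lido's governance token rather than a staking derivative).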

Model: Spread and Z-score

For a cointegrated pair (X, Y), estimate the hedge ratio β by regressing Y on X with OLS:

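import pandas as pd
from sklearn.linear_model import LinearRegression

def calculate_hedge_ratio(price_x, price_y, window=60):
    # Rolling OLS of Y on X for a dynamic hedge ratio.
    # Each fit uses only the `window` bars before bar i, so there is no look-ahead.
    hedge_ratios = []
    for i in range(window, len(price_x)):
        x = price_x.iloc[i-window:i].values.reshape(-1, 1)
        y = price_y.iloc[i-window:i].values
        model = LinearRegression().fit(x, y)
        hedge_ratios.append(model.coef_[0])
    # Return a Series aligned with the bars the ratios apply to
    return pd.Series(hedge_ratios, index=price_x.index[window:])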

Spread = Y - β × X

The Z-score normalizes the spread by its mean and standard deviation over the lookback window:

Z-score = (Spread - mean(Spread)) / std(Spread)
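
In live trading the mean and standard deviation come from a trailing window, not the full history. A minimal pandas sketch follows; the function name `rolling_zscore` and the default lookback of 60 bars are illustrative assumptions, and excluding the current bar from its own statistics is one possible convention:

```python
import numpy as np
import pandas as pd

def rolling_zscore(spread: pd.Series, lookback: int = 60) -> pd.Series:
    """Z-score of the spread against its trailing mean and standard deviation."""
    # shift(1) keeps the current bar out of its own statistics (no look-ahead)
    mean = spread.rolling(lookback).mean().shift(1)
    std = spread.rolling(lookback).std().shift(1)
    return (spread - mean) / std

# Toy usage: a steadily rising spread scored against its previous 3 bars
z = rolling_zscore(pd.Series(np.arange(10.0)), lookback=3)
print(z)
```

The first `lookback` values are NaN because no full trailing window exists yet.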

Trading signals (with Spread = Y - β × X):

  • Z-score > +2: spread abnormally high → sell Y, buy X (short spread)
  • Z-score < -2: spread abnormally low → buy Y, sell X (long spread)
  • |Z-score| < 0.5: close position (spread has reverted to the mean)
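
These rules form a small state machine: once a position is open, it is held until the Z-score falls back inside the exit band. A minimal sketch, assuming Spread = Y - β × X; the function name and the position encoding (+1 = buy Y / sell X, -1 = sell Y / buy X, 0 = flat) are illustrative choices:

```python
import numpy as np

def positions_from_zscore(z, entry=2.0, exit_band=0.5):
    """Map a Z-score sequence to positions: +1 buy Y/sell X, -1 sell Y/buy X, 0 flat."""
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for i, zi in enumerate(z):
        if np.isnan(zi):
            pass                      # no signal on this bar: keep the current state
        elif current == 0:
            if zi > entry:
                current = -1          # spread too high: sell Y, buy X
            elif zi < -entry:
                current = 1           # spread too low: buy Y, sell X
        elif abs(zi) < exit_band:
            current = 0               # spread reverted: close the position
        positions[i] = current
    return positions

print(positions_from_zscore([0.0, 2.5, 1.0, 0.3, -2.2, -1.0, 0.1]))
```

Keeping the position open between the entry and exit thresholds avoids repeatedly re-entering while the Z-score sits between 0.5 and 2.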

Spread Risk Management

Stop-loss by Z-score: if |Z-score| grows to 3 or more after entry instead of shrinking, the relationship may have shifted structurally. Exit the position.

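Half-life of mean reversion: estimate it by fitting an AR(1) model to the spread:

import numpy as np
import statsmodels.api as sm

def calculate_half_life(spread):
    # Regress Δspread_t on spread_{t-1}: Δs_t = c + λ·s_{t-1} + ε_t
    spread_lag = spread.shift(1).dropna()
    spread_diff = spread.diff().dropna()
    result = sm.OLS(spread_diff, sm.add_constant(spread_lag)).fit()
    lam = result.params.iloc[1]  # slope on the lagged spread
    if lam >= 0:
        return np.inf  # no evidence of mean reversion
    return -np.log(2) / lam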

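A half-life under 5 days indicates fast mean reversion, suitable for short-term trading. Above 30 days, reversion is slow and positions must be held longer. The half-life is expressed in bars of the input series, so it is in days only when the spread is sampled daily.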
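
The formula used in calculate_half_life follows from the AR(1) dynamics. A short derivation, assuming a demeaned spread s_t and a negative coefficient λ:

```latex
\Delta s_t = \lambda s_{t-1} + \varepsilon_t
\;\Rightarrow\;
\mathbb{E}[s_t \mid s_0] = (1+\lambda)^t s_0 \approx e^{\lambda t} s_0
\\[4pt]
e^{\lambda t_{1/2}} = \tfrac{1}{2}
\;\Rightarrow\;
t_{1/2} = -\frac{\ln 2}{\lambda}
```

For example, λ = -0.1 gives a half-life of about 6.9 bars. The exact discrete-time value, -ln 2 / ln(1 + λ), is about 6.6 bars, so the approximation is close when |λ| is small.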

Lookback window: the period used to calculate the mean and standard deviation of the spread. Too short a window produces many false signals; too long a window reacts slowly to changes. The window is optimized via walk-forward testing.

Kalman Filter for Dynamic Hedge Ratio

A static β becomes outdated as the relationship drifts. A Kalman filter treats β as a hidden state and updates it with every new observation:

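from pykalman import KalmanFilter

# State: hedge ratio β_t, modeled as a random walk.
# Observation: price_y_t = β_t × price_x_t + noise.
kf = KalmanFilter(
    transition_matrices=[1],
    # Time-varying observation matrix, shape (n_timesteps, 1, 1)
    observation_matrices=price_x.values.reshape(-1, 1, 1),
    initial_state_mean=0,
    initial_state_covariance=1,
    observation_covariance=1,
    transition_covariance=0.05  # larger values let β adapt faster
)

state_means, state_covs = kf.filter(price_y.values)
hedge_ratio_dynamic = state_means.flatten()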

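Compared with rolling OLS, the Kalman filter updates β smoothly rather than in window-dependent jumps, which tends to give more stable signals and fewer false breakouts. How much it helps depends on the chosen observation and transition covariances.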
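
The library call hides the update itself. For intuition, here is a scalar version written out in plain numpy. It is a sketch that assumes β follows a random walk and that there is no intercept; the function name and the noise variances `q` and `r` are hypothetical tuning values, not recommendations:

```python
import numpy as np

def kalman_hedge_ratio(x, y, q=1e-4, r=1e-2, beta0=0.0, p0=1.0):
    """Filtered hedge ratio β_t for y_t = β_t * x_t + noise, with β_t a random walk."""
    beta, p = beta0, p0
    betas = np.empty(len(x))
    for t, (xt, yt) in enumerate(zip(x, y)):
        p = p + q                      # predict: β unchanged, uncertainty grows by q
        innovation = yt - beta * xt    # prediction error for y_t
        s = xt * p * xt + r            # variance of the innovation
        k = p * xt / s                 # Kalman gain
        beta = beta + k * innovation   # correct the hedge ratio estimate
        p = (1.0 - k * xt) * p         # shrink the estimate's variance
        betas[t] = beta
    return betas

# Toy check: when y is exactly 2x, the estimate converges to 2
x = np.linspace(1.0, 2.0, 200)
betas = kalman_hedge_ratio(x, 2.0 * x)
print(betas[-1])
```

A larger `q` relative to `r` makes the estimate follow recent data more closely; a smaller ratio makes it smoother.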

Multi-pair Stat Arb

Instead of trading a single pair, take a portfolio approach with multiple cointegrated pairs:

  • Diversification reduces exposure to the breakdown of any single pair
  • Correlation between the pairs' spreads should be minimal
  • PCA (Principal Component Analysis) helps find common factors and build stationary portfolios

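Eigenvector portfolio: apply PCA to the covariance matrix of N asset returns to extract the dominant common factors. What remains after removing those factors is a candidate mean-reverting residual; test it for stationarity, then trade its deviations from the mean.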
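
One way to sketch the PCA step with numpy alone is shown below. The function name, the choice of a single factor, and the synthetic returns are all illustrative assumptions; stationarity of the resulting residuals still has to be tested separately:

```python
import numpy as np

def pca_residual_returns(returns, n_factors=1):
    """Remove the top principal components from a (T x N) matrix of asset returns."""
    r = returns - returns.mean(axis=0)          # demean each asset
    cov = np.cov(r, rowvar=False)               # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_factors]       # eigenvectors with largest variance
    factor_returns = r @ top                    # T x k common-factor returns
    residuals = r - factor_returns @ top.T      # part not explained by the factors
    return residuals, top

# Synthetic example: 4 assets driven by one common factor plus small noise
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
returns = common @ np.ones((1, 4)) + 0.1 * rng.normal(size=(500, 4))
residuals, top = pca_residual_returns(returns, n_factors=1)
print(returns.var(), residuals.var())
```

Cumulative sums of the residual returns give candidate spread series, which can then be scored with the same Z-score logic used for pairs.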

Execution and Transaction Costs

Stat arb is profitable only if returns exceed transaction costs:

  • Exchange fees (taker: 0.04–0.07%, maker: 0–0.02%)
  • Funding rate for perpetual positions
  • Slippage on execution
  • Borrowing cost for short positions

Adjust the minimum entry Z-score for costs: if an entry at Z=1.5 does not cover costs once the probability of reversion is taken into account, raise the threshold to Z=2.0.
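
A rough way to check a threshold is to compare the expected gain per trade with the round-trip cost. The sketch below assumes a deliberately simple two-outcome model (the trade either reverts to the exit level or hits the stop), and every number in it is hypothetical; in practice the reversion probability would come from a backtest and would itself vary with the entry level:

```python
def expected_edge(z_entry, z_exit, z_stop, spread_std, p_revert, round_trip_cost):
    """Expected profit per unit of spread under a two-outcome model, net of costs."""
    gain = (z_entry - z_exit) * spread_std     # profit if the spread reverts to z_exit
    loss = (z_stop - z_entry) * spread_std     # loss if the stop at z_stop is hit
    return p_revert * gain - (1.0 - p_revert) * loss - round_trip_cost

# Hypothetical inputs: spread std of 10, 70% reversion probability, cost of 3 per round trip
for z_entry in (1.5, 2.0):
    edge = expected_edge(z_entry, z_exit=0.5, z_stop=3.0,
                         spread_std=10.0, p_revert=0.7, round_trip_cost=3.0)
    print(z_entry, round(edge, 2))
```

With these inputs the Z=1.5 entry has a negative expected edge and the Z=2.0 entry a positive one, which is the situation the rule above describes.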

Backtesting

Walk-forward validation: fit parameters on 6-12 months of data, test on the following 1-2 months, then shift the window forward and repeat.
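
The windowing can be sketched as a small generator over bar indices. The function name is illustrative, and stepping forward by exactly one test window is an assumption; other step sizes are equally valid:

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_start, train_end, test_end) index triples; end indices are exclusive."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        yield start, train_end, train_end + test_size
        start += test_size   # shift forward by one test window

# Example with daily bars: about 9 months of training, 1 month of testing, 2 years of data
for train_start, train_end, test_end in walk_forward_windows(730, 270, 30):
    print(f"train [{train_start}:{train_end}]  test [{train_end}:{test_end}]")
```

Parameters are fitted only on the training slice and evaluated only on the test slice that follows it, so each test period is out-of-sample.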

Key metrics: Sharpe ratio > 1.5, maximum drawdown < 15%, average position duration (does it match the estimated half-life?), and the ratio of profitable to unprofitable trades.
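
The first two metrics can be computed from a series of per-period strategy returns. A minimal sketch, assuming simple returns, a zero risk-free rate, and 365 periods per year (daily returns on a market that trades every day):

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=365):
    """Annualized Sharpe ratio of per-period returns, with a zero risk-free rate."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    r = np.asarray(returns, dtype=float)
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))  # start from capital of 1.0
    peak = np.maximum.accumulate(equity)                   # running high-water mark
    return float(np.max(1.0 - equity / peak))

daily = [0.01, -0.02, 0.015, 0.005, -0.01]
print(sharpe_ratio(daily), max_drawdown(daily))
```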

Overfitting check: parameters optimized on one period should also work on another. If the optimal parameters change significantly between periods, the model is likely overfit.

Technical Stack

Python (pandas, numpy, statsmodels, sklearn, pykalman), PostgreSQL for storing positions and P&L, CCXT for exchange API connectivity, Celery for scheduled tasks (minute-level spread and Z-score calculation), Grafana for monitoring. Deployed on AWS/GCP in a region close to the exchange's servers to reduce latency.