Exchange Data Normalization System Development
Every cryptocurrency exchange is its own universe, with its own naming conventions, number formats, time units, and field semantics. BTC/USDT is BTCUSDT on Binance, XBT/USDT on Kraken, and tBTCUST on Bitfinex. Normalization is the layer that hides this incompatibility behind a single interface.
What Needs to Be Normalized
Symbols and pairs. Each exchange has its own conventions. The normalized format is BASE/QUOTE in uppercase: BTC/USDT, ETH/BTC. Exchange symbols are kept in a mapping that also supports the reverse transformation, from normalized symbol back to exchange symbol.
Timestamps. Binance returns milliseconds, some exchanges return seconds, and a few endpoints report nanoseconds (KuCoin trade history, for example). The normalized format is Unix epoch milliseconds in UTC, stored as int64.
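A minimal sketch of the timestamp rule, assuming each adapter knows the unit of every endpoint it reads from the exchange documentation. The `to_epoch_ms` helper and the unit labels are names chosen for this illustration, not part of any exchange SDK.

```python
from decimal import Decimal
from typing import Union

# How many source units fit into one millisecond.
# The unit labels ("ms", "us", "ns") are chosen for this sketch.
_UNITS_PER_MS = {"ms": 1, "us": 1_000, "ns": 1_000_000}


def to_epoch_ms(value: Union[int, float, str], unit: str) -> int:
    """Convert an exchange timestamp to integer Unix milliseconds (UTC)."""
    if unit == "s":
        # Seconds may carry a fractional part ("1704067200.123"), so go
        # through Decimal instead of float to avoid rounding drift.
        return int(Decimal(str(value)) * 1000)
    if unit not in _UNITS_PER_MS:
        raise ValueError(f"unknown timestamp unit: {unit!r}")
    # Sub-millisecond digits are truncated, not rounded.
    return int(value) // _UNITS_PER_MS[unit]
```

Passing the unit explicitly is a deliberate choice here: guessing it from the number of digits works until an exchange sends a zero or a very old timestamp.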
Numbers. REST APIs often return numbers as strings ("43250.50"), and some exchanges drop trailing zeros. The normalized format is Decimal with an explicit, instrument-dependent precision.
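The number rule could look roughly like this, assuming the instrument's decimal scale is known from exchange metadata. `parse_decimal` and its `scale` parameter are hypothetical names for this sketch.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def parse_decimal(raw, scale: Decimal) -> Decimal:
    """Parse an exchange number and fix its number of decimal places.

    `scale` is a power-of-ten Decimal such as Decimal("0.01"); only its
    exponent is used, so it sets the decimal places, not a tick multiple.
    """
    # Always go through str: Decimal(0.1) would capture binary float noise.
    value = Decimal(str(raw))
    # quantize restores trailing zeros the exchange dropped ("43250.5").
    return value.quantize(scale, rounding=ROUND_HALF_EVEN)
```

With a scale of `Decimal("0.01")`, the inputs `"43250.5"` and `"43250.50"` normalize to the same value with the same two decimal places, which keeps serialized output stable across exchanges.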
Order sides. BUY/SELL, buy/sell, b/s, and 1/-1 all occur in practice. The normalized format is an enum: BUY | SELL.
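One way to express the side rule is a single alias table in front of an enum. This is a sketch: the `Side` enum and `normalize_side` are illustrative names, and the alias list covers only the spellings mentioned above.

```python
from enum import Enum


class Side(Enum):
    BUY = "BUY"
    SELL = "SELL"


# Raw spellings mentioned in the text, keyed in lowercase.
_SIDE_ALIASES = {
    "buy": Side.BUY, "b": Side.BUY, "1": Side.BUY,
    "sell": Side.SELL, "s": Side.SELL, "-1": Side.SELL,
}


def normalize_side(raw) -> Side:
    """Map a raw side value (string or integer) to the Side enum."""
    key = str(raw).strip().lower()
    try:
        return _SIDE_ALIASES[key]
    except KeyError:
        # Fail loudly: silently guessing a side is worse than rejecting.
        raise ValueError(f"unknown order side: {raw!r}") from None
```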
Order statuses. Each exchange has its own statuses. Normalized mapping:
| Exchange | Raw | Normalized |
|---|---|---|
| Binance | NEW, PARTIALLY_FILLED, FILLED, CANCELED | OPEN, PARTIAL, FILLED, CANCELLED |
| Bybit | Created, New, PartiallyFilled, Filled | OPEN, OPEN, PARTIAL, FILLED |
| OKX | live, partially_filled, filled, canceled | OPEN, PARTIAL, FILLED, CANCELLED |
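The status table translates directly into a lookup structure. The sketch below encodes exactly the rows listed above; the `OrderStatus` enum, the `STATUS_MAP` layout, and the lowercase exchange keys are assumptions made for illustration.

```python
from enum import Enum


class OrderStatus(Enum):
    OPEN = "OPEN"
    PARTIAL = "PARTIAL"
    FILLED = "FILLED"
    CANCELLED = "CANCELLED"


# One dictionary per exchange, taken row by row from the table.
STATUS_MAP = {
    "binance": {
        "NEW": OrderStatus.OPEN,
        "PARTIALLY_FILLED": OrderStatus.PARTIAL,
        "FILLED": OrderStatus.FILLED,
        "CANCELED": OrderStatus.CANCELLED,
    },
    "bybit": {
        "Created": OrderStatus.OPEN,
        "New": OrderStatus.OPEN,
        "PartiallyFilled": OrderStatus.PARTIAL,
        "Filled": OrderStatus.FILLED,
    },
    "okx": {
        "live": OrderStatus.OPEN,
        "partially_filled": OrderStatus.PARTIAL,
        "filled": OrderStatus.FILLED,
        "canceled": OrderStatus.CANCELLED,
    },
}


def normalize_status(exchange: str, raw: str) -> OrderStatus:
    """Look up the normalized status; unknown values raise ValueError."""
    try:
        return STATUS_MAP[exchange][raw]
    except KeyError:
        raise ValueError(f"unmapped status {raw!r} for {exchange!r}") from None
```

Raising on an unmapped status, rather than defaulting to OPEN, makes a newly introduced exchange status visible immediately.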
Normalizer Architecture
The normalizer is implemented as a set of exchange-specific adapters with a common interface:
from abc import ABC, abstractmethod
from decimal import Decimal

class ExchangeNormalizer(ABC):
    """Common adapter interface.

    NormalizedTicker and NormalizedOrder are the normalized record types,
    defined elsewhere in the project.
    """

    @abstractmethod
    def normalize_symbol(self, raw_symbol: str) -> str:
        """Converts an exchange symbol to the normalized BASE/QUOTE format."""

    @abstractmethod
    def normalize_ticker(self, raw_data: dict) -> NormalizedTicker:
        """Normalizes ticker data."""

    @abstractmethod
    def normalize_order(self, raw_data: dict) -> NormalizedOrder:
        """Normalizes order data."""

class BinanceNormalizer(ExchangeNormalizer):
    SYMBOL_MAP = {
        "BTCUSDT": "BTC/USDT",
        "ETHUSDT": "ETH/USDT",
        # ... from API /api/v3/exchangeInfo
    }

    def normalize_symbol(self, raw_symbol: str) -> str:
        return self.SYMBOL_MAP[raw_symbol]

    def normalize_ticker(self, raw: dict) -> NormalizedTicker:
        return NormalizedTicker(
            exchange="binance",
            symbol=self.normalize_symbol(raw["s"]),
            timestamp=int(raw["T"]),
            price=Decimal(raw["c"]),
            volume_24h=Decimal(raw["v"]),
        )

    def normalize_order(self, raw: dict) -> NormalizedOrder:
        raise NotImplementedError("order normalization is omitted from this excerpt")
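The adapter code refers to NormalizedTicker and NormalizedOrder without showing them. A minimal sketch of what they might look like follows, assuming frozen dataclasses. The ticker fields are the ones this article uses (including the optional bid and ask checked during validation); the order fields and the two enums are assumptions for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional

# Compact enums so this block stands alone (names are assumptions).
Side = Enum("Side", "BUY SELL")
OrderStatus = Enum("OrderStatus", "OPEN PARTIAL FILLED CANCELLED")


@dataclass(frozen=True)
class NormalizedTicker:
    exchange: str
    symbol: str                    # BASE/QUOTE, e.g. "BTC/USDT"
    timestamp: int                 # Unix epoch milliseconds, UTC
    price: Decimal
    volume_24h: Decimal
    bid: Optional[Decimal] = None  # not every ticker payload has a quote
    ask: Optional[Decimal] = None


@dataclass(frozen=True)
class NormalizedOrder:
    # Hypothetical field set; adjust to what downstream systems need.
    exchange: str
    symbol: str
    order_id: str
    side: Side
    status: OrderStatus
    price: Decimal
    quantity: Decimal
    timestamp: int                 # Unix epoch milliseconds, UTC
```

Freezing the records means a normalized value cannot be mutated by one consumer and then observed in its altered form by another.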
Dynamic Symbol Mapping Loading
Hard-coding the symbol mapping is a bad idea: exchanges add new pairs regularly. A better approach is to load the mapping from the exchange's Exchange Info API on startup and refresh it periodically:
async def load_symbol_map(self):
    exchange_info = await self.rest_client.get("/api/v3/exchangeInfo")
    self.symbol_map = {
        s["symbol"]: f"{s['baseAsset']}/{s['quoteAsset']}"
        for s in exchange_info["symbols"]
        if s["status"] == "TRADING"
    }
    # Inverted mapping for reverse transformation
    self.reverse_map = {v: k for k, v in self.symbol_map.items()}
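The periodic update mentioned above can be driven by a small background task. The following is a sketch under assumptions: `refresh_periodically`, the interval values, and the logger name are all hypothetical, and `load` stands for any coroutine function such as a bound `load_symbol_map`.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("normalizer.symbols")


async def refresh_periodically(
    load: Callable[[], Awaitable[None]],
    interval_s: float,
    retry_s: float = 5.0,
) -> None:
    """Call `load` forever, sleeping `interval_s` after each success."""
    while True:
        try:
            await load()
            delay = interval_s
        except Exception:
            # Log and retry sooner instead of letting the task die.
            # Task cancellation is not an Exception, so it still propagates.
            log.exception("symbol map refresh failed")
            delay = retry_s
        await asyncio.sleep(delay)
```

A caller would typically start it once at startup, for example `asyncio.create_task(refresh_periodically(normalizer.load_symbol_map, 3600))`, and cancel the task on shutdown.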
Normalized Data Validation
After normalization, the result must be validated. Non-positive prices, zero volumes, timestamps from the future, and crossed books (bid at or above ask) are all signs of a problem at the data source:
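import time

def now_ms() -> int:
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000

def validate_ticker(ticker: NormalizedTicker) -> list[str]:
    errors = []
    if ticker.price <= 0:
        errors.append(f"Invalid price: {ticker.price}")
    # Allow 5 seconds of clock skew between the local host and the exchange.
    if ticker.timestamp > now_ms() + 5000:
        errors.append(f"Future timestamp: {ticker.timestamp}")
    # Skip the check when either side of the quote is missing.
    if ticker.bid and ticker.ask and ticker.bid >= ticker.ask:
        errors.append(f"Crossed book: bid={ticker.bid} ask={ticker.ask}")
    return errors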
Invalid data is logged and discarded so that it never reaches downstream systems.
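The log-and-discard step might be wired in as a generator that sits between the normalizer and its consumers. This is a sketch: `drop_invalid` and the logger name are hypothetical, and `validate` is any function that returns a list of error strings, as `validate_ticker` does.

```python
import logging
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("normalizer.validation")


def drop_invalid(
    items: Iterable[T],
    validate: Callable[[T], List[str]],
) -> Iterator[T]:
    """Yield only items whose validator reports no errors; log the rest."""
    for item in items:
        errors = validate(item)
        if errors:
            # The rejected record is logged in full for later diagnosis.
            log.warning("discarded %r: %s", item, "; ".join(errors))
            continue
        yield item
```

Because it is a generator, it can wrap a live stream of tickers without buffering them.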
Normalizer Testing
Unit tests built on real raw payloads from each exchange are mandatory, because exchanges sometimes change their API format without warning. A fixed set of fixtures with expected normalized results makes such regressions quick to detect:
from decimal import Decimal

def test_binance_normalizer():
    raw = {"s": "BTCUSDT", "c": "43250.50", "v": "28450.12", "T": 1704067200000}
    result = BinanceNormalizer().normalize_ticker(raw)
    assert result.symbol == "BTC/USDT"
    assert result.price == Decimal("43250.50")
    assert result.timestamp == 1704067200000
    assert result.exchange == "binance"
In addition, integration tests against the live exchange API in sandbox mode run daily in CI to catch API changes early.







