Feature Store Design for Tick-Level Data
Introduction
ML models for high-frequency trading depend on real-time features computed from tick-level data: individual trades and quote updates timestamped at microsecond resolution. A feature store, the system that manages feature computation, storage, and retrieval, must therefore sustain high throughput at low latency. Meeting both requirements for tick data takes careful architectural choices.
Feature Store Architecture
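A feature store for tick data has three components: (1) real-time feature computation, where a stream processor such as Flink or Spark Streaming updates features as ticks arrive; (2) feature storage, with an in-memory cache for low-latency reads and persistent storage for audit; and (3) feature serving, which must answer lookups quickly enough for the trading path (typically sub-millisecond from a networked cache, and microseconds only when features are held in process). Keep the online path (low-latency serving) separate from the offline path (batch analysis).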
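The three components and the online/offline split can be sketched in a few lines of plain Python. This is a toy, in-process illustration only: the names Quote, compute_features, and TinyFeatureStore are hypothetical, a dict stands in for the in-memory cache, and a list stands in for persistent storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Quote:
    ts_us: int      # event timestamp in microseconds
    symbol: str
    bid: float
    ask: float


def compute_features(q: Quote) -> Dict[str, float]:
    """Stateless per-tick features; stands in for the streaming job."""
    return {"mid": (q.bid + q.ask) / 2.0, "spread": q.ask - q.bid}


class TinyFeatureStore:
    """Hypothetical store: one write path feeding an online and an offline view."""

    def __init__(self) -> None:
        # Online view: only the latest feature values per symbol.
        self._online: Dict[str, Dict[str, float]] = {}
        # Offline view: append-only history of every computed row.
        self._offline: List[dict] = []

    def on_tick(self, q: Quote) -> None:
        feats = compute_features(q)
        self._online[q.symbol] = feats
        self._offline.append({"ts_us": q.ts_us, "symbol": q.symbol, **feats})

    def get_online(self, symbol: str) -> Optional[Dict[str, float]]:
        """Serving path: constant-time lookup of the latest features."""
        return self._online.get(symbol)

    def history(self, symbol: str) -> List[dict]:
        """Offline path: full history for analysis or training."""
        return [row for row in self._offline if row["symbol"] == symbol]
```

Because both views are written from the same compute_features call, the values a model trains on are the values it would have been served. A real deployment would replace the dict and list with the cache and file formats discussed under Storage and Serving.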
Feature Computation for Tick Data
Typical tick-level features include realized volatility (computed from returns), bid-ask spread, cumulative volume, order book imbalance, and momentum (recent price direction). Compute them over rolling windows with incremental updates: when a tick arrives, add its contribution and evict whatever has left the window, rather than recalculating the full window on each tick.
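As a minimal sketch of the incremental approach, the class below keeps a running sum of squared log returns so that each tick costs constant time. It assumes one common definition of realized volatility (the square root of the sum of squared log returns over a count-based window); RollingRealizedVol and order_book_imbalance are illustrative names, and the imbalance formula uses top-of-book sizes only.

```python
import math
from collections import deque


class RollingRealizedVol:
    """Realized volatility over the last `window` log returns, O(1) per tick."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self._sq_returns = deque()   # squared log returns currently in the window
        self._sum_sq = 0.0           # running sum of the deque's contents
        self._last_price = None

    def update(self, price: float) -> float:
        """Consume one trade price and return the current realized volatility."""
        if self._last_price is not None:
            r = math.log(price / self._last_price)
            self._sq_returns.append(r * r)
            self._sum_sq += r * r
            if len(self._sq_returns) > self.window:
                # Evict the oldest return instead of re-summing the window.
                self._sum_sq -= self._sq_returns.popleft()
        self._last_price = price
        # Guard against tiny negative values from floating-point cancellation.
        return math.sqrt(max(self._sum_sq, 0.0))


def order_book_imbalance(bid_size: float, ask_size: float) -> float:
    """Top-of-book imbalance in [-1, 1]; 0.0 when both sides are empty."""
    total = bid_size + ask_size
    return (bid_size - ask_size) / total if total > 0 else 0.0
```

Calling update once per trade price returns the current window's volatility without touching older returns. Over very long runs the running sum can accumulate floating-point drift, so a production version might periodically recompute it from the deque.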
Storage and Serving
Online store: Redis or Memcached for sub-millisecond feature lookups. Offline store: Parquet or HDF5 files that hold historical features for model training. Keeping the two stores synchronized, so that both hold values produced by the same feature logic, is what makes the transition from backtest to production reliable.
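One way to keep offline retrieval consistent with what the online store would have served is an as-of lookup: for a given timestamp, return the latest feature value recorded at or before it, never a later one. The sketch below assumes rows arrive in timestamp order and uses in-memory lists in place of Parquet or HDF5 files; OfflineSeries is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Optional


class OfflineSeries:
    """Append-only history of one feature for one symbol, sorted by time."""

    def __init__(self) -> None:
        self._ts: List[int] = []        # timestamps in microseconds, ascending
        self._values: List[float] = []  # feature value recorded at each timestamp

    def append(self, ts_us: int, value: float) -> None:
        if self._ts and ts_us < self._ts[-1]:
            raise ValueError("rows must arrive in timestamp order")
        self._ts.append(ts_us)
        self._values.append(value)

    def as_of(self, ts_us: int) -> Optional[float]:
        """Latest value with timestamp <= ts_us, or None if none exists yet."""
        i = bisect_right(self._ts, ts_us)
        return self._values[i - 1] if i else None
```

Building training rows with as_of at each label timestamp gives the model only the information the online path would have held at that moment, which avoids look-ahead in backtests.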
Conclusion
A feature store that combines incremental computation, a low-latency online store, and a consistent offline store enables efficient real-time feature computation and serving for high-frequency ML trading models.