Handling Irregularly Spaced Ticks in High-Frequency Data
Introduction
High-frequency trading data is irregularly spaced: quotes arrive when traders act, not on fixed schedules. Traditional time-series methods assume uniform timesteps. Handling irregular spacing is critical for accurate feature engineering and model training.
Problems with Naive Regularization
Practitioners often forward-fill quotes to create regular intervals. This inflates sample size artificially, biases volatility estimates downward, and creates spurious correlations. Neither forward-fill nor interpolation properly captures market microstructure.
Time-Rescaling and Calendar-Time Conversions
Convert irregular tick times to uniform calendar time (e.g., milliseconds since midnight). Features are computed over calendar-time windows rather than tick counts. For example, compute VWAP every 100ms rather than every 50 quotes. This works well but can create empty windows if quotes are sparse.
Tick-Count Based Windows
An alternative scales features to tick counts rather than time. Compute returns over last K ticks rather than last T seconds. This naturally handles irregular spacing—models learn from trading activity rather than clock time. For liquid assets with consistent tick frequency, tick-count methods and calendar-time methods give similar results.
Feature Engineering for Irregular Data
Key features for irregular data include time-to-previous-quote, quote intensity (quotes per second), volume profiles, and spread dynamics. These features capture microstructure effects that simple price features miss.
Modeling Strategies
RNNs (LSTMs, GRUs) naturally handle variable-length sequences. Transformers with relative position encoding explicitly model timing information. Traditional regression models are poorly suited to irregular data.
Empirical Testing
On tick-level S&P 500 E-mini futures data, models trained on irregular ticks with time features improve mid-price prediction accuracy by 12-15% versus models trained on regularized quote data. The accuracy gain validates the importance of proper irregular-data handling.
Implementation Recommendations
For production systems, implement time-feature engineering on top of RNNs. Include time gaps, quote counts, and volume profiles. Backtest thoroughly with walk-forward validation to measure true out-of-sample benefits. Avoid over-engineering—simpler features often work as well as complex ones.