Introduction

Raw price data is rarely sufficient for machine learning. A closing price by itself contains little predictive information. Effective trading models require carefully engineered features that capture relevant patterns and properties of price movements. Feature engineering in time-series finance is an art combining domain knowledge (understanding how markets work), statistical thinking (understanding what patterns matter), and practical experience (knowing which transformations actually improve model performance). This guide walks through the essential feature engineering techniques for price series.

Returns and Differencing: Moving from Prices to Changes

The first and most critical transformation is moving from prices to returns. Price levels are non-stationary (they wander over time), while returns approximate stationary processes. This matters because most statistical models assume stationarity, and many anomaly detection techniques break down on non-stationary data.

Returns are computed as: R(t) = (P(t) - P(t-1)) / P(t-1) for simple returns, or R(t) = log(P(t) / P(t-1)) for log returns. Log returns have nice properties: they're additive across time (returns over multiple periods compound correctly) and approximate simple returns for small price changes.

For more aggressive stationarity, you can difference the log returns: DR(t) = R(t) - R(t-1). This removes any trend in the returns themselves and can help with particularly non-stationary series (like volatility). However, excessive differencing removes too much information—know when to stop.
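The transformations above can be sketched in a few lines of pandas. The price series here is a small hypothetical example; the additivity of log returns falls out directly.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices for illustration.
prices = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0])

# Simple returns: R(t) = (P(t) - P(t-1)) / P(t-1)
simple_ret = prices.pct_change()

# Log returns: R(t) = log(P(t) / P(t-1)); additive across periods.
log_ret = np.log(prices / prices.shift(1))

# Additivity check: summed log returns recover the total log return.
total = np.log(prices.iloc[-1] / prices.iloc[0])

# First difference of log returns, for aggressively non-stationary series.
diff_ret = log_ret.diff()
```

Summing `log_ret` over the window reproduces `total` exactly, which is the compounding property the text describes.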

Lags and Autoregressive Features

Time-series models fundamentally rely on the assumption that past values inform future values. Lags capture this relationship explicitly. A lag-1 feature is yesterday's return; lag-2 is two days ago; lag-5 is five days ago.

Creating an effective lag structure requires judgment: which lags matter? For daily equity data, lags 1-5 (one week) are often informative due to mean reversion. For intraday data, lags might span minutes. For weekly macro data, lags might span months.

A common mistake is including too many lags. Including lags 1-60 doesn't improve models—it adds noise and computation without signal. Better to select lags based on autocorrelation analysis: plot the autocorrelation function (ACF) and identify which lags show significant correlation with the target, then include only those.

Implementation consideration: lag features introduce look-ahead bias if not careful. If your target is "tomorrow's return," lags should be yesterday's return and earlier—not including today's return in the feature set.
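A minimal sketch of a look-ahead-safe lag structure, using synthetic returns. The choice of lags 1, 2, and 5 here is hypothetical, standing in for lags selected from an ACF analysis as described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0, 0.01, 250), name="ret")

# Target: tomorrow's return.
target = ret.shift(-1)

# Lag features (hypothetically chosen from an ACF plot): lags 1, 2, 5.
# Per the convention above, lags start at 1 -- yesterday's return and
# earlier -- so the feature set never leaks information about the target.
features = pd.DataFrame({f"lag_{k}": ret.shift(k) for k in [1, 2, 5]})

# Align features with the target and drop incomplete rows before modeling.
data = pd.concat([features, target.rename("target")], axis=1).dropna()
```

Because `shift(k)` only moves values forward in time, each feature row contains strictly past information relative to its target.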

Rolling Statistics: Windows of Time

Prices and returns are non-stationary, but statistics computed over rolling windows are often more stable. Rolling mean, rolling volatility, and rolling correlation capture how market behavior evolves over time.

Rolling volatility (standard deviation over a window, typically 20-60 trading days) is remarkably informative. Low volatility regimes and high volatility regimes behave differently. Mean-reversion works better in high-vol regimes; momentum works better in low-vol. Including rolling volatility helps models adapt to regime changes.

Rolling correlation between an asset and market factors (beta) changes over time and influences optimal trading behavior. Rolling skewness and kurtosis capture tail risk properties that vary with market regime.

Window length selection is crucial. Too short (5 days) and the statistic is noisy. Too long (252 days) and you lose sensitivity to regime changes. 20-60 days is typically optimal for equity daily data. Intraday trading might use windows of minutes or hours.
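The rolling statistics discussed above can be computed directly with pandas. The series here are synthetic, and the 20-day window is one point in the 20-60 day range the text recommends for daily data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
ret = pd.Series(rng.normal(0, 0.01, 300))   # synthetic daily asset returns
mkt = pd.Series(rng.normal(0, 0.008, 300))  # synthetic market returns

window = 20  # roughly one trading month; 20-60 days is typical for daily data

roll = pd.DataFrame({
    "mean": ret.rolling(window).mean(),
    "vol": ret.rolling(window).std() * np.sqrt(252),  # annualized volatility
    "skew": ret.rolling(window).skew(),               # tail asymmetry
    "kurt": ret.rolling(window).kurt(),               # tail heaviness
    "corr_mkt": ret.rolling(window).corr(mkt),        # rolling beta proxy
})
```

Note that each statistic is undefined for the first `window - 1` observations, which matters when aligning features for training.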

Momentum and Rate of Change Indicators

Momentum features capture trends in prices or returns. Simple momentum is the return over some period: price(t) / price(t-n). Momentum has a non-obvious property: returns tend to trend at intermediate horizons (roughly 3-12 months) but mean-revert at very short horizons (days to a few weeks) and again at long horizons (several years). This means the same underlying concept changes sign depending on the horizon.

Rate of change (price today vs 10 days ago) is similar to momentum. Both work better when combined with other features. Pure momentum alone frequently fails out-of-sample, likely due to regime dependence and changing market structure.

Relative momentum (comparing an asset's momentum to its peers, or to the broader market) often works better than absolute momentum. It's more stable across market regimes because it captures relative strength rather than absolute trends.
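A sketch of absolute, rate-of-change, and relative momentum on a hypothetical panel of four assets; the 60-day and 10-day lookbacks are illustrative choices, not recommendations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical price panel: rows are days, columns are assets.
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, (300, 4)), axis=0)),
    columns=["A", "B", "C", "D"],
)

n = 60  # momentum lookback in trading days (illustrative)
momentum = prices / prices.shift(n) - 1.0          # absolute momentum
rate_of_change = prices / prices.shift(10) - 1.0   # 10-day rate of change

# Relative momentum: each asset's momentum minus the cross-sectional mean,
# capturing relative strength rather than absolute trend.
rel_momentum = momentum.sub(momentum.mean(axis=1), axis=0)
```

By construction, relative momentum sums to zero across assets on each day, so it isolates outperformance from any market-wide trend.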

Mean Reversion and Relative Value Features

Mean-reversion features quantify how stretched a price is relative to its own history. Z-score features measure the distance from the mean in standard-deviation units: (Price - MA(n)) / STD(n). Stocks trading at high z-scores (significantly above their mean) tend to mean-revert downward; low z-scores tend to revert upward.
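The z-score formula maps directly onto rolling pandas operations. The price path below is synthetic and the 20-day window is an illustrative choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
price = pd.Series(100 + np.cumsum(rng.normal(0, 1, 250)))  # synthetic prices

n = 20
ma = price.rolling(n).mean()  # MA(n)
sd = price.rolling(n).std()   # STD(n)

# Z-score: how many rolling standard deviations price sits from its mean.
zscore = (price - ma) / sd
```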

Residuals from cross-asset relationships behave similarly. If a stock should move with its sector (a regression-based prediction), the difference between actual and predicted is the residual. Large-magnitude residuals tend to mean-revert—the stock catches up to where fundamental relationships suggest it should be.

Spread features between related assets (stock vs index, futures vs spot, pairs) are powerful because they quantify relative value. A pairs trading system might trade based on whether the spread is wider or narrower than its long-term median.
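A minimal pairs-style spread sketch, assuming a synthetic pair where one leg tracks the other. The least-squares hedge ratio and the 100-day median window are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Hypothetical related pair: b tracks a (with a true ratio of 1.5) plus noise.
a = pd.Series(100 + np.cumsum(rng.normal(0, 1, 300)))
b = a * 1.5 + rng.normal(0, 2, 300)

# Hedge ratio from a simple least-squares fit of b on a.
beta = np.polyfit(a, b, 1)[0]

# Spread, and its deviation from the long-term rolling median --
# the relative-value signal a pairs system might trade on.
spread = b - beta * a
signal = spread - spread.rolling(100).median()
```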

Cross-Sectional Features

For systems trading multiple assets simultaneously, ranking and relative features matter. What matters isn't a stock's absolute volatility, but its volatility relative to peers. Not whether it's going up, but whether it's outperforming the sector.

Percentile rank features (what percentile is this stock's return in among all stocks?) are informative and interpretable. Z-scores across the cross-section (how many standard deviations above the market mean?) standardize information across different market conditions.
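Both cross-sectional features above reduce to one-liners on a single day's cross-section. The 100-stock return vector here is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Hypothetical cross-section: one day's returns for 100 stocks.
returns = pd.Series(rng.normal(0, 0.02, 100))

# Percentile rank within the cross-section (values in (0, 1]).
pct_rank = returns.rank(pct=True)

# Cross-sectional z-score: distance from the market mean in std units.
xs_z = (returns - returns.mean()) / returns.std()
```

Because the z-score standardizes against that day's dispersion, the feature is comparable across calm and volatile market days.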

Feature Interactions and Multiplicative Effects

Sometimes features aren't additive. A trading signal might only work when volatility is low. A momentum signal might only work in uptrends. Capturing these interactions requires explicit feature engineering: create features that combine conditions (momentum × sign(trend), for example).

Be cautious with interactions: they multiply the number of features and increase overfitting risk. Use theoretical justification or empirical correlation analysis to decide which interactions are worth including.
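The momentum × sign(trend) example, plus a volatility-gated variant, can be sketched as follows. All windows and the data are illustrative, and the low-volatility gate is one hypothetical way to encode "only in low-vol regimes."

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))
ret = price.pct_change()

momentum = price / price.shift(20) - 1.0
trend = np.sign(price - price.rolling(50).mean())  # +1 uptrend, -1 downtrend
vol = ret.rolling(20).std()

# Interaction 1: momentum conditioned on trend direction.
mom_x_trend = momentum * trend

# Interaction 2: momentum gated to low-volatility regimes
# (zeroed out when volatility is above its rolling median).
low_vol = (vol < vol.rolling(100).median()).astype(float)
mom_in_low_vol = momentum * low_vol
```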

Normalization and Scaling

Before feeding features into machine learning models, normalize them. Raw returns might range from -5% to +5%, while volatility ranges from 10% to 60%, and momentum spans 0.9 to 1.1. Scaling to a consistent range (typically 0-1 or standardized with mean 0 and std 1) improves model training and convergence, particularly for gradient-based algorithms.
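Both scaling schemes mentioned above are short pandas expressions. The feature ranges below mirror the examples in the text but the data itself is synthetic; in a real pipeline the scaling parameters (means, stds, mins, maxes) should be fit on the training window only, to avoid look-ahead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
features = pd.DataFrame({
    "ret": rng.normal(0, 0.02, 200),      # roughly -5% to +5%
    "vol": rng.uniform(0.10, 0.60, 200),  # 10% to 60%
    "mom": rng.uniform(0.9, 1.1, 200),    # 0.9 to 1.1
})

# Standardize: mean 0, std 1 per column.
standardized = (features - features.mean()) / features.std()

# Min-max scale to [0, 1] per column.
minmax = (features - features.min()) / (features.max() - features.min())
```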

Conclusion

Feature engineering for price series isn't a one-size-fits-all problem. It requires understanding what aspects of price behavior matter for your specific trading objective, then engineering features that capture those aspects. Returns capture price changes while removing non-stationarity. Lags capture autoregressive patterns. Rolling statistics capture regime changes. Momentum captures trends. Mean reversion captures range-bound behavior. Relative value captures relative strength. The most effective feature sets combine multiple concepts, each capturing different aspects of market dynamics. Investment in thoughtful feature engineering usually pays better returns than effort spent tuning models on poorly engineered features.