Introduction

Financial time series modeling requires testing stationarity—whether statistical properties (mean, variance) are constant over time. Non-stationary series (like prices) need differencing before analysis. Multiple tests exist: Augmented Dickey-Fuller (ADF), Kwiatkowski-Phillips-Schmidt-Shin (KPSS), Phillips-Perron (PP). This article explains these tests and when to use each.

Why Stationarity Matters

Many statistical models (ARIMA after differencing, regression) assume stationarity. If you regress one non-stationary series on another, you can get spurious results: a high R-squared and significant coefficients between series that are actually unrelated. If you use non-stationary features in ML models, relationships estimated on training data may not hold in future periods (different trends, means, volatilities).
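
The spurious-correlation problem can be seen in a small simulation. The sketch below is illustrative only: it assumes NumPy is available, and the sample size, trial count, and seed are arbitrary choices. It generates pairs of independent random walks and compares the average absolute correlation of their levels with that of their first differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_trials = 500, 200  # arbitrary illustration settings


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])


level_corrs, diff_corrs = [], []
for _ in range(n_trials):
    # Two independent random walks: cumulative sums of unrelated noise.
    x = np.cumsum(rng.standard_normal(n_obs))
    y = np.cumsum(rng.standard_normal(n_obs))
    level_corrs.append(abs_corr(x, y))
    # Differencing recovers the unrelated noise series.
    diff_corrs.append(abs_corr(np.diff(x), np.diff(y)))

mean_level_corr = float(np.mean(level_corrs))
mean_diff_corr = float(np.mean(diff_corrs))
print(f"mean |corr| of levels:      {mean_level_corr:.2f}")
print(f"mean |corr| of differences: {mean_diff_corr:.2f}")
```

The levels tend to show sizeable correlations even though the series are unrelated by construction, while the differenced series do not.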

Financial prices (stocks, bonds, commodities) are typically non-stationary I(1) processes: prices themselves are non-stationary, but price changes (returns) are stationary.

Augmented Dickey-Fuller (ADF) Test

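Tests the null hypothesis that the time series has a unit root (is non-stationary). ADF regresses the first difference on the lagged level, optional deterministic terms, and p lagged differences: Δy_t = α + βt + γy_{t-1} + δ_1Δy_{t-1} + ... + δ_pΔy_{t-p} + ε_t. It tests γ = 0 (unit root, non-stationary) against γ < 0 (stationary). The lagged differences absorb serial correlation in the errors; they are the "augmented" part of the test.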

Interpreting ADF Results

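p-value < 0.05: reject the null of a unit root and conclude the series is stationary. p-value > 0.05: fail to reject the null. This means the data are consistent with a unit root, not that non-stationarity is proven; the test may simply lack power. The test statistic does not follow a standard t distribution, and its critical values depend on the regression type (no constant, constant, or constant plus trend).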

Limitations

ADF has low power against near-unit-root alternatives: it struggles to reject the unit root when the series is stationary but mean-reverts slowly. It is also sensitive to lag selection (how many lagged differences to include). With the wrong number of lags, the test is unreliable.

KPSS Test

Tests the opposite null: the series is stationary. KPSS complements ADF: where ADF asks "is this non-stationary?", KPSS asks "is this stationary?" Null hypothesis: the series is I(0), stationary around a constant or a trend. Alternative: the series has a unit root.

Interpreting KPSS Results

p-value < 0.05: reject null of stationarity, conclude series is non-stationary. p-value > 0.05: fail to reject null, cannot conclude series is non-stationary.

Comparison to ADF

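Because ADF and KPSS have opposite null hypotheses, running both gives four possible outcomes. If ADF rejects and KPSS does not, both point to stationarity. If KPSS rejects and ADF does not, both point to a unit root. If neither rejects, the data are uninformative (ambiguous). If both reject, the results are contradictory. In the last two cases the series is usually borderline stationary/non-stationary (like mean-reverting commodity prices: non-stationary over the long term, mean-reverting over the short term).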
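
The combined reading of the two tests can be written as a small decision function. This is a sketch only: `joint_verdict` is a hypothetical helper, the 0.05 threshold is a convention, and the wording of the verdicts is illustrative.

```python
def joint_verdict(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF (null: unit root) and KPSS (null: stationary) p-values."""
    adf_rejects = adf_pvalue < alpha    # evidence against a unit root
    kpss_rejects = kpss_pvalue < alpha  # evidence against stationarity

    if adf_rejects and not kpss_rejects:
        return "stationary: both tests point the same way"
    if kpss_rejects and not adf_rejects:
        return "unit root: both tests point the same way"
    if not adf_rejects and not kpss_rejects:
        return "inconclusive: neither null rejected"
    return "contradictory: both nulls rejected"


print(joint_verdict(0.01, 0.10))
print(joint_verdict(0.60, 0.01))
print(joint_verdict(0.20, 0.10))
print(joint_verdict(0.01, 0.01))
```

The last two outcomes are the ones that call for closer inspection of the series rather than a mechanical decision.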

Phillips-Perron (PP) Test

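Similar to ADF, with the same unit-root null, but it deals with serial correlation and heteroskedasticity (unequal variance over time) in the errors differently. Instead of including lagged differences explicitly, it applies a non-parametric (Newey-West type) correction to the test statistic. This makes PP less dependent on choosing a lag length, although it can reject too often in finite samples when the errors have a strong negative moving-average component.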

Interpretation is the same as for ADF: p-value < 0.05 suggests stationarity, p-value > 0.05 means the unit root cannot be rejected.

Practical Application: When to Use Which?

Use ADF When

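The series may contain a deterministic trend, which ADF accommodates through a trend term. (Structural breaks and regime changes are a different matter: a break biases ADF toward failing to reject the unit root, and a trend term does not fix that.) You have a moderate sample size and expect clear stationarity or non-stationarity. You want the most widely used test (for comparability).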

Use KPSS When

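You want stationarity as the null hypothesis, so that non-stationarity is the claim that requires evidence. You're unsure about series properties and want a complementary test to ADF. Be cautious with small samples: KPSS relies on asymptotic critical values and a long-run variance estimate, both of which can be unreliable when the series is short.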

Use Phillips-Perron When

Your data has heteroskedasticity (non-constant variance). You want robustness to autocorrelation specification errors. You're analyzing financial data that often violates classical assumptions.

Practical Workflow

Step 1: Visualize the series. Does it trend? Oscillate around constant mean? Visual inspection often reveals stationarity faster than tests.

Step 2: Run ADF test. p-value < 0.05 usually indicates stationarity; > 0.05 usually indicates non-stationarity.

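Step 3: If ADF is ambiguous (p-value 0.03-0.07), run KPSS and PP tests. The three agree when ADF and PP reject while KPSS does not (stationary), or when KPSS rejects while ADF and PP do not (unit root); in either case confidence is high. Any other combination means the series is borderline and you should inspect it more carefully.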

Step 4: If non-stationary, difference the series (for prices, compute returns or log returns) and retest. First differences of an I(1) series are I(0) stationary.

Common Pitfalls

Ignoring Lag Selection

ADF test results depend on lag selection. Too few lags: residual autocorrelation remains and the test's size is distorted. Too many lags: power decreases (harder to detect stationarity). Use automatic lag selection (AIC, BIC) or sensitivity analysis (test with multiple lag lengths).

Assuming Stationarity When Uncertain

If tests are ambiguous, err on the side of caution: treat the series as non-stationary and difference it. The cost of false stationarity (spurious relationships) usually exceeds the cost of false non-stationarity (differencing a stationary series and losing some information).

Applying Tests to Prices, Not Returns

Stock prices are almost always non-stationary. This is expected. Test returns (price changes) instead, which should be stationary. Common mistake: testing prices, finding non-stationarity, and concluding that a strategy won't work. That conclusion is incorrect: work with returns, which are typically stationary.

Financial Example: Are Commodity Prices Stationary?

Commodity prices such as oil and copper tend to be I(1) (non-stationary) over long periods due to trend-like behavior. But over medium horizons (years), they mean-revert (below cost of production, prices rise; above cost, prices fall).

ADF might fail to reject the unit root (p > 0.05) over a 20-year period but reject it (p < 0.05) over a 5-year period. This indicates long-term non-stationarity with mean reversion on shorter timescales, which is the worst case for stationarity tests.

Conclusion

Stationarity testing is important for time-series modeling and feature engineering. ADF, KPSS, and PP tests provide complementary information. ADF and KPSS have opposite nulls—running both is recommended. When tests disagree or are ambiguous, visual inspection and domain knowledge matter. For trading, default to assuming financial prices are non-stationary and using stationary returns, confirming with tests. Don't let statistical tests dictate economic reasoning.