Continual Learning: Updating Models Without Re-training From Scratch
Introduction
Markets change continuously. A model trained on 2020-2023 data may diverge from 2024 market behavior. Retraining from scratch on all historical data is expensive and slow. Continual learning approaches update models incrementally: they learn from new data without forgetting old knowledge, and without the computational cost of full retraining. This article explores continual learning techniques for trading.
Why Continual Learning Matters in Trading
Deploying a model and then waiting for a monthly or quarterly retrain is reactive: by the time the retrain happens, the market regime may have shifted and the model may be stale. Continual learning enables frequent updates (daily, even hourly) without the computational burden of full retraining. It is especially important for alternative data, whose predictive value can decay quickly.
Online Learning Approaches
Stochastic Gradient Descent (SGD)
Instead of batch training (compute the gradient on the entire dataset, then update), online SGD computes the gradient on a single sample or small batch and updates immediately. Applied repeatedly, SGD learns from all samples without holding the full dataset in memory.
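A minimal sketch of this pattern with scikit-learn's `SGDRegressor` and its `partial_fit` method. The synthetic feature stream and the true coefficient vector are assumptions for illustration, not anything from a real trading pipeline:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Hypothetical stream: feature batches X and next-period returns y arrive over time.
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

true_coef = np.array([0.5, -0.2, 0.1, 0.0, 0.3])  # assumed ground truth for the demo

for step in range(100):
    X_batch = rng.normal(size=(32, 5))            # 32 samples, 5 features
    y_batch = X_batch @ true_coef + rng.normal(scale=0.1, size=32)
    model.partial_fit(X_batch, y_batch)           # one gradient pass on this batch only

# Weights approach the true coefficients without ever storing past batches.
print(np.round(model.coef_, 2))
```

Each `partial_fit` call discards the batch afterwards; memory usage is constant regardless of how long the stream runs.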
Advantage: no need to revisit old data. Drawback: the model can overfit to recent samples unless the learning rate schedule is chosen carefully.
Online Decision Trees
Hoeffding trees grow decision trees incrementally: observe a single sample, decide whether to split or to update node statistics, and continue. They require minimal memory and provide predictions at any point in the stream.
Advantage: naturally handles streaming data. Disadvantage: usually worse performance than batch-trained trees, with less sophisticated split selection.
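The split decision in a Hoeffding tree rests on the Hoeffding bound: after n samples, the observed mean of a bounded statistic is within epsilon of its true mean with high probability, so the tree commits to a split once the gain gap between the two best candidate features exceeds epsilon. A small sketch of the bound itself (the gain values are made-up numbers for illustration):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Margin epsilon such that, with probability 1 - delta, the observed mean of a
    variable with the given range is within epsilon of the true mean after n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Split once the gap between the best and second-best feature's gain exceeds epsilon.
best_gain, second_gain = 0.30, 0.25   # hypothetical information-gain estimates
for n in (100, 1000, 10000):
    eps = hoeffding_bound(value_range=1.0, delta=1e-6, n=n)
    print(n, round(eps, 3), best_gain - second_gain > eps)  # epsilon shrinks as n grows
```

With few samples epsilon dwarfs the gain gap and the tree waits; once enough samples accumulate, the split is made with statistical confidence.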
Incremental Learning Algorithms
Incremental Random Forests
Grow trees incrementally, updating the ensemble as new data arrives. Note that scikit-learn's forest classes do not offer partial_fit; instead, setting warm_start=True lets you add new trees trained on new batches without refitting the existing ones. partial_fit itself is available for scikit-learn's linear models and naive Bayes estimators.
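A sketch of the warm_start pattern with scikit-learn's `RandomForestRegressor`, on a synthetic stream (the batch sizes and coefficients are assumptions for the demo):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# warm_start=True keeps already-fitted trees; raising n_estimators and calling
# fit() again grows only the additional trees, trained on the latest batch.
forest = RandomForestRegressor(n_estimators=10, warm_start=True, random_state=0)

for batch in range(3):
    X = rng.normal(size=(200, 4))
    y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
    forest.fit(X, y)
    forest.n_estimators += 10   # the next fit() call adds 10 more trees

print(len(forest.estimators_))  # 30 trees accumulated across three batches
```

The trade-off: each tree only ever sees the batch it was grown on, so old trees never learn from new data. That keeps updates cheap but means the ensemble's knowledge of any one period is frozen into its cohort of trees.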
Incremental Neural Networks
Train neural networks on batches of new data, updating weights without reprocessing past data. Challenges: batch-normalization statistics going stale, loss of proper data shuffling, and catastrophic forgetting (the network overwrites old patterns when trained on new data).
Catastrophic Forgetting and Mitigation
When you train a model on new data, it can forget patterns learned from old data. This is catastrophic forgetting: performance on old patterns degrades. In trading, if you retrain on recent data exhibiting low volatility, the model may lose the ability to handle high-volatility regimes it learned from historical data.
Mitigation Strategies
- Rehearsal: periodically re-include old data in training batches, mixing old and new data during updates.
- Regularization: penalize large changes to weights, encouraging stability.
- Meta-learning: learn how to learn, adapting the learning rate to the importance of new patterns.
- Replay buffers: store samples from the past and replay them during training.
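A minimal replay-buffer sketch in plain Python. Reservoir sampling keeps the buffer representative of everything seen so far, so replayed samples still cover old regimes; the stream and batch sizes here are illustrative assumptions:

```python
import random

random.seed(0)

class ReplayBuffer:
    """Fixed-size reservoir of past samples; mixing them into each update batch
    is a simple rehearsal defence against catastrophic forgetting."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def add(self, sample) -> None:
        # Reservoir sampling: every sample ever seen has equal probability
        # capacity/seen of being retained, so the buffer stays representative.
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def mixed_batch(self, new_batch: list, k: int) -> list:
        """Return the new batch plus k replayed old samples."""
        return new_batch + random.sample(self.samples, min(k, len(self.samples)))

buffer = ReplayBuffer(capacity=100)
for t in range(1000):                  # hypothetical stream of (features, label) pairs
    buffer.add((t, t % 2))

batch = buffer.mixed_batch(new_batch=[(1000, 0)] * 16, k=16)
print(len(batch))                      # 16 new + 16 replayed = 32
```

Training each update on this mixed batch anchors the model to historical regimes while it adapts to the new data.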
Stateful vs Stateless Models
Stateful Models
Models that maintain state across samples: time-series models (ARIMA, exponential smoothing) update their state on each new observation. An ARIMA(p,d,q) model's state is the last p differenced values and the last q residuals; an update takes O(p + q) computation, not O(dataset size).
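The constant-time-per-observation property is easiest to see with simple exponential smoothing, where the whole model state is a single number. A sketch (the price series is made up for the demo):

```python
class ExponentialSmoother:
    """Stateful forecaster: the entire state is one number, updated in O(1)
    per observation -- no need to revisit the history."""
    def __init__(self, alpha: float):
        self.alpha = alpha      # smoothing factor in (0, 1]
        self.level = None

    def update(self, observation: float) -> float:
        if self.level is None:
            self.level = observation                # initialise on first sample
        else:
            self.level = self.alpha * observation + (1 - self.alpha) * self.level
        return self.level                           # one-step-ahead forecast

smoother = ExponentialSmoother(alpha=0.3)
for price in [100.0, 102.0, 101.0, 105.0]:          # hypothetical price stream
    forecast = smoother.update(price)

print(round(forecast, 2))
```

ARIMA updates follow the same shape with a slightly larger state vector; the key point is that the update cost never grows with the length of the history.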
Stateless Models
Models without state (tree ensembles, neural networks) must either store all training data or be updated incrementally, and are more complex to make fully online.
Practical Implementation for Trading
Hourly Updates with Online SGD
Every hour, observe new price data, compute fresh features, and update the model weights with an SGD step. Full retraining happens weekly or monthly; between retrainings, the online updates keep the model fresh.
Rolling Window Retraining
The simplest practical approach: every day or week, retrain on a rolling window of the past 2-3 years. Old data drops off, new data is added. This isn't truly "continual," but it approximates it: retraining is expensive but manageable on a rolling window.
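A rolling window falls naturally out of a bounded deque: appending a new day silently evicts the oldest one. A sketch with a synthetic daily loop (the window length, feature count, and coefficients are assumptions for illustration):

```python
from collections import deque

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
WINDOW_DAYS = 504                     # roughly two trading years (an assumption)

window = deque(maxlen=WINDOW_DAYS)    # old days fall off the left automatically

def retrain(window):
    X = np.array([features for features, _ in window])
    y = np.array([target for _, target in window])
    return LinearRegression().fit(X, y)

# Hypothetical daily loop: append today's (features, target), then retrain.
for day in range(600):
    features = rng.normal(size=3)
    target = features @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.05)
    window.append((features, target))

model = retrain(window)
print(len(window), np.round(model.coef_, 2))
```

After 600 simulated days the deque holds only the most recent 504, so the fit sees exactly the rolling window and nothing older.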
Hybrid Approach: Online + Periodic
Use online learning (SGD, incremental trees) for daily updates with current data. Periodically (monthly), do a full retraining to incorporate long-term patterns and address concept drift. This balances responsiveness (online) with stability (periodic retraining).
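The cadence can be sketched as a single loop: cheap `partial_fit` steps every day, a full refit on the accumulated history every 30 days. The stream, coefficients, and refit period are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(3)

history_X, history_y = [], []   # long-term store feeding the periodic full refit
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

for day in range(90):
    X = rng.normal(size=(64, 4))
    y = X @ np.array([0.4, -0.3, 0.2, 0.1]) + rng.normal(scale=0.1, size=64)
    history_X.append(X)
    history_y.append(y)

    model.partial_fit(X, y)     # daily online update: cheap and responsive

    if (day + 1) % 30 == 0:     # monthly full retrain: stable, sees all history
        model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
        model.fit(np.vstack(history_X), np.concatenate(history_y))

print(np.round(model.coef_, 2))
```

In production the full refit would run on a long history (or rolling window) rather than an in-memory list, but the alternation between the two update modes is the essence of the hybrid approach.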
Monitoring Model Drift
Continual learning must include monitoring: is the model still performing well? Track:
- Prediction accuracy on recent data
- Feature importance changes (is model using same features or diverging?)
- Distribution shifts (are input features drifting?)
- Live vs backtest performance gap
If performance degrades significantly, trigger full retraining or model revision.
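One simple, widely used monitor for input-feature drift is the population stability index (PSI) between the training-time distribution and recent live data. The thresholds below are a common rule of thumb, not a universal standard, and the synthetic distributions are assumptions for the demo:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference feature sample and a recent one. Rule-of-thumb
    reading (an assumption, not a standard): < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant shift worth triggering a retrain."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # catch out-of-range live values
    p = np.histogram(expected, bins=edges)[0] / len(expected)
    q = np.histogram(actual, bins=edges)[0] / len(actual)
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)  # avoid log(0)
    return float(np.sum((q - p) * np.log(q / p)))

rng = np.random.default_rng(4)
reference = rng.normal(0.0, 1.0, size=5000)      # training-time feature distribution
stable = rng.normal(0.0, 1.0, size=5000)         # live data, same regime
shifted = rng.normal(0.8, 1.3, size=5000)        # live data after a regime change

print(round(population_stability_index(reference, stable), 3))   # near zero
print(round(population_stability_index(reference, shifted), 3))  # well above 0.25
```

Computing PSI per feature each day gives a cheap, model-agnostic trigger for the "full retraining or model revision" decision above.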
Class Imbalance and Distribution Shifts in Online Learning
Online learning is vulnerable to distribution shifts. If recent data has a different class distribution (more winning trades, fewer losses), the model adapts to the new distribution and forgets old patterns.
Mitigation: track the distribution over time, reweight samples to maintain consistency, use class-balancing techniques, and monitor for shifts, triggering retraining when detected.
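The reweighting idea can be sketched with inverse-frequency sample weights, which keep each class's total contribution to the loss equal regardless of how skewed the recent stream is. The labels below are a made-up skewed batch:

```python
from collections import Counter

def balanced_sample_weights(labels):
    """Inverse-frequency weights: each class contributes equally to the loss,
    so a burst of one class in recent data cannot dominate the online update."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

# Recent stream skewed toward winners (label 1): 8 wins, 2 losses.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
weights = balanced_sample_weights(labels)
print(weights[0], weights[-1])   # winners down-weighted, losers up-weighted
```

Most estimators that support online updates accept such weights (e.g. via a sample_weight argument), so this slots directly into the update loop.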
When NOT to Use Continual Learning
If the model trains quickly on the full dataset (seconds or minutes), periodic retraining is simpler and more robust. Continual learning adds complexity; use it only when the computational cost of full retraining is prohibitive.
If market regime is stable and concept drift is minimal, continual learning adds little benefit. Quarterly retraining might suffice.
Conclusion
Continual learning enables efficient, responsive model updates in trading. Online learning (SGD, incremental trees) and incremental algorithms reduce computational cost while maintaining model freshness. Catastrophic forgetting is a concern; mitigation through rehearsal, regularization, and hybrid approaches helps. For high-frequency trading or rapid market changes, continual learning is essential. For slower-moving strategies with stable regimes, periodic retraining might suffice. The key is matching learning frequency to market change rate and computational constraints.