Introduction

Trading models generate alpha, and investors want to understand which features drive it. Traditional feature-importance methods (permutation importance, SHAP) measure association with returns, not causation. Causal inference methods (instrumental variables, causal forests) identify which features causally drive alpha, informing model improvement and risk management.

Causal vs Correlational Feature Importance

Correlational importance: a feature highly correlated with returns appears important, but it may merely be a proxy for an omitted variable. Causal importance identifies only the true drivers: causal methods control for confounders, isolating each feature's direct effect. As a result, a feature can show high correlation with returns yet a small causal effect, because confounding inflates the association.
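The confounding point can be made concrete with a small simulation. This is an illustrative sketch on synthetic data (the variable names and coefficients are assumptions, not from the source): an unobserved confounder drives both a "proxy" feature and returns, so the proxy correlates strongly with returns even though its causal effect is zero. Regressing on both variables recovers the near-zero causal coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process: a confounder drives both the
# proxy feature and returns; the proxy has no causal effect on returns.
confounder = rng.normal(size=n)
proxy = confounder + 0.5 * rng.normal(size=n)        # proxy of the confounder
returns = 0.8 * confounder + 0.5 * rng.normal(size=n)

# Correlational importance: the proxy looks strongly related to returns.
corr = np.corrcoef(proxy, returns)[0, 1]

# Controlling for the confounder (multiple regression) reveals that the
# proxy's causal coefficient is approximately zero.
design = np.column_stack([proxy, confounder, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, returns, rcond=None)

print(f"correlation(proxy, returns) = {corr:.2f}")     # high
print(f"causal coefficient of proxy = {beta[0]:.2f}")  # near zero
```

In practice the confounder is not observed directly, which is why methods such as instrumental variables or causal forests with controls are needed; the regression here only illustrates the gap between correlation and causation when the confounder is available.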

Causal Forest Methodology

Train causal forests, a generalization of random forests, to estimate heterogeneous treatment effects: how does a change in each feature causally affect returns? Causal forests estimate conditional average treatment effects (CATEs), which vary across observations. Rank features by the magnitude of their causal impact, and remove features with negligible causal effects to simplify the model without harming performance.
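A minimal sketch of heterogeneous treatment-effect estimation on synthetic data, using a two-model (T-learner) approach with random forests rather than a full causal forest (a library such as econml would be the production choice; all data and parameters here are illustrative assumptions). A binarized signal acts as the "treatment", and its effect on returns varies with a context feature:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data: X are context features, T is a binarized signal
# (the "treatment"), and the treatment effect grows with X[:, 0].
X = rng.normal(size=(n, 3))
T = (rng.normal(size=n) > 0).astype(float)
true_cate = 0.5 * X[:, 0]                 # heterogeneous effect
returns = true_cate * T + X[:, 1] + 0.5 * rng.normal(size=n)

# T-learner: fit separate outcome models on treated and control rows,
# then estimate CATE(x) = mu1(x) - mu0(x).
mu1 = RandomForestRegressor(n_estimators=100, min_samples_leaf=50, random_state=0)
mu0 = RandomForestRegressor(n_estimators=100, min_samples_leaf=50, random_state=0)
mu1.fit(X[T == 1], returns[T == 1])
mu0.fit(X[T == 0], returns[T == 0])
cate_hat = mu1.predict(X) - mu0.predict(X)

# The estimated CATE should track the true heterogeneous effect.
print(f"corr(cate_hat, true_cate) = {np.corrcoef(cate_hat, true_cate)[0, 1]:.2f}")
```

Averaging |CATE| per feature (by repeating this with each feature as the treatment) gives the causal ranking used to prune features with negligible effects.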

Application

Identify the truly important features and focus engineering and monitoring effort on them. Deprioritize features with high correlation but low causal effect. Removing spuriously correlated features improves model interpretability and robustness.
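The screening workflow can be sketched as follows, again on synthetic data with assumed names and thresholds: one feature causally drives returns, a second is a pure proxy for it, and a third is noise. A correlational screen ranks the proxy nearly as high as the true driver, while a joint-regression screen (a stand-in for the causal estimate) keeps only the causal feature:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30_000

# Hypothetical features: f0 causally drives returns, f1 is a proxy for
# f0 (correlated, no causal effect), f2 is pure noise.
f0 = rng.normal(size=n)
f1 = f0 + 0.3 * rng.normal(size=n)
f2 = rng.normal(size=n)
returns = 0.6 * f0 + 0.5 * rng.normal(size=n)

features = {"f0": f0, "f1": f1, "f2": f2}
X = np.column_stack(list(features.values()))

# Correlational screen: |corr| with returns, feature by feature.
corr = {k: abs(np.corrcoef(v, returns)[0, 1]) for k, v in features.items()}

# "Causal" screen: joint regression controlling for the other features.
beta, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(n)]), returns, rcond=None)
causal = dict(zip(features, np.abs(beta[:3])))

keep = [k for k in features if causal[k] > 0.1]   # illustrative threshold
print("correlational ranking:", sorted(corr, key=corr.get, reverse=True))
print("features kept by causal screen:", keep)
```

The proxy f1 survives the correlational screen but not the causal one, which is exactly the deprioritization described above.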

Conclusion

Causal attribution of alpha clarifies true model drivers, enabling more robust and interpretable models.