Reinforcement-Learning Agents with Human-Like Risk Aversion
Introduction
RL trading agents trained purely to maximize returns are too aggressive: they take on excessive leverage and concentration risk. Human traders, by contrast, exhibit risk aversion: they avoid large drawdowns even at the cost of some return. Incorporating human risk preferences into the RL reward function produces more realistic and stable trading agents.
Human Risk Preferences
Empirically, humans exhibit: (1) Loss aversion (losses cause roughly 2-2.5x more disutility than equal-sized gains); (2) Diminishing sensitivity (the marginal impact of a gain or loss shrinks as its magnitude grows, so value curves flatten away from the reference point); (3) Portfolio-level risk consciousness (attention to drawdowns and concentration, not just expected return). These preferences can be modeled directly in the reward function.
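The first two properties are captured by the prospect-theory value function of Kahneman and Tversky. A minimal sketch, with the conventional parameter values (loss-aversion coefficient 2.25, curvature exponent 0.88 — both assumptions, since the text only gives a 2-2.5x range):

```python
def prospect_value(x: float,
                   loss_aversion: float = 2.25,
                   curvature: float = 0.88) -> float:
    """Prospect-theory value of a gain/loss x relative to a reference point.

    - Loss aversion: losses are scaled by loss_aversion (~2.25x).
    - Diminishing sensitivity: curvature < 1 flattens the curve,
      so a 2-unit swing feels less than twice as strong as a 1-unit swing.
    """
    if x >= 0:
        return x ** curvature
    return -loss_aversion * ((-x) ** curvature)
```

With these parameters, `prospect_value(-1.0)` is -2.25 versus `prospect_value(1.0)` at 1.0, and doubling a gain less than doubles its subjective value.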
RL Agent Training
Train PPO agents with the shaped reward: Returns - 2.25 × Losses - 0.1 × Max_Drawdown - 0.05 × Concentration_Risk. Agents converge to strategies that balance return and stability, avoiding excessive leverage. Compared with a return-only baseline, the human-risk agents earn lower raw returns but achieve better Sharpe ratios and smaller drawdowns.
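The reward above can be sketched as a function of an equity curve and portfolio weights. The text does not pin down how Losses, Max_Drawdown, or Concentration_Risk are computed, so the following makes illustrative assumptions: losses are the summed magnitudes of negative per-step returns, drawdown is measured against the running equity peak, and concentration is a Herfindahl index of the weights:

```python
import numpy as np

def risk_adjusted_reward(equity_curve: np.ndarray,
                         weights: np.ndarray,
                         loss_aversion: float = 2.25,
                         dd_penalty: float = 0.1,
                         conc_penalty: float = 0.05) -> float:
    """Shaped reward: Returns - 2.25*Losses - 0.1*MaxDD - 0.05*Concentration.

    Assumed definitions (not specified in the text):
    - Returns/Losses: sums of positive / negative per-step returns.
    - Max_Drawdown: largest fractional drop from the running peak.
    - Concentration_Risk: Herfindahl index, sum of squared weights.
    """
    step_returns = np.diff(equity_curve) / equity_curve[:-1]
    gains = step_returns[step_returns > 0].sum()
    losses = -step_returns[step_returns < 0].sum()   # positive magnitude

    running_peak = np.maximum.accumulate(equity_curve)
    max_drawdown = ((running_peak - equity_curve) / running_peak).max()

    concentration = np.square(weights).sum()         # 1.0 = single asset

    return (gains
            - loss_aversion * losses
            - dd_penalty * max_drawdown
            - conc_penalty * concentration)
```

In practice this would be computed per episode (or per step, with incremental drawdown tracking) and fed to the PPO learner as the scalar reward; a return-only baseline simply drops the three penalty terms.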
Conclusion
Incorporating human risk preferences into RL agents produces more stable, realistic trading behavior.