Reinforcement-Learning Agents with Human-Like Risk Aversion
Introduction
RL trading agents trained purely to maximize returns are too aggressive: they take on excessive leverage and concentration risk. Human traders, by contrast, exhibit risk aversion: they avoid large drawdowns even at the cost of some return. Incorporating human risk preferences into the RL reward function produces more realistic and stable trading agents.
Human Risk Preferences
Empirically, humans exhibit: (1) Loss aversion (losses cause roughly 2-2.5x more disutility than equal-sized gains); (2) Diminishing sensitivity (the marginal impact of a gain or loss shrinks as its magnitude grows, so value curves flatten away from the reference point); (3) Portfolio-level risk consciousness (attention to drawdowns and concentration, not just expected return). These preferences can be modeled directly in the reward function.
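The first two properties are captured by the prospect-theory value function of Kahneman and Tversky. A minimal sketch, with the conventional parameter values (loss-aversion coefficient 2.25, curvature exponent 0.88 — both assumptions, since the text only gives a 2-2.5x range):

```python
def prospect_value(x: float,
                   loss_aversion: float = 2.25,
                   curvature: float = 0.88) -> float:
    """Prospect-theory value of a gain/loss x relative to a reference point.

    - Loss aversion: losses are scaled by loss_aversion (~2.25x).
    - Diminishing sensitivity: curvature < 1 flattens the curve,
      so a 2-unit swing feels less than twice as strong as a 1-unit swing.
    """
    if x >= 0:
        return x ** curvature
    return -loss_aversion * ((-x) ** curvature)
```

With these parameters, `prospect_value(-1.0)` is -2.25 versus `prospect_value(1.0)` at 1.0, and doubling a gain less than doubles its subjective value.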
RL Agent Training
Train PPO agents with the shaped reward: Returns - 2.25 × Losses - 0.1 × Max_Drawdown - 0.05 × Concentration_Risk. Agents converge to strategies that balance return and stability, avoiding excessive leverage. Compared with a return-only baseline, the human-risk agents earn lower raw returns but achieve better Sharpe ratios and smaller drawdowns.
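The reward above can be sketched as a function of an equity curve and portfolio weights. The text does not pin down how Losses, Max_Drawdown, or Concentration_Risk are computed, so the following makes illustrative assumptions: losses are the summed magnitudes of negative per-step returns, drawdown is measured against the running equity peak, and concentration is a Herfindahl index of the weights:

```python
import numpy as np

def risk_adjusted_reward(equity_curve: np.ndarray,
                         weights: np.ndarray,
                         loss_aversion: float = 2.25,
                         dd_penalty: float = 0.1,
                         conc_penalty: float = 0.05) -> float:
    """Shaped reward: Returns - 2.25*Losses - 0.1*MaxDD - 0.05*Concentration.

    Assumed definitions (not specified in the text):
    - Returns/Losses: sums of positive / negative per-step returns.
    - Max_Drawdown: largest fractional drop from the running peak.
    - Concentration_Risk: Herfindahl index, sum of squared weights.
    """
    step_returns = np.diff(equity_curve) / equity_curve[:-1]
    gains = step_returns[step_returns > 0].sum()
    losses = -step_returns[step_returns < 0].sum()   # positive magnitude

    running_peak = np.maximum.accumulate(equity_curve)
    max_drawdown = ((running_peak - equity_curve) / running_peak).max()

    concentration = np.square(weights).sum()         # 1.0 = single asset

    return (gains
            - loss_aversion * losses
            - dd_penalty * max_drawdown
            - conc_penalty * concentration)
```

In practice this would be computed per episode (or per step, with incremental drawdown tracking) and fed to the PPO learner as the scalar reward; a return-only baseline simply drops the three penalty terms.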
Conclusion
Incorporating human risk preferences into RL agents produces more stable, realistic trading behavior.