DeepSeek's GRPO shifts the focus to groups—optimizing locally with relative baselines for enhanced stability, rather than treating policies individually or globally—this feels aligned.
Reminds me of random forests (RF) outperforming single trees.
https://buff.ly/42tVgtn #AI #ML
Reminds me of random forests (RF) outperforming single trees.
https://buff.ly/42tVgtn #AI #ML
Comments