zlatko-minev.bsky.social • 36 days ago

DeepSeek's GRPO shifts the focus to groups: it optimizes locally against group-relative baselines for enhanced stability, rather than treating policies individually or globally. This feels aligned.
Reminds me of random forests (RF) outperforming single trees.
https://buff.ly/42tVgtn #AI #ML
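
For readers who want the "relative baseline" idea made concrete, here is a minimal sketch, assuming the commonly described GRPO normalization in which each sampled response is scored against the mean and standard deviation of the rewards in its own group. The function name, the `eps` guard, and the reward values are hypothetical, and the rest of the GRPO objective (clipped policy ratio, KL penalty) is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to its own group (hypothetical helper).

    The group mean acts as the baseline, so no separate value network
    is needed; dividing by the group spread rescales the signal.
    """
    baseline = mean(rewards)   # group-level baseline
    spread = pstdev(rewards)   # population std of the group's rewards
    # eps is an assumed guard against division by zero when all rewards match
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group's average get a positive advantage and those below it get a negative one, so the signal depends only on comparisons within the group.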
