From an open-research point of view, maybe the greatest thing about DeepSeek–R1 is how its RL training technique appears so straightforward/simple in comparison to the cumbersome approaches we were starting to think necessary for reasoning like Process Reward Models or Monte Carlo Tree Search. [1/2] - ThreadSky

thomwolf.bsky.social • 20 days ago

From an open-research point of view, maybe the greatest thing about DeepSeek–R1 is how its RL training technique appears so straightforward/simple in comparison to the cumbersome approaches we were starting to think necessary for reasoning like Process Reward Models or Monte Carlo Tree Search.
[1/2]

Comments

Posting Rules

Comments

Posting Rules

Reply