From an open-research point of view, perhaps the greatest thing about DeepSeek-R1 is how simple its RL training technique appears compared with the cumbersome approaches we were starting to think were necessary for reasoning, such as Process Reward Models or Monte Carlo Tree Search.
[1/2]
