We released the OLMo 2 report! Ready for some more RL curves?

This time, we applied RLVR iteratively! Our initial RLVR checkpoint on the RLVR dataset mix shows a low GSM8K score, so we did another RLVR on GSM8K only and another on MATH only 😆.

And it works! A thread 🧵 1/N