yyahn.bsky.social • 23 days ago

The *current* reinforcement learning methods may not be improving the reasoning capacity of LLMs. Instead, they may be training the models to find shortcuts more efficiently.

https://limit-of-rlvr.github.io