I have an awesome idea that no one has tried before - RL on math datasets 🤯
You will have a natural verifier!