OpenAI had the answers to FrontierMath, which calls into question their o3 results

A lot of people think they didn’t actually train on the test set, although they admit that there’s still plenty of potential for contamination.

Comments