OpenAI had the answers to FrontierMath, which calls their o3 results into question
A lot of people think they didn’t actually train on the test set, though they admit there’s still plenty of potential for contamination
Comments
Despite the departures, OpenAI has a ton of good researchers, and people respect their work
…but Sam Altman’s style causes problems like this. OpenAI feels too powerful to be the sole wielder of AGI/ASI