WTF?!
Both the new #ChatGPT "o1" reasoning model AND the $200/month "o1 pro" one from #OpenAI fail at #physics where o1-preview consistently succeeds
- o1 ❌
https://chatgpt.com/share/67524546-4de8-8008-9942-0a32ec4ea41d
- o1-preview ✅
https://chatgpt.com/share/674360de-4678-8008-82d9-2f472abade08
Comments
Oops.
Remember, o1-preview consistently got this right!
https://youtu.be/AeMvOPkUwtQ?t=6m16s
It is as if the consensus approach hurt its performance. The speculation is that the $200 tier runs your query 10 times.
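To make the consensus idea concrete, here is a rough sketch of how majority voting over repeated runs could in principle hurt rather than help. Everything in it is an assumption for illustration: the 10 runs come from the speculation above, the function name `majority_vote_accuracy` is made up, and the model (independent runs, only two candidate answers, strict majority wins) is a toy, not anything OpenAI has described.

```python
from math import comb


def majority_vote_accuracy(p_correct: float, n_runs: int) -> float:
    """Probability that the correct answer wins a strict majority.

    Toy model (an assumption, not a documented mechanism): each run is
    independent, returns the correct answer with probability p_correct,
    and otherwise returns one shared wrong answer.
    """
    votes_needed = n_runs // 2 + 1  # strict majority, so a tie counts as a miss
    return sum(
        comb(n_runs, k) * p_correct**k * (1 - p_correct) ** (n_runs - k)
        for k in range(votes_needed, n_runs + 1)
    )


if __name__ == "__main__":
    # Compare single-run accuracy with 10-run majority voting.
    for p in (0.4, 0.6):
        print(f"single run: {p:.2f}  10-run vote: {majority_vote_accuracy(p, 10):.3f}")
```

In this toy model, voting amplifies whichever answer is more likely on a single run: if a single run is right less than half the time, the vote is right even less often. That would only apply here if the new model's single-run accuracy on this problem were below 50%, which is not established.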
Presumably o1 and o1 Pro have been optimized for benchmarks rather than real-world performance?
And less than 24 hours after release, people have already spotted that the Emperor is wearing no clothes?
This is a pretty bad look for OpenAI.
On Day 12, AGI will be born and revealed to the world wrapped in swaddling clothes, right?
https://youtu.be/-P2rMk3bfkc
I jest, of course. The Reddit discussions of this particular experiment are only a few months old, so they most probably postdate the training data cutoff for the o1 pro model.