WTF?! Both the new #ChatGPT "o1" reasoning model AND the $200/month "o1 pro" one from #OpenAI fail at #physics where o1-preview consistently succeeds - o1 ❌ chatgpt.com/share/675245... - o1-preview ✅ chatgpt.com/share/674360... - ThreadSky

geekycow.bsky.social • 169 days ago

WTF?!

Both the new #ChatGPT "o1" reasoning model AND the $200/month "o1 pro" one from #OpenAI fail at #physics where o1-preview consistently succeeds

- o1 ❌
https://chatgpt.com/share/67524546-4de8-8008-9942-0a32ec4ea41d
- o1-preview ✅
https://chatgpt.com/share/674360de-4678-8008-82d9-2f472abade08


Comments

ruenahcmohr.bsky.social•168 days ago

try asking "are you sure?" and see if it changes its mind. I have noticed it almost seems to intentionally get answers wrong the first time.

geekycow.bsky.social•168 days ago

It's a 50/50 question so that's kind of cheating really!

geekycow.bsky.social•169 days ago

Link to the $200 o1 Pro also failing https://chatgpt.com/share/675251c3-71d0-800c-bf93-7fdcc5f3d9e1

Oops.

Remember, o1-preview consistently got this right!

geekycow.bsky.social•169 days ago

I found the original [Action Lab video](https://youtu.be/LBRBB6D8SdY?si=EUeAkx9ozvzTbIcN) that explains the problem. It inspired me to experiment to see how many clues it would take to get o1-preview to consistently get the right answer. It takes two: the dice shape and directional entropic forces.

bornach.bsky.social•169 days ago

[AI Explained] used his own independent benchmark of human reasoning problems and also found that o1 pro performed slightly worse than o1-preview
https://youtu.be/AeMvOPkUwtQ?t=6m16s
It is as if the consensus approach hurt its performance -- the speculation is that the $200 tier runs your query 10 times
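
To make the "consensus hurts" idea concrete, here is a rough sketch of majority voting on a two-choice question. It is only an illustration: the 10-run figure is the speculation mentioned above, and the independence of the runs, the coin-flip tie-break, and the function name `majority_vote_accuracy` are all assumptions, not anything OpenAI has described.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Chance that a majority vote over n independent runs is correct
    on a two-choice question, when each run is right with probability p.

    Assumption: ties (possible when n is even) are broken by a fair coin flip.
    """
    total = 0.0
    for k in range(n + 1):
        # Probability that exactly k of the n runs are correct (binomial).
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k > n:
            total += prob          # clear majority for the right answer
        elif 2 * k == n:
            total += 0.5 * prob    # tie, settled by a coin flip
    return total


if __name__ == "__main__":
    # Compare single-run accuracy with a hypothetical 10-run vote.
    for p in (0.4, 0.5, 0.6):
        print(f"single run: {p:.2f}  10-run vote: {majority_vote_accuracy(p, 10):.4f}")
```

Under these assumptions, voting amplifies whatever bias a single run has: if one run is right more than half the time the vote helps, but if one run is right less than half the time the vote makes things worse.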

geekycow.bsky.social•168 days ago

Oops.

Presumably o1 and o1 Pro have been optimized for benchmarks rather than real world performance?

And less than 24 hours after release, people have spotted that the Emperor is wearing no clothes, then?

This is a pretty bad look for OpenAI.

bornach.bsky.social•168 days ago

But it is only Day 1

On Day 12, AGI will be born and revealed to the world wrapped in swaddling clothes, right?

bornach.bsky.social•169 days ago

Perhaps you have to add zero gravity to your prompt
https://youtu.be/-P2rMk3bfkc
I jest, of course. The Reddit discussions of this particular experiment are only a few months old and most probably postdate the training data cutoff for the o1 pro model
