AI hype is making AI researchers forget painfully learned lessons core to the field.
There's an emerging cope that progress in capabilities isn't slowing down—it's just invisible as benchmarks are saturated; vibe checks are useless because models are now superhuman so we can't perceive improvement.
This is exactly backwards. In the early days of AI, researchers tackled chess because they thought real-world problems like computer vision would be easy!
Prediction: as AI solves "extremely hard" benchmarks, AI boosters will start to claim that we have superintelligence yet no one's using it because they're too stupid to recognize it.
(1) The actually hard problems for AI tend to be the things that benchmarks don't measure, hence the importance of vibes.
(2) Benchmarks have always been of limited value. (We've been saying this since long before they became saturated: https://www.aisnakeoil.com/p/gpt-4-and-professional-benchmarks)
(3) Adoption metrics are far more informative than decontextualized capability measurements.
HT @howard.fm