maxkennerly.bsky.social • 3 days ago

Reminder that LLMs are a dismal expensive prototype with a handful of potential applications, not a reliable general purpose technology.

Researchers tried a couple on the 2025 USA Mathematical Olympiad, a high school competition.

Best score was 2 out of 42.
https://arxiv.org/abs/2503.21934v1

Comments

lawnerd.bsky.social•3 days ago

Most very smart people get 0 on that tbf

paulkirk.bsky.social•3 days ago

Third sentence under Section 3.2 (“Failure Modes”) is a bit of an eyebrow raiser.

docgok.bsky.social•3 days ago

Calling USAMO "a high school math competition" is misleading. 99.9% of humans would score a zero on this test. And notably, models have recently started performing very competitively on AIME, the qualification test before this one. A year ago, they were getting 10%.

jpz0.bsky.social•3 days ago

You are really ignorant. I’m no AI booster, I’ve worked as a software developer since 1991 and I’m still at the top of the game. LLMs are incredibly useful labour-saving devices when used effectively across a broad range of applications.

jpz0.bsky.social•3 days ago

1) proofreading and tone correction

2) general research, as good as a gofer when starting with a blank page

3) very specifically in my line, allowing me to go from concept to code, particularly in non-specialist languages.

Nobody is saying it is replacing skilful people in total.

chris-mas.bsky.social•3 days ago

Actually, AI *is* being sold to (read forced on) us as a universal intelligence tool that will no longer require anyone to think about anything. If LLM proponents could just stay in their lane, skeptics would merely be skeptics and not gtfoh-with-your-pretend-AI skeptics.

jpz0.bsky.social•3 days ago

The bullshit on LinkedIn is next level. But I’m not arguing any of that bullshit. It’s a revolutionary technology that is gonna upset a lot of shit.

chris-mas.bsky.social•3 days ago

I get it. I was a tech working on advanced laser development projects back when it was being sold as the future of everything from Star Wars weapons to eliminating scalpels in surgery. LLMs will find big niches as well, but they are *literally* forced on too many app users where they will have minimal impact.

andywaters.bsky.social•3 days ago

Too many folks in these replies arguing whether LLMs are *useful* when they should be asking whether the (hypothetical and often very strained) use cases are remotely *cost-effective*

star-ringer.bsky.social•3 days ago

Your last statement is extremely incorrect.

hephaestion.gay•3 days ago

This seems like an overly specific application to judge an entire technology on, especially if you're aiming to draw the conclusion that it's a poor "general purpose" tool. Most adults forced to do a math olympiad would get a 0 and most problems given to LLMs are not intentionally obfuscated.

hephaestion.gay•3 days ago

And what's even the point? "Hah, the machine struggles to do very specific and difficult math problems! Useless!" You'd think there are probably some more relevant critiques with regard to the infinite plagiarism machine and how that contributes to the incuriosity epidemic, no?

jaoswald.bsky.social•3 days ago

The actual truth is that LLMs are powerful bullshit machines, useful only for generating industrial quantities of bullshit.

4quad.bsky.social•3 days ago

The current batch of stable LLMs are building blocks, not something in and of themselves useful.

Things like "get the text in this image". Cool! But not an end in itself.
