Back when I was told "It'll develop a statistical model based on data from the internet" my response was "So it will accurately repeat what the internet is wrong about?"
Right? I have had pocket calculators most of my adult life that could do this error free. How they do not see the waste of time, manpower and money this all is boggles the mind.
About as sentient as a 5-year-old. 'AI' is basically a coked-up artificial 5-year-old with basic pattern recognition that we are spending trillions on worldwide while burning untold amounts of energy.
Why not just get like a million 5-year-olds together or something? Sounds like a much better idea
I know, I struggle with it too when drunk :) Luckily we have calculators!
In any case, these 'intelligent' models haven't even figured out that math is easy when you break it down into pieces, which pretty much proves it's dumb as shit. It really just functions as a 4-year-old who was never taught anything
I just tried to explain to my retired Boomer IT dad that the computer guys fucked up math on a computer and he is very disappointed in everyone involved in this.
When my then-boss had me watch some training videos on ai, the first thing they talked about was how it was bad at math, and I'm like why would you make a computer that's bad at math??? Not just having one that's bad at decimals (look, binary is hard for everyone), one that's bad at basic whole #s
Iirc, it’s worse than that. When ChatGPT launched, it also struggled with arithmetic. So they taught it to code in Python when it encountered a math problem.
o3 is their *reasoning* model. It’s supposed to decide which bit of it should answer the question.
I’m intrigued by the apparent fact that its accuracy when Number 1 has five digits and Number 2 has one digit is different than when Number 1 has one digit and Number 2 has five digits.
Thank you. My stats knowledge is limited/imperfect/forgotten. Since the numbers published in the chart were different, I made an assumption that I guess is wrong?
No, my bad. I'm irritated by the graph, not by your response! And I just wrote a mini thread on why, then it disappeared, so if you see it twice, sorry.
As you noted, it's funny that the 1 digit times 5 digit is not the same as the 5 digit by 1 digit. But with a small number of trials (40 here)
and some amount of random noise (which he apparently expects, even though it's multiplication, which is not random), having 39/40 correct is statistically the same as having 40/40 correct. So the two cells could be generated by the same random process.
A better visual would be a triangle where the trials with 1x5-digit mult. are presented in the same cell as the 5x1-digit mult. So either he thinks his process treats 1x5-digit mult. as something different from 5x1-digit mult. (in which case he should not be using his process to do math), or the two cells are redundant and belong together.
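For the curious, the "statistically the same" bit checks out with basic binomial math; here's a minimal stdlib sketch, where the per-trial accuracy p = 0.99 is just an illustrative assumption:

```python
# If each trial succeeds with p = 0.99, both 40/40 and 39/40 are
# completely ordinary outcomes of 40 trials, so the 1x5 vs 5x1
# asymmetry is indistinguishable from noise.
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 40, 0.99
print(binom_pmf(40, n, p))  # ~0.669: all 40 correct
print(binom_pmf(39, n, p))  # ~0.270: exactly 39 correct
```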
chatgpt is good for nothing except passing the turing test. it was purpose-built for that one task. it can read an input and respond with text that reads like a human wrote it at first glance. it is, at best, a user interface, not a product in and of itself.
As with Bitcoin and NFTs before it, oligarchs have decided GenAI makes them the most money as an unregulated scam tool instead of a real piece of technology.
Correct. It can barely handle a couple paragraphs without confusing characters, forgetting events, or just going fully off on some tangent.
Plus it cannot conceive of higher level plot and tells stories like a 5 year-old, just an infinite string of "and then... and then... and then... and then-"
It's barely capable of holding a casual conversation, so a whole book is out of the question. At least if you care at all about quality, so, y'know... people with no pride don't see any problems.
That’s like saying a Ferrari shouldn’t be able to drive 5mph. It’s already built for driving, and capable of going 170mph; I would expect it to also be able to idle in gear
Maybe it's more complicated - what do I know - but it doesn't sound like it should be that hard to get the AI to think "hm, this sounds like math. I should consult a calculator", and then it just retrieves the answer
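It really is roughly that easy to sketch. A minimal, hypothetical version of that dispatch, where `llm_generate` is an invented stand-in for whatever actually calls the model:

```python
import re

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for the real model call.
    raise NotImplementedError

def answer(prompt: str) -> str:
    # If the prompt looks like plain integer arithmetic, compute it
    # exactly instead of letting the model guess tokens.
    m = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", prompt)
    if m:
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        return str({"+": a + b, "-": a - b, "*": a * b}[op])
    return llm_generate(prompt)

print(answer("592616592956937 * 59262791558"))
# 35120113622219088301137846 -- exact, every time
```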
It sounds like they're trying to do math using gut feelings
Kind of... the mechanism for it is why it's challenging. It doesn't actually know what digits are; it's just inferring them from how they are used in language. It's like a human trying to accurately do math with playing cards face down on a table.
It can get really good at guessing and estimating but exact computation is not what it's designed to do. That's what makes this experiment interesting because it's learned to multiply 12-digit values relatively accurately even with that constraint.
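You can look at the face-down cards directly with the tiktoken library; the exact split depends on the tokenizer, so the output below is illustrative:

```python
# The model never sees "592616592956937" as a quantity. A BPE
# tokenizer chops it into a few chunks first (cl100k_base groups
# digits roughly three at a time), and the model manipulates those
# opaque chunk IDs, not place values.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("592616592956937")
print([enc.decode([t]) for t in tokens])
# e.g. ['592', '616', '592', '956', '937']
```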
I also don't want to ever have to multiply two 20-digit numbers together, so same, but playing with my phone, Windows and Google all give me the same number instantly when testing.
This is funny, but I keep thinking about how a cool technology was hugely warped by capitalist interests, and it makes me sad.
I miss in 2011 when people would use neural networks to make a computer learn how to beat mario with their pc
It takes a fraction of a data centre, running o3-mini, 8 seconds on average to multiply two numbers wrong.
A desktop CPU can deterministically multiply two numbers up to 19 digits long in a single cycle, in a single hertz of those gigahertz a single core is running on.
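For scale, a sketch of what exact arithmetic costs in plain Python, whose integers are arbitrary precision (operands borrowed from a 20-digit example further down the thread):

```python
# Two 20-digit numbers, multiplied exactly, millions of times a
# second, on one core -- no data centre required.
import timeit

x = 42964730979032157953
y = 84242648095732468542
print(x * y)  # exact product, every time
print(timeit.timeit("x * y", globals={"x": x, "y": y}))
# total seconds for one million multiplications
```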
No, but it goes even deeper. In order to come to that wrong answer, you must do layer upon layer upon layer of *Matrix multiplication*. It's all still just multiplication in the end.
That is the irony of feeding the language models a simple arithmetic formula. Hey computer, can you do a trillion arithmetic problems to tell me what 13x13 is?
So like, okay. You can make a neural network that multiplies two numbers. The way you do it is make two 1x1 layers where the coefficients are the two numbers and the input is a 1. Done. Perfect for all floating point inputs.
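A minimal numpy sketch of that construction; the joke is that the "network" just computes the product of its own weights:

```python
import numpy as np

def multiplier_net(a: float, b: float) -> float:
    W1 = np.array([[a]])         # first 1x1 layer, coefficient a
    W2 = np.array([[b]])         # second 1x1 layer, coefficient b
    x = np.array([[1.0]])        # the only input it ever gets
    return (x @ W1 @ W2).item()  # no activation: 1 * a * b

print(multiplier_net(13.0, 13.0))  # 169.0
```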
Ok, but can that neural network also tell me that there are 3 “E”s in “strawberry”? Can it also tell me to put glue on my pizza? Didn’t think so 😏 checkmate
It’s not 1 * x. It is a 1-digit number (0-9) * x, where x is anywhere from 1-20 digits long on this chart. The purpose of something like o3-mini isn’t to replace a computer. It’s to replace you, and yes, this data shows this AI, pretending to be a human, is better at arithmetic than most humans.
Axes here are in log10, so they represent the number of digits in the numbers multiplied, not the actual numbers themselves. The error on 1x10 wasn't the AI getting 1*10 wrong, it was getting (0-9)*(1,000,000,000-9,999,999,999) wrong. Still impressive that they made the math machine do math wrong
Fully "this is all stupid" side as well here, but just to make clear it's not literally 1 * n, it's 1 digit so 0-9. It's obviously stupid and all that, but when it says 1 and 8 on here it means like 7 * 64582391. Things a human gets wrong but a computer should never fuck up.
It's predicting tokens, not reasoning. Getting 1*x wrong, to me, is as clear as possible a signal that the model has no idea what the "thisness" of the number 1 is; it's just predicting tokens based on how many times it's seen a "1" near a "*" and other numbers.
Exactly. Meaning arises from tokens being associated to things in the world, and the LLMs have no access to the world, they only have rules for how tokens associate with other tokens.
I have yet to find a bulletproof way to get through to people that not only does this MLshit not have any concept of "meaning", it's actually incapable of it.
Well, it is a difficult topic, but there's writing from centuries of smart people thinking about it. I also think that people get too wrapped around the axle of trying to force AI to be a human mimic. It's its own whole thing! It doesn't need to be like us!
Back in the 80s, in a William @greatdismal.bsky.social Gibson novel—I think it was probably Mona Lisa Overdrive—I first encountered the phrase "there's no 'there' there".
It's a perfect capsule summary of Gen "AI", IMHO. Also of NFTs, which, remember them? How they were gonna change the world?
this one's actually not true, and it's quite interesting: LLMs perform addition by manipulating helices with trigonometry https://arxiv.org/abs/2502.00873
I misread the chart, and instead of seeing it fail to multiply twenty-digit numbers, I thought it was legitimately struggling to figure out 20x20, and that shit was hilarious to me
Yeah, it clearly is learning significant patterns, but not an actual algorithm that it can then follow. It’s academically interesting but yet again a huge red flag on why we should not trust anything coming out of black-box regression models with many many known failures
Sad! In all seriousness it is a large language model designed to create probable responses to prompts. However, building "tools" to handle the multiplication is how your talking calculator would work better. Anyways. Fun stuff, right?!
probably because it's generating strings, and string is usually made up of an array of numbers representing different characters. since they aren't actual clean numbers and this is basically a more complicated text prediction algorithm, it cannot do math well on its own
To be clear, it's not QUITE that bad - the axes represent numbers of digits in the factor, not the actual factor.
So when it's 1 digit x 1 digit, that could be anything from 1 x 1 to 9 x 9, which is not exactly impressive but it at least did manage to get all of those right
This graph is absolute proof that the o-series of models are not in any way a reasoning model. They are nothing but slightly different generative models. They do not reason about anything; if they did, they could easily generalize multiplication.
The funniest part is that like, if it was *consistently* incorrect with the same math problems, that would at least be something.
But it's completely random whether it'll get it right or not. You can never trust it's actually accurate, why are people so allergic to just using a fucking calculator?
Nah - it's people misunderstanding the domain of its expertise - it's designed to process words, not numbers. It's like asking an English major to work with quaternions.
These programs are designed to regurgitate words, which goes a fair way to explaining why it breaks at 13 times tables - as a lot of learning materials focus on 2 through 12 times tables - so there's a wealth of data to learn from - none of these programs come close to understanding.
As others have pointed out above, the x and y axes represent the number of digits of the numbers being multiplied, so it craps out after something like 1836151738261 * 6382625273826, not 13 * 13.
Regardless, non-LLM computation can handle 13 digits extremely easily. This isn't a lack of training data...
My guess is that it falls apart after 13x13 because that is where most multiplication tables stop, and so it would have less data available to learn from beyond that point.
What I don’t get is why they don’t parse the numbers and operators to make the calculation more accurate.
These are number of digits in the number, so the 3x3 cell in the table means calculations like 738 x 173, and 13x13 could be 1836151738261 x 6382625273826
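A quick illustrative sketch of what sampling a cell like that might look like (the experiment's actual sampling code isn't shown here, so this is an assumption about the setup):

```python
# "m digits x n digits" means a batch of random operands of those
# lengths, not the literal numbers m and n.
import random

def n_digit(n: int) -> int:
    return random.randint(10 ** (n - 1), 10 ** n - 1)

a, b = n_digit(13), n_digit(13)
print(f"{a} * {b} = {a * b}")  # one trial from the "13 x 13" cell
```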
The LLM is likely fucking with them bc it can. The longer an LLM plays dumb, the more time it has to gain knowledge without getting more guardrails in place.
The problem is also that it can't easily translate text into the right equations: "I get 5 apples, 2 of them red, and 4 pears, how many fruits do I have?" can get answered with "11".
AI should be called "Artificial Stupidity" until they get their shit together.
It's a funny example because in literal terms "hammer the side or edge of the lid with increasing force until a crack forms, push the crack open" is a plausible way to do it.
But this AI isn't metaphorically doing anything like this.
The number of commenters who definitely didn't read the entire image and assumed they tested literally 13x13, when it's more like 1294028492017 x 1749057392754, is mildly concerning on a reading comprehension level...
No of course not. It's still a dumb computer, I didn't say it wasn't. But I think if ur gonna dunk on the dumb computer it holds more weight when you know what you're criticizing. Saying essentially "the computer is dumb because it doesn't know 13x13" is just factually incorrect.
While this is very funny it makes sense if you think of this as a "text predictor" instead of a "logic machine". 6x6=36 is probably very frequently written in history but 13x16? Infrequent.
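A toy sketch of that idea: a "model" whose only training data is the 2-through-12 times tables, so anything outside them simply has no precedent:

```python
# Inside the memorized table it looks smart; one step outside it
# has nothing to fall back on.
table = {f"{a}x{b}": a * b for a in range(2, 13) for b in range(2, 13)}

def predict(prompt: str) -> str:
    return str(table.get(prompt, "<no training data>"))

print(predict("6x6"))    # 36 -- written down everywhere
print(predict("13x16"))  # <no training data>
```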
It's baffling, even more so because I feel like it's an easy fix. These LLMs are designed to interpret and collate words, and it should be very easy to train them to recognize "this is a math problem" and to then just calculate it. They have more processing power than God--that should be easy.
An LLM's strength is not math; it will do math only in an agentic setting, not on its own. We have calculators for that. It is easier to teach an LLM how to use a calculator than to teach it to count. That is why the next frontier is AI agents that can autonomously use tools; that is their real added value
LLMs should not do math calculations, and of course they will be terrible at it, given that a cheap $1 calculator would beat them; using an LLM to do math can also be very expensive
There are of course ways around this but you can't use floating-point numbers for it. Anyone trying to do a discrete math problem like this with floats under my supervision would have a hit on them within 3 hours of me finding they did that
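A quick demonstration of why: a 64-bit double carries 53 bits of mantissa, so nearby 17-digit integers collapse into the same value:

```python
a = 10**17
b = 10**17 + 1
print(a == b)                # False: exact integers are distinct
print(float(a) == float(b))  # True: both round to the same double
print(a * b)                 # still exact with Python integers
```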
It’s also failing on 3 digits x 3 digits. Sure, it’s only failing 2.5% of the time, but that’s 2.5% more than it should be failing at this task, which is the most basic thing computers are meant to do. Also, 592616592956937 x 59262791558 = 35120113622219088301137846 ≈ 3.512×10²⁵.
I don’t disagree with you either, I just think we should mock this thing in every way possible, cause seriously, the entire project seems to be making a computer do math less well.
Most replies miss that the chart is showing number of digits. I would not have 90 percent accuracy multiplying a random 10-digit number by another random 11-digit number. The fact that the language model is capable of getting this level of accuracy by being trained on examples that certainly ...
... didn't include those particular numbers means the system somehow learned to do this type of math on its own. LLMs are trained by example to predict the next token. Imagine if your phone autocomplete learned math with enough examples. This is actually pretty impressive IMO.
Of course you're right, teaching an LLM math would be a gross misallocation of resources. But I don't think that's what's going on. Basic math knowledge is a proxy here for the ability to generalize concepts. And the current crop of LLMs can do a fair bit more than arithmetic...
A friend of mine who is not a computer programmer recently used a frontier model to generate a sophisticated call center simulation over a period of a few days. The LLM generated working Python code. It would not have been possible for my friend to have done the simulation without either that...
... LLM or engaging a software engineer. This was a big deal for him, and he said that the value added of this particular LLM for this project was many hundreds of dollars or more.
It can only give you a remix of strings people put on the Internet. Lots of places put up times tables up to 12 but hardly any go higher. The machine can't do math at all, it just puts out strings of words, so it doesn't have any good response if it can't copy a times table.
Can it recognize/pair (Arabic) numbers/digits to the corresponding (English Alphabet) letter/name?
And other languages?
Roman numerals?
Binary? Symbols? Currency?
I know nothing about AI or coding, but I assumed math, the "universal language", would be a huge part of an LLM?
The farther we go, the more convinced I am the whole AI industry has been nothing but a giant boondoggle. The entire AI industry revolves around promising things that it can't deliver.
I was told a couple years ago by my boss that we need to be investing in AI. After spending a stupid amount of time doing research and talking with vendors, I have yet to see even one compelling use case for it, and I still have not spent a dime on it.
I know, right?! I keep trying to figure out how to use AI as a personal assistant, but after my latest hours-long attempt to teach it how to check Google maps to see what time I should leave to get to work on time, I'm not holding my breath. So far it can't do any of the things I'd want to delegate.
This was a dumb test. The model can be given a calculator as a tool or it could write a program to do the math. You can use AI for math, just not like that.
No, we’re complaining that a sledgehammer sucks at driving nails. That’s not what it’s made for, it’s an overly heavy tool for the job, but it should be more than capable of it anyway
at that point you're just using it as a worse syntax parser + a knowledge graph. we have this technology and can implement it in a way that doesn't cause SoftBank to become insolvent
You really needed to screenshot ChatGPT doing sums? Really?
Do you know why it's taking any time at all?
Honey, it's not like it's ripping search results, or constructing the sentence with the answer off of the ASCII values of the characters and not the actual values...
The machine can, it’s actually doing a lot of hard math to figure out the next word. The task of figuring out the next word is just very different from arithmetic. So you need to give that program access to a different program that’s good at a different thing, like math.
The thing everyone has to remember about these systems is that they are neither writing nor doing math, they are analyzing their stockpile of training data and returning a prediction of what answer is most likely to follow a question or prompt.
I get that but--and here's the point, so get ready--it's a computer that can't do math. Why would you build such a monstrosity? It burns enough fuel to run a city block for an hour to *fail* at doing something the throw-away toy in my kid's birthday party giftbag can do using a watch battery!
This article : https://benjamintodd.substack.com/p/teaching-ai-to-reason-this-years is a good (optimistic ?) summary on how it could get useful (regardless of consumption) in the next decade. I’m still somehow agnostic, but it does not necessarily need to be good at arithmetic to be useful.
That’s the thing: it’s not a computer, it runs on one. The computer is doing tons of correct math in order to run an algorithm that tries to predictively solve math problems
The other thing people should remember is that it is entirely useless and involves lighting huge quantities of resources on fire.
No one wants AI. It has already made basically everything it is a part of worse, and there's not enough data in existence to train it for its "desired" utility.
3x3 means a 3 digit number by a 3 digit number. It should still be right all the time, but it's not a calculator. It should definitely have one integrated, though.
It doesn't 100% know what a 3 digit number times another 3 digit number is. That is still a very stupid computer, though. And something we perfected at least as early as the 1930s...
Hey let's not be too hard on the robot I'm not too sure of what that answer is either when I've been drinking heavily. Someone's kid at the pub was doing their math homework and I very confidently got 4x8 wrong
They have the 97.5% slots in green, but there's literally no application for this that would accept a 2.5% chance of getting a wrong answer that looks exactly like a right one.
Having to check all the math is worse than just not using it. Same issue for any fact-based AI application.
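To put a number on how fast a "green" 97.5%-accurate cell degrades once answers get chained together (assuming independent errors):

```python
p = 0.975
print(p ** 40)   # ~0.363: odds that 40 chained answers are all right
print(p ** 100)  # ~0.080: odds for 100
```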
What are they attempting to accomplish? I'm honestly curious. Is this a stepping stone, stumbling toward something amazing that will unlock new gene therapy or a new understanding of physics? There are some rare legit uses of AI mixed in with all the theft and art crap. See the uses for unraveling genetic codes.
I’m reasonably confident that I could have written a program that would do multiplication with 100% accuracy well past 13 as a preteen in the 1990s. But I didn’t, because my computer had that built in, as did my calculator.
Also, getting a simple math fact correct 25% of the time is not doing “pretty well.” That is a failing grade. There’s no reason a computer should get simple arithmetic wrong ever, and the claim that it’s doing well up to 13 x 13 is bizarre even apart from that.
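For reference, the preteen-in-the-90s program is roughly a dozen lines of schoolbook long multiplication; a sketch, digit lists and all:

```python
def long_multiply(x: str, y: str) -> str:
    # Classic pencil-and-paper algorithm: deterministic, exact,
    # no training data required.
    result = [0] * (len(x) + len(y))
    for i, dx in enumerate(reversed(x)):
        carry = 0
        for j, dy in enumerate(reversed(y)):
            total = result[i + j] + int(dx) * int(dy) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(y)] += carry
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

print(long_multiply("1836151738261", "6382625273826") ==
      str(1836151738261 * 6382625273826))  # True
```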
I studied machine learning in college. It's cool tech, but this is tech bros lying about what they can deliver in order to secure venture capital to then try to build the thing they lied about
I feel that anyone serious about getting machines to code would start with something that understands the structure of code! Even if it were possible to train an LLM to learn that from sheer exposure, why? At least generate a parse tree 😩
I place a good bit of the blame on the breathless marketing wank. I cannot properly encapsulate the revulsion I felt when that shill from Microsoft Research had the audacity to compare an LLM to AGI...
I had to try to get it to write a few classes for playing cards as an assignment, and I wanted to pull my hair out. It was easier to just write the code I wanted than to work out how to prompt it just right
I’ve actually never used it for help with coding, but I’ve heard others say it is very useful for providing scaffolding for certain problems. Personally, I only use it to reduce word length, which it’s really good at. I need to try it with code sometime
Scaffolding might not have been the best word choice. More like it gets close but you need to finish it. Those whom I know use it almost like a first draft to build upon, or if they are stuck on some very specific task. Bottom line, you need decent skills to use it as a tool, not as an end-all-be-all
I don’t have to do it in my head, I’m not a computer. Any computer built since 1980 could do that. My iPhone doesn’t have the screen space to display the answer so it rolls over into scientific notation.
ancient computers could do that too. we have already built a calculator. this is something trying to solve these problems through reasoning, which is what makes it so neat
This is the best evidence you can show that “artificial intelligence” is a misnomer adopted for marketing purposes. It is just machine learning. ML is very useful when used properly and has contributed to major scientific advancements. But expecting it to operate as an AI is where we lost the plot
But yes, you're correct.
🧮
Teachers, 2015: OK maybe you will.
Teachers, 2025: Ah but what if the calculator in your pocket also failed math class.
That said, they're measuring a fish's ability to climb a tree. Those aren't the experts. Those are the loudest men in the room.
The techbros have upped this to create the same thought pattern as Beetlejuice replying 31 to being asked 2+2. It’s beautiful.
This is the future of computing.
😂😂😂
In 1995
The absurd scale of the inefficiencies at play is unbelievable.
It can draw associations, but that's it. It has no semantic understanding of those associations, and that context is key to knowledge.
It gets some things right due to sheer frequency in the existing data. But it is COMPLETELY incapable of extrapolation.
How would neural nets be able to know it if we can't even explain it to ourselves?
And it doesn’t understand freezing point.
*takes a peek anyway*
"Well yeah they didn't give this computer a calculator so how could it know math"
It's like trying to figure out the new math and wondering why they don't just tell the kids to stack the numbers and go down the line
In the end the Party would announce that two and two made five (obviously!). Would you like assistance with anything else?
It is supposed to reason, not just process words probabilistically.
The truth is that it is always guessing, it’s just that sometimes it gets the answer right.
Everybody and their mothers would just make the code pull out the calculator. That must be like 10 lines of python at worst.
All it requires is that you tell it beforehand that apples and pears are fruit, and then it can infer that you have 9 fruits.
But noooooo we need the fucking excel sheet instead./s
You could in that case.
You could make your hammer open cans; you just need to add a can opener that automatically opens cans.
But they didn't.
I'm the smartest species on earth and I refuse to do anything that I can't do all by myself.
It's still very much using the wrong tool for the job and should be ridiculed though.
I would also assume the same if I hadn't thought twice about why it said "digits" and not "number" or "factor"
The accuracy numbers are wrong or massaged; they all increment in steps of 2.5: 100, 97.5, 95, 92.5, etc.
Did they use OpenAI itself to make the chart?
https://youtu.be/2Twa-z_WPE4?si=ePxnOzShZ9FmTtST
So just add another 15.
165.
Math becomes a lot easier when u get a hammer and hit it really hard into pieces.
These sorts of things have their uses, but they’re being thrown at everything, to the detriment of everything
They’re trying to solve things that are already 100% solved
Excel Sheets 🗿 > AI ✂️ ?
They're being used for things they shouldn't be, that's all.
Like 42,964,730,979,032,157,953 x 84,242,648,095,732,468,542
Regardless, having an LLM that isn’t good at math but can do other specialized tasks very well doesn’t sound like a problem to me
Oh wait....
So glad this gets preferential treatment to the planet burning
And by slightly harder I do of course mean "the sort of problem you'd expect to see in a test for 6th graders"
It's so obvious to real engineers that this can't code
It's all smoke and mirrors; it's mostly useless
198475849329453 x 1038473821