I've studied AI programming a little bit, and I can explain it. What AI basically does is, it looks at correct answers that it can learn from, and then it uses that learning to create new content that RESEMBLES what it has seen before. But that need not be correct. It only has to have the same form.
ChatGPT / LLMs are things that give you a response in the shape of an answer. It’s not using logic to work anything out. It’s essentially autocomplete, but at a much larger scale.
It cannot do the thing we use most computer software for (working through processes with maths reliably & accurately).
AFAIK it's because LLMs convert everything to words, so they don't actually do the maths, but search their library for the most likely answer. And given how often people are wrong online, they're working with bad data.
AI basically stops computers doing the one thing they're good at.
That would break it. It's an autocomplete on steroids, and the layers added on top of it to avoid admitting to copyright infringement have already made it much less efficient at its supposed job.
I tried to use it for simple language learning and it got super basic grammar wrong. It also mixed up several Romance languages and had a hard time admitting to being wrong.
one of the fundamental engineering challenges of these things is that there's no clear on ramp or off ramp for specialized logic. every attempt to bolt another non-llm feature onto the side of the llm requires you to try to understand what's happening in the conversation
It's because it's just fundamentally not possible: they haven't written code to make ChatGPT, they've just shovelled data into a black box that spits out words. They can't just "fix" the code because they didn't actually write it.
Exactly. It's basically one of those chat bots from the early 00s with access to the internet and a library of OG works. Its function is to give you an answer, whether it makes sense or not.
This is also, I think, why so many people working on LLMs at places like OpenAI and Anthropic have gotten weirdly religious about it. Coders already tend to be superstitious, but working constantly around this black box they can't comprehend the workings of? That's gonna twist your mind Ito style.
Yeah, they don't understand the thing they're working on because it's just a black box, so all they can do is tech-priest rituals. "Did we actually improve the LLM, or was it just random chance that this hand we generated had the correct number of fingers?"
In fact ChatGPT is more of a text generator; they could make an agent specialized for it, but their priority right now is starter-pack pictures. Have you tried Claude, which is more accurate on such things?
It's the same as using a Ferrari to deliver parcels: it's a good car, but inefficient for that.
Search for how iterative models work and you will understand why.
Okay, but in this analogy LLMs are the Cybertruck. A big stupid status symbol that marks you out as being a big stupid rube with no taste or understanding who is tricked out of money by grifters.
LOL no just no. A system presented as intellectually superior to humans should be able to make calculations. Humans can make calculations without calculators, you know that right?
It has one! ChatGPT can run limited Python and see the output, and 123*456 is a valid Python program. So it's a calculator that has a calculator and STILL can't do math.
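For anyone curious, that really is the whole program; a quick sketch of the deterministic path (plain Python, nothing ChatGPT-specific assumed here):

```python
# The entire "program" in question, done the way computers normally do it:
# deterministically, same answer every time.
print(123 * 456)  # 56088
```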
We have calculators. And so many websites to do math. I use them quite frequently because some of that shit is just a big nope for me.
They all have the gigantic advantage of NOT slurping up grotesque amounts of electricity and water, so why would we put them inside a framework that does do that?
That’s fantastic, it’s the one that shows all the steps, right? Helped me pass college algebra because of that (my main weakness academically is higher math).
And not because I cheated! It would help get through the steps when I was stuck in spots.
I can write my ass off though, don’t need ChatGPT.
I think the point is that calculations are literally the only thing computers do. It's how they work at their very core. They should not be doing them wrong, especially the supposedly fancy new super computer nonsense.
The GenAI does word calculations. It does not read the words, it just does a statistical analysis of how often they are close to other words. So it is doing the same thing to the numbers. It doesn't recognise them as numbers in the traditional sense, and isn't trained to apply maths to them.
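If you want to see roughly what that looks like in practice, here's a small sketch using the open-source tiktoken tokenizer (my assumption that it's installed; the exact splits vary by model):

```python
# Rough illustration of why digits aren't "numbers" to an LLM: the tokenizer
# chops text (digits included) into sub-pieces before the model sees anything.
# Assumes the open-source `tiktoken` package is available; splits vary by model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["123456 * 789", "two plus two"]:
    pieces = [enc.decode([tok]) for tok in enc.encode(text)]
    print(f"{text!r} -> {pieces}")
# The digit string comes back as a few text chunks, not one quantity, so
# "doing maths" on it means predicting plausible-looking chunks of digits.
```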
My point is that we HAVE plenty of online interfaces that will "get a fucking sum right", and that use far fewer resources to do it while being much more reliable.
The lying theft machine does NOT need that feature, which will do the exact same thing but cost 10x more per calculation.
One mildly depressing thing is that one could easily build a low cost local voice assistant that could pretty reliably assemble any sum you dictated to it, chuck it at a calculator, and give you the answer, yet I suspect most people wouldn't like it - not chatty/friendly/sycophantic enough.
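As a rough sketch of how small that "assemble the sum and chuck it at a calculator" piece is (the speech-to-text part is assumed to exist separately; every name and the word list below are made up):

```python
# Toy sketch of the "dictate a sum, hand it to a real calculator" idea.
import ast
import operator

SPOKEN = {"plus": "+", "minus": "-", "times": "*", "divided": "/", "by": ""}

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str):
    """Evaluate +, -, *, / over plain numbers only; no eval(), no surprises."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not a plain sum")
    return walk(ast.parse(expr, mode="eval"))

def dictated_to_expr(spoken: str) -> str:
    # "12 times 4 plus 6" -> "12 * 4 + 6"
    words = (SPOKEN.get(w, w) for w in spoken.lower().split())
    return " ".join(w for w in words if w)

print(safe_eval(dictated_to_expr("12 times 4 plus 6")))  # 54
```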
Genuinely get the impression that the lure of the LLM is at least in part one of those things that boils down to The Psychology Of The Individual, possibly sometimes in slightly hair-raising ways.
If I remember correctly, ChatGPT has access to a Python playground where it can check the answer, so they basically did put a calculator in it. I think it's called Advanced Data Analysis mode. (ChatGPT is still shit but it can check. It still doesn't do it nearly often enough)
Honestly my guess would be that since LLMs don't 'understand' anything that's asked of them (they can only infer relationships to other words in their database), nobody has worked out how to recognise when the LLM is being asked a maths question and let the calculator take the wheel.
They do put calculators in it. The problem is that you have an unreliable mediator (the LLM itself) inserted between you and the calculator. If you can't rely on the LLM 100% for non-calculator tasks, you also can't rely on it 100% to put your question into the calculator and return the answer.
it's like if you have an employee who has access to a calculator and you ask them to do some math for you, but you know the employee is generally incompetent. sometimes they might still find a way to mess it up.
Because people are being told by their bosses and by ads for Gen AI that it CAN in fact do calculations correctly: it's a one-stop shop for all possible use cases, and if you can't figure out the prompt, that's a you problem. So they use it, and when they say it's wrong they're told to check their prompt.
Yeah I know, it's nonsense. Saw a thread yesterday (I may be conflating two) about how it's being marketed as a tool for which we must find a use - but that means it's not a tool, doesn't it? Tools are designed with a use in mind.
They're using the same strategy as car marketers - use this for *everything* even when it's not suitable and uses way more energy than you need. It's society's fault if you can't park! It's society's fault if there's traffic!
... And it's worked for them, so I'm scared it's working for this too.
Then you need to learn about the technology before you try to form good critiques. Because it is a large language model. NO ONE would tell you that makes it good at mathematics.
and yet an awful lot of people are telling me that it's somehow going to evolve into an all-knowing general intelligence that one would certainly presume would be good at mathematics
I think what I mean is we shouldn’t waste too much time arguing against those people; they really aren’t the strategically dangerous ones here, just grifters going with the flow.
There's a looong answer to this involving the difference between the pseudo eigenvectors in a massive data set and actual knowledge if you fancy being bored into genuine tears by linear algebra
TL;DR - LLMs 'understand' literally nothing and never will, ironically because of maths
LLMs have no real concept of numbers. They can deal in abstracts (longer; shorter) but ask one to write 250 words and it will faceplant. So much of it is little more than fancy autocomplete based on probability driven by eating the entire internet. A souped up version of the keyboard row on a phone.
It's not actually "thinking" in that sense. It's guessing the next best word/characters based on what it was trained on.
If it's seen the calculation you want lots of times in its training, it will most likely get it right... If it's not seen it before, then most likely wrong.
Maybe a more complicated one could recognise that it's a calculation... Read it in and hand it over to a dedicated program...
But I don't think that is happening, yet.
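Something like this, maybe; a hypothetical sketch of that hand-off, with made-up names and a deliberately crude check for "is this just arithmetic":

```python
# Spot that a message is pure arithmetic and route it to a real evaluator
# instead of the text predictor. A real product would need far better
# parsing than one regex; this only shows the shape of the idea.
import re

CALC_RE = re.compile(r"[0-9\s.+\-*/()]+")   # digits, whitespace, basic operators

def answer(query: str, llm) -> str:
    if CALC_RE.fullmatch(query.strip()):
        # Deterministic path: same input, same output, every time.
        return str(eval(query, {"__builtins__": {}}))
    return llm(query)   # everything else still goes to the model

print(answer("12 * (3 + 4)", llm=lambda q: "..."))  # 84
```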
Which in itself showcases how little LLM creators are thinking. They should be able to hand off + reintegrate components. Then again, AI is now so big that the once greatest search engine is determined to ruin itself with AI-gen results that are often comically wrong and sometimes dangerously so.
Gemini does that on the phones, it will hand off the data to the clock app or whatever. I guess for it to be useful the calculator would need to hand back the data afterwards.
It doesn't always though. I do a lot of decimal to hex conversions. I used to just Google them because it had a little plugin that did the conversion. Phrase the conversion query wrong now and you get the AI *answer*, which is always wrong and not even stable. If I'm lucky I get the old plugin.
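For the record, the conversion itself is the kind of one-liner that should never come back "unstable" (standard Python shown purely as an example):

```python
# Decimal <-> hex is exactly the kind of job that should never be "probably right".
print(hex(48879))         # 0xbeef
print(int("0xbeef", 16))  # 48879
```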
One of the things I think probably surprises people is the progress in these things is not linear. I’ve done a lot of experimenting with style and voice in LLMs and that is absurdly inconsistent. So complexity is being added by needing to target a specific version – if that’s even possible.
Ah, I meant talking to your phone, so it hands off to other system apps if it's been allowed to. Know exactly what you mean about new Google. Totally unpredictable; I often ask it time-related questions and it's gone from 100% correct to 75%.
Comments
They take an input sentence, turn each complete word into a numerical token, and do matrix multiplication on those numbers using pre-trained weights.
It’s not a calculator.
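A bare-bones sketch of that pipeline, if it helps; toy vocabulary, toy sizes, random stand-in weights, nothing from a real model:

```python
# Bare-bones version of "words -> token IDs -> matrix multiplication".
import numpy as np

vocab = {"what": 0, "is": 1, "123": 2, "*": 3, "456": 4}
ids = [vocab[w] for w in "what is 123 * 456".split()]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))   # one 8-dim vector per token
weights = rng.normal(size=(8, 8))               # stand-in for trained layers

hidden = embeddings[ids] @ weights
print(hidden.shape)  # (5, 8): numbers *about* the words, no arithmetic on 123 or 456
```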
Incredible!
https://bsky.app/profile/posistress.bsky.social/post/3lnqi4mueh22n
And this answer will almost always be 'shaped' like coherent language.
This makes them potentially much more dangerous, in a stupid way, than a computer just erroring out.
Which means the LLM failed to properly parse that it should use the knowledge engine.
MORE ACCURATE?
THE ONLY ACCEPTABLE LEVEL OF ACCURACY FOR THE MATH MACHINE WHEN DOING MATH IS 100% ACCURACY!
And Python can be used like a calculator.
Learning a little Python, opening a terminal, and putting your problem in yourself means you can actually gain some understanding and insight into what is going on.
https://www.programiz.com/python-programming/examples/calculator
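That link covers it, but the terminal version really is this short (standard Python, no extra installs assumed):

```python
# Python as a pocket calculator: exact integer arithmetic, no guessing.
print(1234 * 5678)          # 7006652
print(2 ** 64)              # 18446744073709551616
print((0.1 + 0.2) == 0.3)   # False: floats have documented quirks, not vibes
```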
Like all tools it's specialized
no one *who knows what they're talking about* would, but that's an entirely different sentence!
https://bsky.app/profile/ketanjoshi.co/post/3lnixuywpws2v