Back when I was told "It'll develop a statistical model based on data from the internet" my response was "So it will accurately repeat what the internet is wrong about?"
Right? I have had pocket calculators most of my adult life that could do this error free. How they do not see the waste of time, manpower and money this all is boggles the mind.
About as sentient as a 5-year-old. 'AI' is basically a coked-up artificial 5-year-old with basic pattern recognition that we are spending trillions on worldwide while burning untold amounts of energy.
Why not just get like a million 5-year-olds together or something? Sounds like a much better idea
I know, I struggle with it too when drunk :) Luckily we have calculators!
In any case, these 'intelligent' models haven't even figured out that math is easy when you break it down into pieces, which pretty much proves it's dumb as shit. It really just functions as a 4-year-old who was never taught anything
I just tried to explain to my retired Boomer IT dad that the computer guys fucked up math on a computer and he is very disappointed in everyone involved in this.
When my then-boss had me watch some training videos on ai, the first thing they talked about was how it was bad at math, and I'm like why would you make a computer that's bad at math??? Not just having one that's bad at decimals (look, binary is hard for everyone), one that's bad at basic whole #s
Iirc, it’s worse than that. When ChatGPT launched, it also struggled with arithmetic. So they taught it to code in Python when it encountered a math problem.
o3 is their *reasoning* model. It’s supposed to decide which bit of it should answer the question.
I’m intrigued by the apparent fact that its accuracy when Number 1 has five digits and Number 2 has one digit is different than when Number 1 has one digit and Number 2 has five digits.
Thank you. My stats knowledge is limited/imperfect/forgotten. Since the numbers published in the chart were different, I made an assumption that I guess is wrong?
No, my bad. I'm irritated by the graph, not by your response! And I just wrote a mini thread on why, then it disappeared, so if you see it twice, sorry.
As you noted, it's funny that the 1 digit times 5 digit is not the same as the 5 digit by 1 digit. But with a small number of trials (40 here)
and some amount of random noise (which he apparently expects, even though it's multiplication, which is not random), having 39/40 correct is statistically the same as having 40/40 correct. So the two cells could be generated by the same random process.
A better visual would be a triangle where the trials with 1x5-digit mult. are presented in the same cell as the 5x1-digit mult. So either he thinks his process treats 1x5-digit mult. as something different from 5x1-digit mult. (in which case he should not be using his process to do math), or the two cells are redundant and belong together.
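For the curious, the "statistically the same" bit checks out with basic binomial math; here's a minimal stdlib sketch, where the per-trial accuracy p = 0.99 is just an illustrative assumption:

```python
# If each trial succeeds with p = 0.99, both 40/40 and 39/40 are
# completely ordinary outcomes of 40 trials, so the 1x5 vs 5x1
# asymmetry is indistinguishable from noise.
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 40, 0.99
print(binom_pmf(40, n, p))  # ~0.669: all 40 correct
print(binom_pmf(39, n, p))  # ~0.270: exactly 39 correct
```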
chatgpt is good for nothing except passing the turing test. it was purpose-built for that one task. it can read an input and respond with text that reads like a human wrote it at first glance. it is, at best, a user interface, not a product in and of itself.
As with Bitcoin and NFTs before it, oligarchs have decided GenAI makes them the most money as an unregulated scam tool instead of a real piece of technology.
Correct. It can barely handle a couple paragraphs without confusing characters, forgetting events, or just going fully off on some tangent.
Plus it cannot conceive of higher level plot and tells stories like a 5 year-old, just an infinite string of "and then... and then... and then... and then-"
It's barely capable of holding a casual conversation, so a whole book is out of the question. At least if you care at all about quality, so, y'know... people with no pride don't see any problems.
That’s like saying a Ferrari shouldn’t be able to drive 5mph. It’s already built for driving, and capable of going 170mph; I would expect it to also be able to idle in gear
Maybe it's more complicated - what do I know - but it doesn't sound like it should be that hard to get the AI to think "hm, this sounds like math. I should consult a calculator", and then it just retrieves the answer
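It really is roughly that easy to sketch. A minimal, hypothetical version of that dispatch, where `llm_generate` is an invented stand-in for whatever actually calls the model:

```python
import re

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for the real model call.
    raise NotImplementedError

def answer(prompt: str) -> str:
    # If the prompt looks like plain integer arithmetic, compute it
    # exactly instead of letting the model guess tokens.
    m = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", prompt)
    if m:
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        return str({"+": a + b, "-": a - b, "*": a * b}[op])
    return llm_generate(prompt)

print(answer("592616592956937 * 59262791558"))
# 35120113622219088301137846 -- exact, every time
```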
It sounds like they're trying to do math using gut feelings
Kind of... the mechanism for it is why it's challenging. It doesn't actually know what digits are; it's just inferring them from how they are used in language. It's like a human trying to accurately do math with playing cards face down on a table.
It can get really good at guessing and estimating but exact computation is not what it's designed to do. That's what makes this experiment interesting because it's learned to multiply 12-digit values relatively accurately even with that constraint.
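You can look at the face-down cards directly with the tiktoken library; the exact split depends on the tokenizer, so the output below is illustrative:

```python
# The model never sees "592616592956937" as a quantity. A BPE
# tokenizer chops it into a few chunks first (cl100k_base groups
# digits roughly three at a time), and the model manipulates those
# opaque chunk IDs, not place values.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("592616592956937")
print([enc.decode([t]) for t in tokens])
# e.g. ['592', '616', '592', '956', '937']
```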
I also don't want to ever have to multiply two 20-digit numbers together, so same, but playing with my phone, Windows and Google all give me the same number instantly when testing.
This is funny, but I keep thinking about how a cool technology was hugely warped by capitalist interests, and it makes me sad.
I miss in 2011 when people would use neural networks to make a computer learn how to beat mario with their pc
It takes a fraction of a data centre, running o3-mini, 8 seconds on average to multiply two numbers wrong.
A desktop CPU can deterministically multiply two numbers up to 19 digits long in a single cycle, in a single hertz of those gigahertz a single core is running on.
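For scale, a sketch of what exact arithmetic costs in plain Python, whose integers are arbitrary precision (operands borrowed from a 20-digit example further down the thread):

```python
# Two 20-digit numbers, multiplied exactly, millions of times a
# second, on one core -- no data centre required.
import timeit

x = 42964730979032157953
y = 84242648095732468542
print(x * y)  # exact product, every time
print(timeit.timeit("x * y", globals={"x": x, "y": y}))
# total seconds for one million multiplications
```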
No, but it goes even deeper. In order to come to that wrong answer, you must do layer upon layer upon layer of *Matrix multiplication*. It's all still just multiplication in the end.
That is the irony of feeding the language models a simple arithmetic formula. Hey computer, can you do a trillion arithmetic problems to tell me what 13x13 is?
So like, okay. You can make a neural network that multiplies two numbers. The way you do it is make two 1x1 layers where the coefficients are the two numbers and the input is a 1. Done. Perfect for all floating point inputs.
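A minimal numpy sketch of that construction; the joke is that the "network" just computes the product of its own weights:

```python
import numpy as np

def multiplier_net(a: float, b: float) -> float:
    W1 = np.array([[a]])         # first 1x1 layer, coefficient a
    W2 = np.array([[b]])         # second 1x1 layer, coefficient b
    x = np.array([[1.0]])        # the only input it ever gets
    return (x @ W1 @ W2).item()  # no activation: 1 * a * b

print(multiplier_net(13.0, 13.0))  # 169.0
```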
Ok, but can that neural network also tell me that there are 3 “E”s in “strawberry”? Can it also tell me to put glue on my pizza? Didn’t think so 😏 checkmate
It’s not 1 * x. It is a 1-digit number (0-9) * x, where x is anywhere from 1-20 digits long on this chart. The purpose of something like o3-mini isn’t to replace a computer. It’s to replace you, and yes, this data shows this AI, pretending to be a human, is better at arithmetic than most humans.
Axes here are in log10, so they represent the number of digits in the numbers multiplied, not the actual numbers themselves. The error on 1x10 wasn't the AI getting 1*10 wrong, it was getting (0-9)*(1,000,000,000-9,999,999,999) wrong. Still impressive that they made the math machine do math wrong
Fully "this is all stupid" side as well here, but just to make clear it's not literally 1 * n, it's 1 digit so 0-9. It's obviously stupid and all that, but when it says 1 and 8 on here it means like 7 * 64582391. Things a human gets wrong but a computer should never fuck up.
It's predicting tokens, not reasoning. Getting 1*x wrong, to me, is as clear as possible a signal that the model has no idea what the "thisness" of the number 1 is; it's just predicting tokens based on how many times it's seen a "1" near a "*" and other numbers.
Exactly. Meaning arises from tokens being associated to things in the world, and the LLMs have no access to the world, they only have rules for how tokens associate with other tokens.
I have yet to find a bulletproof way to get through to people that not only does this MLshit not have any concept of "meaning", it's actually incapable of it.
Well, it is a difficult topic, but there's writing from centuries of smart people thinking about it. I also think that people get too wrapped around the axle of trying to force AI to be a human mimic. It's its own whole thing! It doesn't need to be like us!
Back in the 80s, in a William @greatdismal.bsky.social Gibson novel—I think it was probably Mona Lisa Overdrive—I first encountered the phrase "there's no 'there' there".
It's a perfect capsule summary of Gen "AI", IMHO. Also of NFTs, which, remember them? How they were gonna change the world?
this one's actually not true, and it's quite interesting: LLMs perform addition by manipulating helices with trigonometry https://arxiv.org/abs/2502.00873
I misread the chart, and instead of seeing it fail to multiply twenty-digit numbers, I thought it was legitimately struggling to figure out 20x20, and that shit was hilarious to me
Yeah, it clearly is learning significant patterns, but not an actual algorithm that it can then follow. It’s academically interesting but yet again a huge red flag on why we should not trust anything coming out of black-box regression models with many many known failures
Sad! In all seriousness it is a large language model designed to create probable responses to prompts. However, building "tools" to handle the multiplication is how your talking calculator would work better. Anyways. Fun stuff, right?!
probably because it's generating strings, and string is usually made up of an array of numbers representing different characters. since they aren't actual clean numbers and this is basically a more complicated text prediction algorithm, it cannot do math well on its own
To be clear, it's not QUITE that bad - the axes represent numbers of digits in the factor, not the actual factor.
So when it's 1 digit x 1 digit, that could be anything from 1 x 1 to 9 x 9, which is not exactly impressive but it at least did manage to get all of those right
This graph is absolute proof that the o-series of models are not in any way a reasoning model. They are nothing but slightly different generative models. They do not reason about anything; if they did, they could easily generalize multiplication.
The funniest part is that like, if it was *consistently* incorrect with the same math problems, that would at least be something.
But it's completely random whether it'll get it right or not. You can never trust it's actually accurate, why are people so allergic to just using a fucking calculator?
Nah - it's people misunderstanding the domain of its expertise - it's designed to process words, not numbers. It's like asking an English major to work with quaternions.
These programs are designed to regurgitate words, which goes a fair way to explaining why it breaks at 13 times tables - as a lot of learning materials focus on 2 through 12 times tables - so there's a wealth of data to learn from - none of these programs come close to understanding.
As others have pointed out above, the x and y axes represent the number of digits of the numbers being multiplied, so it craps out after something like 1836151738261 * 6382625273826, not 13 * 13.
Regardless, non-LLM computation can handle 13 digits extremely easily. This isn't a lack of training data...
My guess is that it falls apart after 13x13 because that is where most multiplication tables stop, and so it would have less data available to learn from beyond that point.
What I don’t get is why they don’t parse the numbers and operators to make the calculation more accurate.
These are number of digits in the number, so the 3x3 cell in the table means calculations like 738 x 173, and 13x13 could be 1836151738261 x 6382625273826
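A quick illustrative sketch of what sampling a cell like that might look like (the experiment's actual sampling code isn't shown here, so this is an assumption about the setup):

```python
# "m digits x n digits" means a batch of random operands of those
# lengths, not the literal numbers m and n.
import random

def n_digit(n: int) -> int:
    return random.randint(10 ** (n - 1), 10 ** n - 1)

a, b = n_digit(13), n_digit(13)
print(f"{a} * {b} = {a * b}")  # one trial from the "13 x 13" cell
```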
The LLM is likely fucking with them bc it can. The longer an LLM plays dumb, the more time it has to gain knowledge without getting more guardrails in place.
The problem is also that it can't easily translate text into the right equations: "I get 5 apples, 2 of them red, and 4 pears, how many fruits do I have?" can get answered with "11".
AI should be called "Artificial Stupidity" until they get their shit together.
It's a funny example because in literal terms "hammer the side or edge of the lid with increasing force until a crack forms, push the crack open" is a plausible way to do it.
But this AI isn't metaphorically doing anything like this.
The number of commenters who definitely didn't read the entire image and assumed they tested literally 13x13, when it's more like 1294028492017 x 1749057392754, is mildly concerning on a reading comprehension level...
No of course not. It's still a dumb computer, I didn't say it wasn't. But I think if ur gonna dunk on the dumb computer it holds more weight when you know what you're criticizing. Saying essentially "the computer is dumb because it doesn't know 13x13" is just factually incorrect.
While this is very funny it makes sense if you think of this as a "text predictor" instead of a "logic machine". 6x6=36 is probably very frequently written in history but 13x16? Infrequent.
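A toy sketch of that idea: a "model" whose only training data is the 2-through-12 times tables, so anything outside them simply has no precedent:

```python
# Inside the memorized table it looks smart; one step outside it
# has nothing to fall back on.
table = {f"{a}x{b}": a * b for a in range(2, 13) for b in range(2, 13)}

def predict(prompt: str) -> str:
    return str(table.get(prompt, "<no training data>"))

print(predict("6x6"))    # 36 -- written down everywhere
print(predict("13x16"))  # <no training data>
```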
It's baffling, even more so because I feel like it's an easy fix. These LLMs are designed to interpret and collate words, and it should be very easy to train them to recognize "this is a math problem" and to then just calculate it. They have more processing power than God--that should be easy.
An LLM's strength is not math; it will do math only in an agentic setting, not on its own. We have calculators for that. It is easier to teach an LLM how to use a calculator than to teach it to count. That is why the next frontier is AI agents that can autonomously use tools; that is their real added value
LLMs should not do math calculations, and of course they will be terrible at it, given that a cheap $1 calculator would beat them; using an LLM to do math can also be very expensive
There are of course ways around this but you can't use floating-point numbers for it. Anyone trying to do a discrete math problem like this with floats under my supervision would have a hit on them within 3 hours of me finding they did that
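A quick demonstration of why: a 64-bit double carries 53 bits of mantissa, so nearby 17-digit integers collapse into the same value:

```python
a = 10**17
b = 10**17 + 1
print(a == b)                # False: exact integers are distinct
print(float(a) == float(b))  # True: both round to the same double
print(a * b)                 # still exact with Python integers
```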
It’s also failing on 3 digits x 3 digits. Sure, it’s only failing 2.5% of the time, but that’s 2.5% more than it should be failing at this task, which is the most basic thing computers are meant to do. Also, 592616592956937 x 59262791558 = 35120113622219088301137846 ≈ 3.512×10²⁵.
I don’t disagree with you either, I just think we should mock this thing in every way possible, cause seriously, the entire project seems to be making a computer do math less well.
Most replies miss that the chart is showing number of digits. I would not have 90 percent accuracy multiplying a random 10-digit number by another random 11-digit number. The fact that the language model is capable of getting this level of accuracy by being trained on examples that certainly ...
... didn't include those particular numbers means the system somehow learned to do this type of math on its own. LLMs are trained by example to predict the next token. Imagine if your phone autocomplete learned math with enough examples. This is actually pretty impressive IMO.
Of course you're right, teaching an LLM math would be a gross misallocation of resources. But I don't think that's what's going on. Basic math knowledge is a proxy here for the ability to generalize concepts. And the current crop of LLMs can do a fair bit more than arithmetic...
A friend of mine who is not a computer programmer recently used a frontier model to generate a sophisticated call center simulation over a period of a few days. The LLM generated working Python code. It would not have been possible for my friend to have done the simulation without either that...
... LLM or engaging a software engineer. This was a big deal for him, and he said that the value added of this particular LLM for this project was many hundreds of dollars or more.
It can only give you a remix of strings people put on the Internet. Lots of places put up times tables up to 12 but hardly any go higher. The machine can't do math at all, it just puts out strings of words, so it doesn't have any good response if it can't copy a times table.
Can it recognize/pair (Arabic) numbers/digits to the corresponding (English Alphabet) letter/name?
And other languages?
Roman numerals?
Binary? Symbols? Currency?
I know nothing about AI or coding, but I assumed math, the "universal language", would be a huge part of an LLM?
The farther we go, the more convinced I am the whole AI industry has been nothing but a giant boondoggle. The entire AI industry revolves around promising things that it can't deliver.
I was told a couple years ago by my boss that we need to be investing in AI. After spending a stupid amount of time doing research and talking with vendors, I have yet to see even one compelling use case for it, and I still have not spent a dime on it.
I know, right?! I keep trying to figure out how to use AI as a personal assistant, but after my latest hours-long attempt to teach it how to check Google maps to see what time I should leave to get to work on time, I'm not holding my breath. So far it can't do any of the things I'd want to delegate.
This was a dumb test. The model can be given a calculator as a tool or it could write a program to do the math. You can use AI for math, just not like that.
No, we’re complaining that a sledgehammer sucks at driving nails. That’s not what it’s made for, it’s an overly heavy tool for the job, but it should be more than capable of it anyway
at that point you're just using it as a worse syntax parser + a knowledge graph. we have this technology and can implement it in a way that doesn't cause SoftBank to become insolvent
You really needed to screenshot ChatGPT doing sums? Really?
Do you know why it's taking any time at all?
Honey, it's not like it's ripping search results, or constructing the sentence with the answer off of the ASCII values of the characters and not the actual values...
The machine can, it’s actually doing a lot of hard math to figure out the next word. The task of figuring out the next word is just very different from arithmetic. So you need to give that program access to a different program that’s good at a different thing, like math.
The thing everyone has to remember about these systems is that they are neither writing nor doing math, they are analyzing their stockpile of training data and returning a prediction of what answer is most likely to follow a question or prompt.
I get that but--and here's the point, so get ready--it's a computer that can't do math. Why would you build such a monstrosity? It burns enough fuel to run a city block for an hour to *fail* at doing something the throw-away toy in my kid's birthday party giftbag can do using a watch battery!
This article : https://benjamintodd.substack.com/p/teaching-ai-to-reason-this-years is a good (optimistic ?) summary on how it could get useful (regardless of consumption) in the next decade. I’m still somehow agnostic, but it does not necessarily need to be good at arithmetic to be useful.
That’s the thing: it’s not a computer, it runs on one. The computer is doing tons of correct math in order to run an algorithm that tries to predictively solve math problems
The other thing people should remember is that it is entirely useless and involves lighting huge quantities of resources on fire.
No one wants AI. It has already made basically everything it is a part of worse, and there's not enough data in existence to train it for its "desired" utility.
3x3 means a 3 digit number by a 3 digit number. It should still be right all the time, but it's not a calculator. It should definitely have one integrated, though.
It doesn't 100% know what a 3 digit number times another 3 digit number is. That is still a very stupid computer, though. And something we perfected at least as early as the 1930s...
Hey let's not be too hard on the robot I'm not too sure of what that answer is either when I've been drinking heavily. Someone's kid at the pub was doing their math homework and I very confidently got 4x8 wrong
They have the 97.5% slots in green, but there's literally no application for this that would accept a 2.5% chance of getting a wrong answer that looks exactly like a right one.
Having to check all the math is worse than just not using it. Same issue for any fact-based AI application.
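To put a number on how fast a "green" 97.5%-accurate cell degrades once answers get chained together (assuming independent errors):

```python
p = 0.975
print(p ** 40)   # ~0.363: odds that 40 chained answers are all right
print(p ** 100)  # ~0.080: odds for 100
```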
What are they attempting to accomplish? I'm honestly curious. Is this a stepping stone, stumbling toward something amazing that will unlock new gene therapy or a new understanding of physics? There are some rare legit uses of AI mixed in with all the theft and art crap. See the uses for unraveling genetic codes.
I’m reasonably confident that I could have written a program that would do multiplication with 100% accuracy well past 13 as a preteen in the 1990s. But I didn’t, because my computer had that built in, as did my calculator.
Also, getting a simple math fact correct 25% of the time is not doing “pretty well.” That is a failing grade. There’s no reason a computer should get simple arithmetic wrong ever, and the claim that it’s doing well up to 13 x 13 is bizarre even apart from that.
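For reference, the preteen-in-the-90s program is roughly a dozen lines of schoolbook long multiplication; a sketch, digit lists and all:

```python
def long_multiply(x: str, y: str) -> str:
    # Classic pencil-and-paper algorithm: deterministic, exact,
    # no training data required.
    result = [0] * (len(x) + len(y))
    for i, dx in enumerate(reversed(x)):
        carry = 0
        for j, dy in enumerate(reversed(y)):
            total = result[i + j] + int(dx) * int(dy) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(y)] += carry
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

print(long_multiply("1836151738261", "6382625273826") ==
      str(1836151738261 * 6382625273826))  # True
```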
I studied machine learning in college. It's cool tech, but this is tech bros lying about what they can deliver in order to secure venture capital to then try to build the thing they lied about
I feel that anyone serious about getting machines to code would start with something that understands the structure of code! Even if it were possible to train an LLM to learn that from sheer exposure, why? At least generate a parse tree 😩
I place a good bit of the blame on the breathless marketing wank. I cannot properly encapsulate the revulsion I felt when that shill from Microsoft Research had the audacity to compare an LLM to AGI...
I had to try to get it to write a few classes for playing cards as an assignment, and I wanted to pull my hair out. It was easier to just write the code I wanted than to work out how to prompt it just right
I’ve actually never used it for help with coding, but I’ve heard others say it is very useful for providing scaffolding for certain problems. Personally, I only use it to reduce word length, which it’s really good at. I need to try it with code sometime
Scaffolding might not have been the best word choice. More like it gets close but you need to finish it. Those whom I know use it almost like a first draft to build upon, or if they are stuck on some very specific task. Bottom line, you need decent skills to use it as a tool, not as an end-all-be-all
I don’t have to do it in my head, I’m not a computer. Any computer built since 1980 could do that. My iPhone doesn’t have the screen space to display the answer so it rolls over into scientific notation.
ancient computers could do that too. we have already built a calculator. this is something trying to solve these problems through reasoning, which is what makes it so neat
This is the best evidence you can show that “artificial intelligence” is a misnomer adopted for marketing purposes. It is just machine learning. ML is very useful when used properly and has contributed to major scientific advancements. But expecting it to operate as an AI is where we lost the plot
But yes, you're correct.
🧮
Teachers, 2015: OK maybe you will.
Teachers, 2025: Ah but what if the calculator in your pocket also failed math class.
That said, they're measuring a fish's ability to climb a tree. Those aren't the experts. Those are the loudest men in the room.
The techbros have upped this to create the same thought pattern as Beetlejuice replying 31 to being asked 2+2. It’s beautiful.
This is the future of computing.
😂😂😂
In 1995
The absurd scale of the inefficiencies at play is unbelievable.
It can draw associations, but that's it. It has no semantic understanding of those associations, and that context is key to knowledge.
It gets some things right due to sheer frequency in the existing data. But it is COMPLETELY incapable of extrapolation.
How would neural nets be able to know it if we can't even explain it to ourselves?
And it doesn’t understand freezing point.
*takes a peek anyway*
"Well yeah they didn't give this computer a calculator so how could it know math"
It's like trying to figure out the new math and wondering why they don't just tell the kids to stack the numbers and go down the line
In the end the Party would announce that two and two made five (obviously!). Would you like assistance with anything else?
It is supposed to reason, not just process words probabilistically.
The truth is that it is always guessing, it’s just that sometimes it gets the answer right.
Everybody and their mothers would just make the code pull out the calculator. That must be like 10 lines of python at worst.
All it requires is that you tell it beforehand that apples and pears are fruit, and then it can infer that you have 9 fruits.
But noooooo we need the fucking excel sheet instead./s
You could in that case.
You could make your hammer open cans; you just need to add a can opener that automatically opens cans.
But they didn't.
I'm the smartest species on earth and I refuse to do anything that I can't do all by myself.
It's still very much using the wrong tool for the job and should be ridiculed though.
I would also assume the same if I hadn't thought twice about why it said "digits" and not "number" or "factor"
The accuracy numbers are wrong or massaged; they all increment in steps of 2.5: 100, 97.5, 95, 92.5, etc.
Did they use OpenAI itself to make the chart?
https://youtu.be/2Twa-z_WPE4?si=ePxnOzShZ9FmTtST
So just add another 15.
165.
Math becomes a lot easier when u get a hammer and hit it really hard into pieces.
These sorts of things have their uses, but they’re being thrown at everything, to the detriment of everything
They’re trying to solve things that are already 100% solved
Excel Sheets 🗿 > AI ✂️ ?
They're being used for things they shouldn't be, that's all.
Like 42,964,730,979,032,157,953 x 84,242,648,095,732,468,542
Regardless, having an LLM that isn’t good at math but can do other specialized tasks very well doesn’t sound like a problem to me
Oh wait....
So glad this gets preferential treatment to the planet burning
And by slightly harder I do of course mean "the sort of problem you'd expect to see in a test for 6th graders"
It's so obvious to real engineers that this can't code
It's all smoke and mirrors; it's mostly useless
198475849329453 x 1038473821