ketanjoshi.co • 4 days ago

A few people have responded to this paper by saying "well people freaked out about calculators too".

It's a funny comparison because people *are* using chatbots as calculators - and they suck. This tested the accuracy of an LLM to perform simple multiplication

https://xcancel.com/yuntiandeng/status/1889704768135905332/photo/1

Comments

chuckaholic.bsky.social•4 days ago

Imagine asking an LLM to do math when Wolfram Alpha has existed for YEARS. I was trying to get my local Llama to use Wolfram but my Python wasn't strong enough. I should have gotten Claude to code it for me. OOoo! Can I run a local instance of Wolfram Alpha?

pieterpeach.com•4 days ago

No, but there is an MCP server to connect it to WA https://github.com/SecretiveShell/MCP-wolfram-alpha

chuckaholic.bsky.social•4 days ago

This is nice because it can run as a service, separate from the LLM.

runefar.bsky.social•4 days ago

ChatGPT actually combines with Wolfram Alpha now, but this was done for a specific reason, which is basically testing the implicit ability of the language model. This is actually relevant to how we understand connectionist models.

https://arxiv.org/html/2405.14838v1

jasonnlewis.bsky.social•4 days ago

“What if your calculator was wrong, and you were too dumb to know it?” 🤔

runefar.bsky.social•4 days ago

https://arxiv.org/html/2405.14838v1

is one of the papers the researcher was working on in relation to this concept.

runefar.bsky.social•4 days ago

Compare, for example, his description above to https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/

Yet researchers also want to analyze different methods of solving this problem so that it could interact with different systems, and also so we can understand it, and perhaps ourselves, better too.

runefar.bsky.social•4 days ago

This is important because it relates to how we learn math ourselves, and to exploring that through a connectionist model (and potentially questions of symbolism versus connectionism), as well as how we can then retrain even a language model to directly understand math through filters and weights.

runefar.bsky.social•4 days ago

The thing is, Ketan is just misinformed here. When using ChatGPT for calculation, you are more often plugging it into an alternative calculation database such as Wolfram Alpha. The researchers here were specifically trying to explore how well a connectionist language model did on its own.

alxrdk.bsky.social•4 days ago

The mere fact that they use "accuracy in %" for a correct/incorrect task is quite something. I would argue an accuracy of 90% is a lot worse than one of 0%?

clintoncoker.bsky.social•4 days ago

I wouldn't. Something that is 100% accurate is right all the time.

I can't even tell what the chart is actually measuring but I presume it's more of a histogram than a measure of proportional error.
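For anyone else unsure what a chart like this measures, here is a minimal sketch. It assumes each cell is the share of exactly correct products for operands of a given digit length. That reading is an assumption, since the thread does not confirm the chart's method. `answer_fn`, `exact`, and `sloppy` are hypothetical stand-ins, not anything from the linked post.

```python
import random


def random_operand(digits, rng):
    """Return a random integer with exactly `digits` decimal digits."""
    return rng.randint(10 ** (digits - 1), 10 ** digits - 1)


def accuracy_grid(answer_fn, max_digits=4, trials=20, seed=0):
    """Share of exactly correct products per (digits_a, digits_b) cell."""
    rng = random.Random(seed)
    grid = {}
    for da in range(1, max_digits + 1):
        for db in range(1, max_digits + 1):
            correct = 0
            for _ in range(trials):
                a = random_operand(da, rng)
                b = random_operand(db, rng)
                # An answer is either right or wrong; there is no partial credit.
                if answer_fn(a, b) == a * b:
                    correct += 1
            grid[(da, db)] = correct / trials
    return grid


def exact(a, b):
    """A real calculator: always right."""
    return a * b


def sloppy(a, b):
    """Hypothetical solver that is off by one once the operands get long."""
    too_long = len(str(a)) + len(str(b)) > 4
    return a * b + (1 if too_long else 0)


if __name__ == "__main__":
    for cell, share in sorted(accuracy_grid(sloppy).items()):
        print(cell, f"{share:.0%}")
```

Under this reading, 90% in a cell means one answer in ten for that operand size is wrong, with nothing to flag which one.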

runefar.bsky.social•4 days ago

Though it is more like a histogram of long division in a sense. Basically, imagine you are doing math similar to how Newton would do it. At what point do you start to have errors?

runefar.bsky.social•4 days ago

That is basically what it is yeah.

runefar.bsky.social•4 days ago

It is because they can see the reasoning and where it goes wrong. Imagine it more like you were doing long division and marking where the error points are.

runefar.bsky.social•4 days ago

https://arxiv.org/html/2405.14838v1

ketanjoshi.co•4 days ago

right????

theodora.bsky.social•4 days ago

Gemini's list of 5k paces was confidently wrong.

runefar.bsky.social•4 days ago

They also point out that newer models don't appear to be making these kinds of mistakes as commonly, which is why they expressly tested it on the oldest model.

runefar.bsky.social•4 days ago

It isn't really simple multiplication for it, though. In essence it is doing a process much more equivalent to long division.
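To make the comparison concrete, here is a rough sketch of schoolbook long multiplication, the multiplication counterpart of the long division mentioned above. It counts the single-digit steps a step-by-step solver has to get right. The function name and the step counting are illustrative assumptions, not a description of what the model or the researchers actually do.

```python
def long_multiply(a, b):
    """Schoolbook multiplication of two non-negative integers.

    Returns (product, digit_steps), where digit_steps counts the
    single-digit multiply-and-carry operations performed.
    """
    a_digits = [int(d) for d in reversed(str(a))]  # least significant first
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))
    steps = 0
    for i, da in enumerate(a_digits):
        carry = 0
        for j, db in enumerate(b_digits):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
            steps += 1
        # The leftover carry lands one position past this row's last digit.
        result[i + len(b_digits)] += carry
    product = int("".join(str(d) for d in reversed(result)))
    return product, steps


if __name__ == "__main__":
    print(long_multiply(123, 456))
```

The step count is the product of the two digit lengths, so it grows quickly with operand size. Every one of those steps is a place where a slip can spoil the final answer.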

runefar.bsky.social•4 days ago

If anyone is curious, this is one of the papers the researcher was working on in relation to this concept:

https://arxiv.org/html/2405.14838v1

runefar.bsky.social•4 days ago

The issue with your coverage in your last paper, though, is that it simply misunderstands some simple aspects of learning too. Overreliance on note-taking has a similar effect. Offloading describes not an inherently negative process but part of how we learn, yet it can be overrelied on.

runefar.bsky.social•4 days ago

Also, it isn't surprising that an LLM doesn't do math well. In fact, it is actually interesting for many AI researchers because of how this compares with how humans begin to learn math, as opposed to how computers represent math. For researchers, this is about representing math within a connectionist model, not something bad.

runefar.bsky.social•4 days ago

This is true with AI, and it is why we should improve our teaching of how to interact with AI when it is surface-level AI. At the same time, the truth is that a lot of AI won't be surface-level AI and will be more similar to speech-to-text.

https://ercim-news.ercim.eu/en136/special/chatbots-socrates-dialogues-in-learning

https://link.springer.com/chapter/10.1007/978-3-031-75599-6_1#Abs1

runefar.bsky.social•4 days ago

Also, most people who use chatbots as calculators aren't using them the way the researchers are. They are plugging ChatGPT into the Wolfram Alpha database, which is a different kind of AI.

2of2card.bsky.social•4 days ago

If the AI is wrong then we should establish mathematical truth by legislative fiat.