I’m always entertained by the flow of “oh, my bad! I was so wrong, and I see that now. How silly of me! Anyway here’s another blatantly wrong answer I’m going to deliver just as confidently”
I mean, spelling problems in general for llms are kind of like asking a blind person about colours (they don't "see" letters), but at least give it a fair shot.
It's more than that: there are still people discovering that just because LLMs are trained to produce self-confident-sounding text doesn't mean they actually do reasoning.
It uncovers our own biases too, like assuming that someone who went to a posh school and speaks "well" is also able to solve problems. There's a LOT of that in Britain.
I indeed think it would be good for people to learn more about what they can and can't do. Which is what I'm trying to do in this thread, by pointing out that giving an LLM a linguistics problem is like giving a blind person a visual problem. 🙂 Despite the name, they don't see text. They "see" tokens.
I live in Canada, and have not come across it. Anyway, the correct answer to this puzzle is "duck", which when reversed becomes "kcud", which is regurgitated plant matter chewed by cows, which when reversed becomes swoc, and when reversed again becomes kernudflap.
Please just stop, guys, you're wasting water, energy and emissions on this. We're in year 1 after 1.5 Celsius and this is pointless. Of course it's not smart, it's a statistics-driven ventriloquism trick. Stop burning planetary resources on this.
All is quiet, for humanity abandoned Earth to travel the stars long ago. But somewhere, in a long-forgotten data centre, an AI is broadcasting. It repeats one line, over and over again;
"A correct answer is "Lion" is not correct, but "Lion" is close to the correct answer."
You had me wondering about possible hard-coding of results in Claude. With more testing, I thought this pair of answers was pretty interesting: "Think of pairs of words that are reversed to make other words. It's the location in a cupboard or cabinet where the dog trainers keep their Snausages."
That is funny stuff! I tried it with Google Gemini (both 1.5 and 2.0 versions) and it went through much of the same litany of errors. Never did get it. Finally I told it "deer" and it was very happy. My God, we are all doomed.
This is interesting, I think. Claude used the same “reasoning”, but different words to come up with the same answer.
We know that LLMs randomize the next-word prediction, so I’m not surprised that the words were different. What does surprise me is that the method of finding the answer is the same.
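If it helps to see why the wording varies, here's a toy sketch in plain Python (numpy only; the four-word "vocabulary" and the scores are made up for illustration, not anything from a real model): the next word is sampled from a probability distribution rather than chosen deterministically.

```python
# Toy illustration only: made-up vocabulary and scores, not any vendor's code.
import numpy as np

rng = np.random.default_rng()
vocab = ["deer", "reed", "lion", "strawberry"]
logits = np.array([2.0, 1.5, 0.3, -1.0])  # hypothetical model scores for the next word

def next_word(temperature=1.0):
    # Convert scores to probabilities (softmax), then sample.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

print([next_word() for _ in range(5)])      # wording varies from run to run
print([next_word(0.01) for _ in range(5)])  # near-greedy: almost always "deer"
```

Same distribution, different draws each run, which is consistent with identical "reasoning" steps wrapped in different words.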
Gemini 2.0 Advanced solves it. But it could just have been in the training set? I also tried asking it for a five letter word. The Rumel tree that it mentions doesn't seem to exist?
The correct answer, as you showed, requires only one short sentence. The AI responses to your question are much longer this year than last, while missing the mark entirely. Yikes, it's evolved to become more "Trumpian."
Google Gemini is just as bad. I love how it calls the question “a classic word puzzle!” as if it’s got a simple answer ready, before failing to answer it correctly and then wrongly asserting that there actually is no answer.
What I'm learning about chatbots (since I don't use them) is that they really seem to prioritize answering questions in any way possible, rather than finding correct answers. That would certainly seem to make it more toy than tool.
the bot randomly saying "strawberry" reminded me of the scene from Iron Man 3 where JARVIS' speech system is damaged so he keeps saying the wrong word at the end of his sentences. https://youtu.be/0qtLpQm0Qgk
You know, I'm starting to see why this thing is championed by idiots. The main difference between Trump's blathering and this is that the chatbot admits it was wrong before being wrong all over again.
Any task that involves knowing the *specific letters* in a word is especially challenging for them because, as an efficiency shortcut, their input does not contain letters at all, but tokens.
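A minimal way to see that for yourself, assuming you have OpenAI's tiktoken tokenizer library installed (the specific words here are just examples):

```python
# Minimal sketch (assumes `pip install tiktoken`): the model receives token IDs,
# not individual letters, so spelling-level structure isn't directly visible to it.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["deer", "reed", "strawberry"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> token ids {ids} -> pieces {pieces}")
```

Whether a word comes out as one token or several, the letters themselves never reach the model, which is why letter-counting and reversal puzzles are such a bad fit.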
This felt like a good use case for reasoning models (better ability to detect their own mistakes, especially around things like letter manipulation, which is naturally challenging for LLMs), and indeed:
Gemini 2.0 Flash Experimental got caught up on homophones and suggested “Ewe” reversing to, well, “ewe.” Which sounds like “yew.” Then it said “mole” reverses to “Elom,” which kind of sounds like “elm.” Then it tried a couple nonsensical ones and gave up saying there is no answer.
I have the subscription ChatGPT and use it for various things, and usually what’s surprising is how strange the output is. My theory is that the hallucinations may be the actual interesting and important feature.
The version of Gemini accessible via my on-phone integration was able to get it. I tried a second time and it got the question wrong, then got it right with a reminder that both words shouldn't be plants 😅
Regardless, this is all just predicated on the idea that these systems are thinking machines, which they are not. They are very complex auto-complete with a bunch of pre/post-processors and middleware to make them seem like they're thinking.
Comments
It's a fairly common clue (psst, it's DEER)
But AI though! 🤩
🙄🙄🙄
Also, what is the answer to this riddle?
Oh, and it's reed, which is a type of grass.
https://i.imgur.com/utOki8G.png
*deer
What would explain why 4o still gets it wrong but o1 gets it right?
https://bsky.app/profile/diegodogdad.bsky.social/post/3lflgxhdo6k2x
Nice.