I’m always entertained by the flow of “oh, my bad! I was so wrong, and I see that now. How silly of me! Anyway here’s another blatantly wrong answer I’m going to deliver just as confidently”
I mean, spelling problems in general for llms are kind of like asking a blind person about colours (they don't "see" letters), but at least give it a fair shot.
It's more than that: there are still people discovering that just because LLMs are trained to produce self-confident-sounding text doesn't mean they actually do reasoning.
It uncovers our own biases too, like assuming that someone who went to a posh school and speaks "well" is also able to solve problems. There's a LOT of that in Britain.
I indeed think it would be good for people to learn more about what they can and can't do. Which is what I'm trying to do in this thread, by pointing out that giving an LLM a linguistics problem is like giving a blind person a visual problem. 🙂 Despite the name, they don't see text. They "see" tokens.
I live in Canada, and have not come across it. Anyway, the correct answer to this puzzle is "duck", which when reversed becomes "kcud", which is regurgitated plant matter chewed by cows, which when reversed becomes swoc, and when reversed again becomes kernudflap.
Please just stop, guys, you're wasting water, energy and emissions on this. We're in year 1 after 1.5 Celsius and this is pointless. Of course it's not smart, it's a statistics-driven ventriloquism trick. Stop burning planetary resources on this.
All is quiet, for humanity abandoned Earth to travel the stars long ago. But somewhere, in a long-forgotten data centre, an AI is broadcasting. It repeats one line, over and over again;
"A correct answer is "Lion" is not correct, but "Lion" is close to the correct answer."
You had me wondering about possible hard-coding of results in Claude. With more testing, I thought this pair of answers was pretty interesting: "Think of pairs of words that are reversed to make other words. It's the location in a cupboard or cabinet where the dog trainers keep their Snausages."
That is funny stuff! I tried it with Google Gemini (both 1.5 and 2.0 versions) and it went through much of the same litany of errors. Never did get it. Finally I told it "deer" and it was very happy. My God, we are all doomed.
This is interesting, I think. Claude used the same “reasoning”, but different words to come up with the same answer.
We know that LLMs randomize the next-word prediction, so I’m not surprised that the words were different. What does surprise me is that the method of finding the answer is the same.
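If it helps to see why the wording varies, here's a toy sketch in plain Python (numpy only; the four-word "vocabulary" and the scores are made up for illustration, not anything from a real model): the next word is sampled from a probability distribution rather than chosen deterministically.

```python
# Toy illustration only: made-up vocabulary and scores, not any vendor's code.
import numpy as np

rng = np.random.default_rng()
vocab = ["deer", "reed", "lion", "strawberry"]
logits = np.array([2.0, 1.5, 0.3, -1.0])  # hypothetical model scores for the next word

def next_word(temperature=1.0):
    # Convert scores to probabilities (softmax), then sample.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

print([next_word() for _ in range(5)])      # wording varies from run to run
print([next_word(0.01) for _ in range(5)])  # near-greedy: almost always "deer"
```

Same distribution, different draws each run, which is consistent with identical "reasoning" steps wrapped in different words.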
Gemini 2.0 Advanced solves it. But it could just have been in the training set? I also tried asking it for a five letter word. The Rumel tree that it mentions doesn't seem to exist?
The correct answer, as you showed, requires only one short sentence. The AI responses to your question are much longer this year than last, while missing the mark entirely. Yikes, it's evolved to become more "Trumpian."
Google Gemini is just as bad. I love how it calls the question “a classic word puzzle!” as if it’s got a simple answer ready, before failing to answer it correctly and then wrongly asserting that there actually is no answer.
What I'm learning about chatbots (since I don't use them) is that they really seem to prioritize answering questions in any way possible, rather than finding correct answers. That would certainly seem to make it more toy than tool.
the bot randomly saying "strawberry" reminded me of the scene from Iron Man 3 where JARVIS' speech system is damaged so he keeps saying the wrong word at the end of his sentences. https://youtu.be/0qtLpQm0Qgk
You know, I'm starting to see why this thing is championed by idiots. The main difference between Trump's blathering and this is that the chatbot admits it was wrong before being wrong all over again.
Any task that involves knowing the *specific letters* in a word is especially challenging for them because, as an efficiency shortcut, their input does not contain letters at all, but tokens.
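A minimal way to see that for yourself, assuming you have OpenAI's tiktoken tokenizer library installed (the specific words here are just examples):

```python
# Minimal sketch (assumes `pip install tiktoken`): the model receives token IDs,
# not individual letters, so spelling-level structure isn't directly visible to it.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["deer", "reed", "strawberry"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> token ids {ids} -> pieces {pieces}")
```

Whether a word comes out as one token or several, the letters themselves never reach the model, which is why letter-counting and reversal puzzles are such a bad fit.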
This felt like a good use case for reasoning models (better ability to detect their own mistakes, especially around things like letter manipulation, which is naturally challenging for LLMs), and indeed:
Gemini 2.0 Flash Experimental got caught up on homophones and suggested “Ewe” reversing to, well, “ewe.” Which sounds like “yew.” Then it said “mole” reverses to “Elom,” which kind of sounds like “elm.” Then it tried a couple nonsensical ones and gave up saying there is no answer.
I have the subscription ChatGPT and use it for various things, and usually what’s surprising is how strange the output is. My theory is that the hallucinations may be the actual interesting and important feature.
The version of Gemini accessible via my on-phone integration was able to get it. I tried a second time and it got the question wrong, then got it right with a reminder that both words shouldn't be plants 😅
Regardless, this is all just predicated on the idea that these systems are thinking machines, which they are not. They are very complex auto-complete with a bunch of pre/post-processors and middleware to make them seem like they're thinking.
Comments
It's a fairly common clue (psst, it's DEER)
But AI though! 🤩
🙄🙄🙄
Also, what is the answer to this riddle?
Oh, and it's reed, which is a type of grass.
https://i.imgur.com/utOki8G.png
*deer
What would explain why 4o still gets it wrong but o1 gets it right?
https://bsky.app/profile/diegodogdad.bsky.social/post/3lflgxhdo6k2x
Nice.