It's the corporate "have a nice day" programmed politeness. Only with AI, they don't have to threaten it with losing its job if it doesn't remember to say it.
Is it still terrible at generating sentences in which the last letter (or 2nd, 3rd, or any letter other than the first) of each word spells a given target word?
Prompt: "Write a sentence in which the last letter of each word spells STRAWBERRY"
Even this isn't entirely right: humans might lack a given sense, but they still have brains. They can recognize why they can't answer the question and will say as much.
Hallucination in LLMs can be detected and remedied via numerous runs with different starting conditions and taking a cosine distance between the resulting states, but this is computationally inefficient and doesn't feed back into the model.
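(Roughly, that idea looks like the sketch below. `sample` and `embed` are placeholders for whatever model and embedding calls you have, not real APIs: run the prompt several times and treat low mutual cosine similarity - distance is just one minus this - as a warning sign.)

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency_score(answers: list[str], embed) -> float:
    # Average pairwise cosine similarity between embeddings of repeated answers.
    # Low scores suggest the model is guessing differently each run.
    vecs = [embed(a) for a in answers]
    sims = [cosine(vecs[i], vecs[j])
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return float(np.mean(sims))

# Usage sketch (placeholders, not real calls):
# answers = [sample("When was the 26th Amendment ratified?") for _ in range(5)]
# if consistency_score(answers, embed) < 0.8:  # threshold is arbitrary
#     print("Answers disagree - treat with suspicion")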
I don't want the bullshit machine to produce better bullshit. I don't want it at all, and the sooner the world gets that it has no actual brain the better.
GPTs cannot do sequential, repetitive, or iterative tasks at all. They can’t parrot the input, count, or perform recurring operations at all.
About half of meaningful algorithms are completely inaccessible to this system approach. The idea that they are general purpose is false.
This is... half right. They can and do iterate, but only via learning the same information on multiple layers and/or spread out within a given layer, akin to loop unrolling - which obviously has its limits.
Conventional backpropagation learning algorithms prevent topologies with true iteration. Something like PCNs can do that but they're not in widespread use, as they're less efficient. Iterative topologies continue to be an area of active research.
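(A toy illustration of the loop-unrolling analogy, nothing to do with real transformer internals: a fixed stack of layers can only apply a step a fixed number of times, while a true loop runs until it's done.)

def step(x: int) -> int:
    # one "iteration" of some repeated operation
    return x // 2

def unrolled(x: int, depth: int = 4) -> int:
    # a fixed stack of layers: the step is baked in `depth` times, like an unrolled loop
    for _ in range(depth):
        x = step(x)
    return x

def true_loop(x: int) -> int:
    # genuine iteration: keeps going until the job is done
    while x > 0:
        x = step(x)
    return x

print(unrolled(10))      # 0     - four layers happen to be enough for a small input
print(unrolled(10**6))   # 62500 - runs out of "layers" long before finishing
print(true_loop(10**6))  # 0     - a real loop just keeps halving until it's done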
But this has nothing to do with "count the letter r"
I completely agree that systems can be built with LLM components that can do more interesting things, but interestingly the LLM can’t actually decide whether the solution is complete, whether proposed tests are sufficient, or which one needs fixing if its guess doesn’t accomplish the task.
Hmm... I wonder if this is the key observation that explains why 96% of CEOs are hellbent on optimizing the use of AI in their companies, while 77% of their employees are complaining that being forced to use AI actually makes them less productive. :P
I often say that CEOs and BOD members are a very small population with very rapid cross contamination of bad ideas. It’s a blind spot in the structure of markets today that leads whole industries off the rails.
Most certainly true.
Anthropic's Claude 3.5 misdiagnosed a bug then lied about what the code it wrote actually did. The Anthropic "engineer" failed to spot the error, blindly trusted their AI, or deliberately released a misleading promo video. https://youtu.be/x0y1JWKSUp0
But if you have to hard-code in each behavior, you don't have a general-purpose intelligence, you've got a collection of 10-line utility programs you're selecting from a menu.
You don't - just add "use python" and it can solve the strawberry problem (and much more). LLMs just need to learn when to write/run code for tasks their architecture isn't suited to - I expect the next generation of models will be trained to do so.
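(And to be fair, the code it needs to emit really is a one-liner - counting letters is trivial for code and awkward for a token predictor:)

print("strawberry".count("r"))  # 3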
Depends. If it knows the algorithm by name that is required to solve a problem, and can call out to a tool to apply the algorithm to an input sequence, then it’s a tool-using machine, which is a hallmark element of intelligence.
If anyone wants the paper that demonstrates this formally, lmk and I’ll dig it up. The one that really blew me away was the “repeat after me” test that the machine just failed entirely.
“Say the word yes 37 times.” - fail
“Say the following: the brown quick dog jumped under the lazy fox.” Fail
If you use it for generating sample code, it’s about as effective as the first few hits on Stack Overflow. Not terrible, but not a big timesaver either.
I accidentally made a typo the other day in my search for when the 26th Amendment was ratified and typed 36th. Google's AI search function confidently told me it was in 1805.
I love how when chat gpt fucks up, it just keeps guessing, like the kid in class who didn’t do the reading and thinks they can get away with it using confidence alone.
If it's "correct in everyone else's " why have I seen multiple versions of this exact issue, including from people I've met IRL and had no reason to lie?
Look, instead of saying that this powerful, world-changing technology (that we invested so much money into) is crap, why don't we just agree that strawberry has two 'r's in it? If we just play along for a couple decades, strawbery really will only have two 'rs' in it
And if it *learned* from mistakes, it might eventually be useful, but from all I've seen, it doesn't. Each session is atomic and it can't integrate new information.
whenever i criticise somebody's "ai" project they always come back with "oh this isn't that chatgpt crap, I'm working with REAL ai" and i just roll my eyes
Haha! 1) looks like I have to teach myself that first. 2) Not likely…even this lesson didn’t correct the r count for strawberry when using a new chat. The existing chat got it, but didn’t store it.
Yeah lol we just have to repost this exchange all over the internet and in published books they’ll steal later: “the word strawberry has three Rs in it, as in there are 3 r’s in the word strawberry”.
The frequency and distance of three r strawberry will weight the model. Might make something else wrong tho :p so we’ll still be making fun of its bullshitting forever.
Ask it to check that directly after it gets there, however, and it changes its mind again. It's like a horse that can "count" by tapping a hoof. The only reason it stops is because everyone is cheering. Not because it thinks it's right.
Hans had feelings, wants and needs, though. He cared about his results. This machine has none of the compassion and love a horse (or dog) has. No sense of achievement.
The problem is that you're thinking of this like an objective question with a single correct answer, and expecting the "intelligence" to give consistent answers.
But, the model does not know what a strawberry is, or what the letter R is, or how to count, or that this question has a correct answer.
The model takes a string of words and tries to guess what the next word might be to form a conversation. "Two", "Three", etc. are words that fit in that space. It doesn't "think" about which one is correct, it just picks one because it makes a plausible sentence that fits sentences it's seen before.
There will be an internal weighting for which to pick, making it likely to give consistent(ish) answers on the same build - if the question is asked with the exact same wording. This will change as more information is put into the model.
It never "knows" if it's right, it doesn't "know" anything.
Basically, what I'm trying to say is, within the parameters of what ChatGPT is, the framing of this answer as a "problem" to be solved or replicated doesn't make sense.
It's not aiming for "answer this question correctly", it's aiming for "can I make a plausible sentence to follow the input".
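(A toy caricature of the weighting-and-picking step described above - nothing like a real GPT, and the numbers are invented - but it shows why the output is about plausibility, not correctness:)

import random

# Made-up "weights" for the next word after "How many r's are in strawberry? There are ..."
# A real model produces these from its layers; here they're just for illustration.
candidates = {"two": 0.55, "three": 0.40, "several": 0.05}

def sample_next_word(weights: dict[str, float]) -> str:
    words = list(weights)
    return random.choices(words, weights=[weights[w] for w in words], k=1)[0]

print(sample_next_word(candidates))  # "two" more often than "three" - plausibility, not truth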
This particular blind spot is fairly well understood. The GPT family of models doesn't actually get to 'see' individual letters during training or running, only tokens - in this case tokens number 496, 675 and 157173.
It then faces a hopelessly difficult task of having to work out that token number 496 is 'str', 675 is 'aw' and 157173 is 'berry', and how those letters relate to how the word is pronounced.
(The above is for GPT-3.5 and 4, different tokenizer used for GPT-3.)
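(You can see the splits yourself with OpenAI's tiktoken library - just a sketch, and the exact IDs and boundaries depend on which encoding you load:)

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/GPT-4-era encoding
for token_id in enc.encode("strawberry"):
    print(token_id, repr(enc.decode([token_id])))
# The model only ever sees the integer IDs on the left, never the letters on the right.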
On reflection perhaps worth adding that a reasonably intelligent system should have worked out that it doesn't quite understand what people mean by 'letters' and should have been more cautious in its replies. The fact that it hasn't sussed this out does tell us something important.
Then why is it that LLMs are great at acrostics "Write a sentence where the first letter of each word spells KNOWLEDGE" but are crap at the closely related task: "Write a sentence where the last letter of each word spells STRAW"?
Well for arithmetic GPT-3 faced the same problem - it couldn't see individual digits and the tokenization was absolutely Byzantine, you'd see things like
(54)(1)
(5)(42)
(543)
(54)(4)
for consecutive integers. It got around it by tonnes of rote learning for small and/or common numbers (100 etc).
But for less common numbers, presumably where there weren't examples in its training data teaching it that 101 is 100 + 1, it would fail in ways you'd expect from tokenisation.
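(The same library exposes the older GPT-3-era encoding, so you can watch consecutive integers get chopped up inconsistently - again, the exact splits depend on the encoding:)

import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # encoding used by the original GPT-3 models
for n in range(541, 545):
    pieces = [enc.decode([t]) for t in enc.encode(str(n))]
    print(n, pieces)
# Neighbouring numbers can come out as one, two, or differently-split tokens,
# which is exactly the kind of inconsistency described above.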
Precisely. It is terrible at letter counting but excels at acrostics not because of some limitation imposed by tokenization, but because the training data has a lot of examples of acrostics (There is probably a subreddit full of examples) but very few of counting letters.
Because Large Language Models are not capable of reasoning. They are generating tokens based on a distribution -- predicting what a human is likely to say in given situations. Correct answers often only result because it memorized them https://youtu.be/y1WnHpedi2A
Hah, no, good idea though. My favourite recent one was an essay on Russian history that kept referencing a Scots-Canadian poet who shares a name with a historian
Yeah, I was testing out the AMD local image generation thing, Amuse, and it kept generating (even with harmless prompts) things it considered forbidden, and blurred the whole image so I couldn't even tell what had gone wrong. A lot of weird stuff going on behind the curtains.
Did a sorta jailbreak on its content filter so it wouldn't blur things anymore (still checks for forbidden keywords) and could immediately see what the issue was: a lot of stuff seems sorta biased towards nudity/sexualized styling.
(Here's how I learned that: https://thishasalreadyhappened.wordpress.com/2024/05/08/chatgpt-is-terrible-at-wordle/)
I apologize for the mistake.”
Prompt: "Write a sentence in which the last letter of each word spells STRAWBERRY"
One could feed back with a modified MoE (mixture of experts).
It’s the AI equivalent of someone looking at the full jar of jellybeans and saying “there’s at least 5 in there”
Unless they've specifically memorized the spelling of something, they're stuck guessing.
https://apnews.com/article/ai-writes-police-reports-axon-body-cameras-chatgpt-a24d1502b53faae4be0dac069243f418
The evasions, the bizarre guesses, the alternation between total, misplaced confidence and slightly obsequious deference...
https://www.inc.com/brian-contreras/most-workers-say-ai-makes-them-less-productive-according-to-a-survey.html
https://bsky.app/profile/nafnlaus.bsky.social/post/3l2perzlvfy2f
These language models are embarrassing.
🐎
Yes, 1000 times yes.
https://platform.openai.com/tokenizer
This tokenization explanation doesn't wash.
If it has the right dataset and the right prompt and the right underlying programming, it can be helpful. But for everything else, avoid.
https://www.aljazeera.com/features/2024/2/3/in-rural-kenya-young-people-join-ai-revolution
https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/
Create a count of all the letters in alphabetical order: 2 Rs,
What are the 3rd, 8th and 9th letters: R, R, Y
Would you carefully check your last answer from first principles? Oh, shit, it's Y.
So, taking account of your last correction. Fuck it! 3.
Check again by checking individual letters one by one. You've been wrong twice.
Sent by Copilot:
Let’s break it down letter by letter:
S
T
R
A
W
B
E
R
R
Y
As you can see, there are indeed two R’s in “strawberry.”
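(For the record, the ground truth Copilot was asked for - a count of every letter, in alphabetical order - is three lines of Python:)

from collections import Counter

counts = Counter("strawberry")
print(sorted(counts.items()))
# [('a', 1), ('b', 1), ('e', 1), ('r', 3), ('s', 1), ('t', 1), ('w', 1), ('y', 1)]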
I use it to help me with Linux scripting sometimes but only because I know it well enough already to spot errors.
(Yes, it's a myth. But it's a fun myth. And in the end, isn't that the real truth? No.)
https://www.aiweirdness.com/an-exercise-in-frustration/