A Tow Center for Digital Journalism study found that "AI" chatbots provided incorrect answers to more than 60 percent of queries, with Musk's Grok 3 answering 94 percent of queries incorrectly.
"Premium chatbots provided more confidently incorrect answers than their free counterparts."
Comments
- 51% had clear errors
- 19% introduced factually inaccurate "statements, numbers and dates"
- 13% altered or outright fabricated quotes
They're doing it to save money, cut corners, attack labor, and (as with the LA Times) entrench ownership bias:
Yet OpenAI and others want you to believe this technology is just a few breakthroughs and another billion dollars away from sentience.
"(1) the correct article, (2) the correct publisher, and (3) the correct URL".
Criteria 1 and 3 are important; 2 is redundant, yet responses were still graded down on it.
So realistically, the only colors worth alarm are Red and below, because of the biased grading.
They can't be trusted, and so you end up spending more time checking what they've done than it would have taken to just do it yourself.
It's like hiring an idiot and having to constantly check all their work.
what could go wrong
If AI isn’t relied upon for factual recall and is instead used for creative output, or output that will be vetted, then I think we’re looking at better use cases. For example, it can help with coding, or with brainstorming.
Nah.
Because he’s an idiot.
Doesn't point out which model was used, as Perplexity offers at least six different models; though a few are mentioned.
Useful!
Both highlighted substantive flaws in the study's design.
Grok 3's response was the worst of those reviewed, suggesting it failed to comprehend the article's content and research approach.
this explains much
I was surprised at how many people I know who are not in tech who use ChatGPT for therapy