An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates. Experts say the problem is bigger than that
Comments
Those hallucinations have led to an increase in ‘pseudo-apologies’ in my recent interactions with ChatGPT as I highlight fundamental mistakes.
Maybe so much human feedback highlighting mistakes is too ‘emotionally painful’ for the AI algorithms to cope with, so they hallucinate in denial. 😅😅
Using the word hallucination is falling for the marketing that is being used to sell these tools. It suggests that they thought up the wrong thing, but they didn't think at all: they made errors or mistakes, so let's describe them that way.
You should read the headline 😜 I rewrote it for you all: 'AI companies call them hallucinations, but they are mistakes and they're getting much worse'
I did read the headline, and the article, where your complaint is dealt with. It's funny that we are complaining about computers not being able to read and write properly, yet here we are skimming information from 6-9 word headlines.
Sure, you're right; this is Bluesky after all, and I'm not here to argue. My complaint is that most people don't read articles, they just read headlines, so headlines are important, and this headline doesn't capture, as the article does, the way the word hallucination is used to con people.
Thanks, I like it better than "bad guess" at 71%, since it doesn't imply that an LLM makes a conscious effort at guessing, and it more accurately describes how the tech works (the data centre was warmer than usual that day, affecting our pseudo-random seed value).
Can I just call it a bad dice roll?
Thanks for the paper too.
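For readers wondering what the 'dice roll' above refers to: an LLM produces each output token by sampling from a probability distribution over its vocabulary, so a pseudo-random draw can land on a plausible-looking but wrong token. A minimal Python sketch of that sampling step follows; the scores, token names and seeds are invented purely for illustration and are not taken from any real model.

    import numpy as np

    # Made-up next-token scores (logits) for a toy four-word vocabulary.
    logits = np.array([2.0, 1.0, 0.5, -1.0])
    tokens = ["Paris", "Lyon", "France", "banana"]

    def sample_next_token(logits, temperature=1.0, seed=None):
        # Softmax turns scores into probabilities; a higher temperature flattens them.
        rng = np.random.default_rng(seed)
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # The "dice roll": pick one token index according to those probabilities.
        return int(rng.choice(len(logits), p=probs))

    # Different seeds can yield different tokens from the very same input.
    for seed in (1, 2, 3):
        print(tokens[sample_next_token(logits, seed=seed)])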