An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates. Experts say the problem is bigger than that
Comments
Those hallucinations have led to an increase in ‘pseudo-apologies’ in my recent interactions with ChatGPT as I highlight fundamental mistakes.
Maybe so much human feedback highlighting mistakes is too ‘emotionally painful’ for the AI algorithms to cope with, so they hallucinate in denial. 😅😅
Using the word hallucination is falling for the marketing that is being used to sell these tools. It suggests that they thought up the wrong thing, but they didn't think at all: they made errors or mistakes, so let's describe them that way.
You should read the headline 😜 I rewrote it for you all: 'AI companies call them hallucinations, but they are mistakes and they're getting much worse'
I did read the headline, and the article, where your complaint is dealt with. It's funny that we are complaining about computers not being able to read and write properly, yet here we are skimming information from 6-9 word headlines.
Sure, you're right; this is Bluesky after all, and I'm not here to argue. My complaint is that most people don't read articles, they just read headlines, so headlines are important, and this headline doesn't capture, as the article does, the way the word hallucination is used to con people.
Thanks, I like it better than "bad guess" at 71%, since it doesn't imply that an LLM makes a conscious effort at guessing, and it more accurately describes how the tech works (the data centre was warmer than usual that day, affecting our pseudo-random seed value).
Can I just call it a bad dice roll?
Thanks for the paper too.
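For readers wondering what the 'dice roll' above refers to: an LLM produces each output token by sampling from a probability distribution over its vocabulary, so a pseudo-random draw can land on a plausible-looking but wrong token. A minimal Python sketch of that sampling step follows; the scores, token names and seeds are invented purely for illustration and are not taken from any real model.

    import numpy as np

    # Made-up next-token scores (logits) for a toy four-word vocabulary.
    logits = np.array([2.0, 1.0, 0.5, -1.0])
    tokens = ["Paris", "Lyon", "France", "banana"]

    def sample_next_token(logits, temperature=1.0, seed=None):
        # Softmax turns scores into probabilities; a higher temperature flattens them.
        rng = np.random.default_rng(seed)
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # The "dice roll": pick one token index according to those probabilities.
        return int(rng.choice(len(logits), p=probs))

    # Different seeds can yield different tokens from the very same input.
    for seed in (1, 2, 3):
        print(tokens[sample_next_token(logits, seed=seed)])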