Yeah, it's a flaw in how vector searches in LLMs work. They work by proximity, not exactness. Good for general topics, but they have an accuracy problem when it comes to details.
One of the lessons I learned young is that computers aren't smarter than I am, but they're more stubborn about it--in common with some people I could name.
AI isn't really as much of an advance over a C64 in that respect as might have been hoped, if we're honest.
Suddenly thinking about someone I got into an argument with on here who claimed they would literally die if they didn't have ChatGPT to fill out forms to access help with their disability. I hope they were lying, because I'm pretty sure they'd eventually run into issues like this.
Yeah this is gonna be a lie. Using chat gpt to fill out “forms to access help with disability” would be more complex than filling them out yourself because there’s no interface? I also, as a disabled activist, am having a hard time envisioning what forms one needs to file to not literally die
Their claim seemed to revolve around having debilitating ADHD and learning difficulties which prevented them from filling out all the paperwork needed for financial assistance needed to get welfare/benefits & if they didn't get them they'd die from poverty. Which, yanno, suspiciously worded.
AI simps will say literally anything, and this smacks of "one of those immutable social goods we all dream AI could provide and so will ignore the fact that LLMs are crap that cause nothing but problems"
I even mentioned, because I had to go through similar troubles with paperwork, that they should probably have a social worker to help them do the paperwork like I did as there'd be no chance of software fuck up & the response was "well I can't get it because that's money nobody has"...
I really needed the help of social workers because I can't do everything myself & I don't think everything would be improved by replacing them with a bot that has trouble getting dates right, I think that'd be a nightmare, so really hope they're lying & didn't get fucked by magical thinking.
Yeah social workers don’t cost money, and while they’re far from reliable in a general sense, if this person were actually petitioning for SSI/SSDI (if they’re in the US, or the equivalent elsewhere) they’d know this, I’m sure? I mean, I’ve been in places where I’ve been trying to get help
And the systems are really fucking inaccessible and complex and there is lots of paperwork but the way this was stated really just isn’t clicking in my mind with the way things generally work. And at this point no form of AI is *any* help with the process whatsoever, hard stop.
I have a security training about AI, and if it isn't one sentence, "Don't use these services, they are fucking stupid," then it will have been a failure.
this only affirms my belief that what AI-as-oracle people want is simply google that speaks in complete sentences, and whether or not it is a liar is irrelevant
Google is typically good at floating dates like this - I imagine it’s because they have a popular calendar app?
Like, does google have a datum stored somewhere that says “Easter Sunday = 3/31/24” that it can regurgitate?
It's entirely possible that the Google one was LLM-summarizing the onebox answer that they would have popped up otherwise.
(Where those come from is ... complicated ... but there's a lot of effort put into preventing egregiously wrong ones from appearing.)
Huh. I don't get that, but I also keep opting out of generative search. I wonder if someone already reported it.
Infobox gets it right.
(OTOH, I have seen the infobox get confused by people with the same name.)
Basically, without access to the exact training data and weights, it's impossible to judge exactly what it's doing and why - which is of course the whole point of keeping them secret
honestly it's the same vibe as SEO, an entire industry built around trying to figure out what's going inside the all-important black box of google search
G has an idea of what ppl want to know about 'easter 2024' and generally that's the exact date because the date moves. Content with the exact date will satisfy lots of their searchers. They have a good idea of terms like Easter 2024 because they know what people were asking about Easter 2023.
there's a bit in the Hitchhiker's Guide to the Galaxy books where a typo in the eponymous Guide made it sound like the Ravenous Bugblatter Beast of Traal was a local delicacy instead of a ferocious monster, resulting in multiple fatalities
That's cause it's using Retrieval Augmented Generation. It first searches for relevant content and then feeds that into its generative AI before it gives you an answer. This helps ground and minimise hallucination.
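(For anyone curious what that flow looks like in practice, here's a minimal sketch of the retrieve-then-generate loop; the keyword retriever and the `generate()` stub are stand-ins for illustration, not how any particular search engine actually does it.)

```python
# Minimal retrieval-augmented generation sketch: retrieve relevant text first, then prompt with it.

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by naive keyword overlap with the query.
    (Real systems use vector/embedding search; this just shows the flow.)"""
    terms = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(terms & set(d.lower().split())),
                  reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for the actual LLM call; a real system would send `prompt` to a model here."""
    return "[model answer conditioned on]\n" + prompt

def answer_with_rag(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # The answer is only as good as the retrieved text: grounding reduces, not eliminates, hallucination.
    return generate(prompt)
```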
For a project, I asked various LLMs to compile information that had to be accurate and to list sources. My results (I'd love others' findings):
OpenAI/chatGPT - useless
Claude - good but bad format
Gemini - very good
Groq + Gemma - excellent
Groq + Mistral - v good
Groq + Llama - middling
Thoughts?
I had such a similar conversation with it last week. I wanted it to help me create a schedule for 19 people for the rest of the year. On the first try it was probably 90% of what I wanted, but each time I tried to tweak it the results got progressively worse.
It really seemed to have an issue with holiday dates, even when I specified them (rather than just saying exclude US federal holidays) and after multiple attempts I still couldn't convince it that Friday wasn't the weekend.
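(For contrast, the deterministic version of "rest of the year, skip weekends and these holidays" is a few lines of standard-library Python; the holiday dates here are just illustrative stand-ins, not the ones from that conversation.)

```python
from datetime import date, timedelta

# Illustrative holiday set; swap in whatever dates were actually specified.
HOLIDAYS = {date(2024, 7, 4), date(2024, 9, 2), date(2024, 11, 28), date(2024, 12, 25)}

def working_days(start: date, end: date) -> list[date]:
    """Every weekday from start to end inclusive, minus listed holidays.
    weekday() is 0=Monday .. 6=Sunday, so only 5 and 6 count as the weekend (Friday is 4)."""
    days, d = [], start
    while d <= end:
        if d.weekday() < 5 and d not in HOLIDAYS:
            days.append(d)
        d += timedelta(days=1)
    return days

schedule_days = working_days(date(2024, 4, 1), date(2024, 12, 31))
```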
Ya, 3.5 is a wonderful combination of really wrong about many things and really confident. GPT-4 is still bad at the Easter thing though. I asked it if it ever falls on tax day and it said 2001 (correct) and 2091 (incorrect). I have to tell it to use a script in order to get the right answer.
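(The "use a script" route works because the Gregorian Easter date is pure arithmetic; here's a sketch using the anonymous Gregorian (Meeus/Jones/Butcher) algorithm, which gives 2024-03-31 and lets you list the years Easter actually lands on April 15.)

```python
from datetime import date

def easter(year: int) -> date:
    """Gregorian Easter via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30    # fixes the ecclesiastical (paschal) full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7  # steps forward to the following Sunday
    m = (a + 11 * h + 22 * l) // 451      # correction for edge cases
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

print(easter(2024))  # 2024-03-31
print([y for y in range(2000, 2100) if easter(y) == date(y, 4, 15)])  # years in this range when Easter falls on April 15
```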
I've tried having it look at some code I wrote and asked it to make changes and highlight where it fixed something that was going wrong. It highlighted a comment it added and left the executable code alone.
i hate the way these things act like they're "correcting" themselves. no you're not! you're just confabulating again in a slightly different way that may or may not happen to be right this time
I'm perversely looking forward to the point where they double down on their wrong answer and insist it's correct. That's when we'll know they've reached the level of human intelligence.
I got lots of crazy math equations and other wrongness from ChatGPT 3.5 till it got it right. Not sure if it was still guessing in the most confidently wrong way or not.
Gemini got it right first time - with no equations.
to be fairish - when I switched to GPT4 it got it right first time too. But the free one was spectacularly wrong. Realizing my screen grab of 3.5 is illegible, but here is a snippet of it
This only confirms my belief that Easter is impossible to predict and is decided upon by a secret cabal of calendar makers every year. There's no way Chat GPT could figure it out since it doesn't have access to this year's calendar.
This is the problem with the so-called Turing Test. (Which I don't think Turing ever thought of as a test.) Imitating human intelligence is a lot easier than actually implementing it.
It means that if the thermostat in the room housing the 1000 monkeys typing on 1000 typewriters is set too high, the monkeys will get heat stroke and hallucinate incorrect responses
Got the same garbage out with V3.5. Temp set high produces more "creative" responses. Set too high it spouts gibberish. But can see here temp has nothing to do with it. This is just a failure of the model. Good catch.
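(For anyone wondering what "temp" is doing there: it rescales the logits before softmax, so higher values flatten the token distribution and lower values sharpen it. A toy illustration with made-up numbers, nothing from a real model:)

```python
import math

def token_probs(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits/temperature: low T concentrates probability, high T spreads it out."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]          # made-up scores for three candidate tokens
print(token_probs(logits, 0.5))   # sharp: almost all the mass on the top token
print(token_probs(logits, 2.0))   # flat: real chance of picking a less likely ("creative") token
```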
Just a reminder that ChatGPT is an algorithmic program that relies on user self-inputted responses and data, meaning anything it spews is whatever BS someone else threw at it, so at this point, with the number of users it has, it's not going to be right a majority of the time.
What's the advantage of this shit that gets stuff wrong all the time? I mean, it's easy enough to Google "future Easter dates" and find something accurate. This AI shit is going to pollute the pool of correct data.
the wildest thing to me here is that i don't think it's just drawing random easter-probable dates from its training set bc these dates are all actually sundays in 2024. really impressive how specifically wrong it is
right, that's what i'm saying, i expected it to just pull random late march and april dates from instances of "easter sunday is [date]" in its training set, but all the answers it's giving me for "when is easter 2024" are plausible but incorrect SUNDAYS in 2024, so something more is going on here
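(Easy to sanity-check that observation, by the way; the dates below are hypothetical stand-ins for the kind of answers described, not the exact ones from the screenshots.)

```python
from datetime import date

EASTER_2024 = date(2024, 3, 31)

# Hypothetical "plausible but wrong" answers of the kind described above.
for d in [date(2024, 3, 24), date(2024, 4, 7), date(2024, 4, 21)]:
    print(d, d.strftime("%A"), "correct" if d == EASTER_2024 else "wrong")
# All three are Sundays in 2024, yet none of them is the actual Easter date.
```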
It’s never going to get it right if its date for the paschal full moon is off by 12 days. (Bede and everyone else involved in perfecting the computus wept)
extremely cool that my very fav band @themountaingoats.bsky.social has reskeeted this, extremely uncool that it has escaped containment and i am getting my least fav response to AI fuckups from AI fans, which is "oh, this is corrected in the latest version of chatgpt that you have to pay for"
The Mountain Goats AND calling out AI’s bullshit in the same thread seems like a pretty cool niche Venn Diagram of ‘things some of us like’ that I am pleasantly surprised by.
So Skynet will actually have Super Soakers and bifocals...
So many companies think they can replace their staff with chat GPT.
Of course, sometimes google completely blows it:
It frequently feels like coming in halfway through a game of Mao - the inputs and outputs are visible, but they don’t always seem related.
https://bsky.app/profile/phyphor.one-dash.org/post/3knxrotc2k22x
Also, maybe it got in my head because I was definitely late to work today because I woke up and thought it was Saturday.
Seems like GPT4 would give me the exact answer.
It's trying so damn hard to please you, look at it wag its little tail.
I don't make the rules.
And the future is...
"Ship it!"
https://bsky.app/profile/torrleonard.bsky.social/post/3kqg4f6qaez22
2025 is going to cost you extra.
Oh. oh no.