No. It's hard to find a human metaphor for LLMs because anything you pick will come with incorrect anthropocentrism, but one large benefit "hallucinate" has over "making shit up" is that the latter implies an intent & voluntary control that aren't there.
Maybe it's because computers are not and never will be "intelligent". That term was selected on purpose *to* anthropomorphize a machine. And it's worked for them so far.
this is the exact reason *to* call it making shit up; "hallucinate" implies a lack of control, and even illness. but it is code which does what it is designed to do: draw conclusions. purposely. and since it's also extrapolating, we also call that 'making shit up', especially since it's inaccurate
The code isn't designed to "draw conclusions", it's designed to complete a prompt with plausible follow-up text. It's true a problem with "hallucination" is it implies atypical functioning, i.e. when it gives wrong answers it functions differently from when it doesn't, but so does "making things up"
The problem with lying/making shit up is it's giving in to the advertisement that these are intelligent in some way. You can't lie if you don't know the truth first. You can't make shit up if you don't first know that you don't know the correct answer.
lying is not synonymous with making shit up. no one should say they're lying because they can't be intentional because they're not sentient.
'making things up' is literally what they are designed to do. idk what else everyone thinks that predictive models are designed to do, but, predictions
Lying and intent require abilities not present in binary logic gates.
hallucinations are involuntary, happen without any external stimulus, and perceive something which does not exist. my computer isn't sitting there inventing shit on its own. it's asked to invent an output. calling that hallucinating is ableist!
I think getting people to stop calling statistical language prediction models "artificial intelligence" is more important, and neither is likely to happen at this point.
Yeah, I think the probability of being right depends a lot on the domain - the closer the query is to the training data, the more accurate the answer - but which baseline is most appropriate isn't obvious. The most common queries in the dataset? Undoubtedly better than a broken clock. All possible queries? Undoubtedly worse. The most important queries...?
Agree — “interpolating beyond reality” would be the most accurate description of what’s going on. In many cases (most?), we want LLMs to stay in-bounds of reality when responding to queries.
I always get worried in particular about packages whose common names are different from what you actually type to install them (for example, to install PyTorch, it's not "pip install PyTorch" or "pip install pytorch", but rather "pip install torch")... It's a really common situation...
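(A quick illustrative sketch of that mismatch, if it helps: the colloquial project name, the name you hand to pip, and the name you actually import often all differ. The list below only covers a handful of well-known packages and is nothing like exhaustive.)

```python
# A few well-known cases where the project name, the PyPI distribution name
# you give to `pip install`, and the module name you import are not the same.
# (Illustrative list only, not exhaustive.)
KNOWN_MISMATCHES = {
    # project name      (pip install ...,    import ...)
    "PyTorch":          ("torch",            "torch"),
    "Beautiful Soup":   ("beautifulsoup4",   "bs4"),
    "Pillow":           ("Pillow",           "PIL"),
    "scikit-learn":     ("scikit-learn",     "sklearn"),
    "OpenCV":           ("opencv-python",    "cv2"),
}

for project, (dist, module) in KNOWN_MISMATCHES.items():
    print(f"{project:15s} -> pip install {dist:20s} import {module}")
```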
Not really. Dependency attacks have been well-known for years and delivering them via typo'd dependencies in package managers and whatnot is not new. What's relevant here is that code-generating AIs are making the problem worse for shops dumb enough to allow them.
Literally a month after some codebases dropped, there were employers asking for two years' experience with them, despite them not being publicly available until then.
In mid-1994, not long after I had built my company's first web site while figuring out how HTML worked (with no books on the market), I recall seeing a job ad for the post of webmaster at a large international corporation: ‘two years' experience making commercial web sites required’.
I think the real questions would be asking recruiters across all forms of industries if they truly believe applicants have access to casual time travelling devices, or that time chamber thing from Dragon Ball Z...
Using spinning rust, with its roughly 5-year lifespan, as the archival storage medium instead of decades' worth of constantly improving tape backup with 100-year lifespans; "what could possibly go wrong" indeed.
LLMs by their design do not reason. They only produce outputs that statistically resemble their inputs. Basically all they do is "what's the most likely next word?" If the output is factually correct that's just a happy accident.
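A minimal toy sketch of that "most likely next word" loop, with an invented probability table standing in for a trained model (the prompts and numbers are made up for illustration; real models operate on tokens with learned weights, not a lookup table):

```python
# Toy illustration of "what's the most likely next word?" decoding.
# The probability table is invented for this example; a real LLM learns
# conditional probabilities over tokens and has no table to look up.
toy_model = {
    ("the", "capital", "of", "france", "is"): {"paris": 0.92, "lyon": 0.03, "a": 0.05},
    ("the", "capital", "of", "atlantis", "is"): {"a": 0.34, "unknown": 0.33, "located": 0.33},
}

def next_word(context):
    """Greedy decoding: return the highest-probability continuation."""
    dist = toy_model[tuple(context)]
    return max(dist, key=dist.get)

print(next_word("the capital of france is".split()))    # -> "paris" (the happy accident)
# For a prompt with no good answer, the machinery still has to pick something:
print(next_word("the capital of atlantis is".split()))  # -> "a"
```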
Try telling that to all these clueless zombies who think the content-thieving Hallucination Machines can fact-check just as well as a college student banging out a research paper on a Dell back in 2005.
LLMs are palantirs. They are persuasion engines. LLMs seek to provide the user with data in a way that the LLM has determined would be most appealing to that user. Their true primary purpose is to affect the user's choices & future decisions & actions.
No, their true primary purpose is to give a response that is statistically the most 'correct'. If you feed it all the words ever written and ask it common questions, this will result in it giving the most common answers. Anything more than that is anthropomorphizing.
GenAI models just process data. They're effectively just automated predictive text, despite any use of web results to pretend otherwise. There is no intelligence.
I get the need to demonise the technology because of all the morons rushing to try and profit off of it, but making crap up isn't it.
"making crap up" is what LLMs do. They are bullshitters. You're unaware of attys using LLMs for legal research. The results are predictive text, but the data is bullshit. It doesn't actually exist anywhere. The text is made up in order to persuade the user & the reader. IOW a palantir. Get it?
So, the way GenAI text generation works is that it sets a value against each word, and then generates output either from a table of weighted values built up from equivalent words or phrases, or from a web-search result similar to the input words.
You are obviously unaware of attys learning that legal research using LLMs = bullshit. Cases made up out of thin air. Quotes fabricated. All in service of crafting a *persuasive legal argument* for a judge to rule upon. Palantirs are tools. LLMs are tools. Purpose = influence & persuade.
As for the Palantír: it also only shows real objects and sites, and the reason using it is a bad idea in LOTR is that other magic users can influence and override its use, so congratulations on being completely wrong on three separate fronts.
Palantirs show past, present & future outcomes--all of which may be correct, false or manipulated. Some of which are shown in order to persuade the user to make decisions & take actions. If you program an LLM to exclude some data, promote other data, make up data--you're manipulating the user.
Now, are we going to scream like a complete buffoon about made-up dooms of GenAI, or are we going to learn what the problems with the technology actually are and not sound like an attention-seeking troglodyte?
Here's a hint. Stick with the copyright infringement.
LLMs are probability engines. If you prompt it on a word or sequence of words that occurs a lot in the training data, they'll spit back words that tend to follow those initial words the vast majority of the time. If you do something to get it down a low-probability path or feed it a sequence that
doesn't occur (much) in its training data, it's still digging around for words that have some probability of following the words you've prompted it with. Except you've kicked it into a low probability space so by definition nothing it's going to find has a high probability of occurring in reality.
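A toy version of that point with raw counts instead of a neural net (the numbers and the package name "graphtastic" are invented for illustration): a prefix that shows up constantly in the training data has one continuation with real support behind it, while a barely-seen prefix forces a pick among continuations that are all nearly baseless.

```python
from collections import Counter

# Invented next-word counts for two prefixes; the numbers and the package
# name "graphtastic" are made up purely to illustrate the shape of the problem.
counts = {
    "import": Counter({"numpy": 2400, "os": 300, "torch": 200, "requests": 100}),
    "import graphtastic": Counter({"utils": 1, "core": 1, "helpers": 1}),  # barely seen
}

def next_word_distribution(prefix):
    c = counts[prefix]
    total = sum(c.values())
    return {word: n / total for word, n in c.items()}

for prefix in counts:
    dist = next_word_distribution(prefix)
    top_word, top_p = max(dist.items(), key=lambda kv: kv[1])
    print(f"{prefix!r}: top continuation {top_word!r} at p={top_p:.2f}")
# 'import' has a well-supported winner (p=0.80); 'import graphtastic' still
# returns *something*, but nothing it returns has real support behind it.
```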
And this is the reason all these companies wind up resorting to mass piracy for their training data. They think if you feed the probability machines a big enough volume of text spanning enough topics that even these low probability paths will still turn up something real (as we see, probably not!).
This isn't fundamentally new stuff; down in the weeds it's really no different from something like Spotify's recommendation engine. But the Spotify recommendation engine mostly works because it's a narrowly confined space, so you're more likely to find pairings that the user will rate as useful.
But it's still not perfect! If you have truly oddball tastes the recommendation engine will probably not work. I remember with Pandora like 20 years ago, I started with Megadeth and it quickly got me to Oasis. As it happens, I *do* also like Oasis. But it's not what I was in the mood for right then.
It's absolute cargo cult science. 1000 monkeys at 1000 keyboards, all flinging poop and hitting the 's' key repeatedly until one of them writes Shakespeare
For some, using a locally installed LLM not connected to the internet can be a solution. Maybe it won't be 100% up to date, but does it really need to be? My biggest worry is malware-infected libraries.
I guess my solution was focused on the weekend-warrior coder. I find Codestral 22B meets 90% of my needs. When things get sticky I resort to GPT-4o. (But I still worry about contaminated libraries like the backdoored XZ Utils, discovered in early 2024, that threatened Linux server systems worldwide.)
To be honest, I have terminated a few 'false starts' which could be considered hallucinations. But I took this to mean I needed to craft a better, more detailed prompt. Eventually these all were worked through.
Not a solution. A locally installed LLM is exactly as likely to hallucinate nonexistent package names as an internet-connected one. It's the internet connection of the *compiler* that allows the malware to enter. And all modern development environments require an internet connection to run.
I don't think those are "modern" in this context. GCC is 38 years old, vim is 33, and clang is 19. None will reach out to the internet to import a named package, so they're not vulnerable to this exploit.
I guess I'm relying on the assumption that, as time passes, more and more seeding of LLM releases will occur. But you are right, once it's there it's a danger.
When a human looks for something and it's not there, they do not hallucinate an answer. And they are certainly less likely to keep repeating the same mistake that sends them to malware every time.
Holds even stronger for "lying" btw.
Lying is just lying.
Also, hate to contradict further but "sometimes is useful on accident" is a feature of hallucinations that *does* fit LLMs quite well.
Was very tempted
I’m impressed with that one whew
So, unless you're claiming English is "made up"..
You can leave now.
Doesn't seem safe for anyone to roll that D4.
But they "know" a function/procedure could theoretically be written to do anything.
So when the LLM gets stuck it just fills it with a call to a non-existent piece of code like GetDeusExMachinaHere() or whatever.
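Roughly what that failure looks like when the generated code actually runs (a deliberately silly sketch; the helper name comes from the post above and doesn't exist anywhere):

```python
# Sketch of the failure mode described above: generated code that leans on a
# helper nobody ever wrote. The name mirrors the joke in the post; it is not
# a real API, so calling it fails the moment the code path runs.
def summarize_report(report: str) -> str:
    cleaned = report.strip()
    return GetDeusExMachinaHere(cleaned)  # hallucinated helper

try:
    summarize_report("quarterly numbers ...")
except NameError as err:
    print(f"Hallucinated call blows up at runtime: {err}")
```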
https://bsky.app/profile/janelleshane.com/post/3lmnpkz53vc2e
@garymarcus.bsky.social ICYMI