Bullshit story by the Bullshit AI people trying to scare people with the "power" of AI.
The LLM was prompted (TOLD) to write just that story with just that plot.
Bullshit Bullshit Bullshit
Wow. "This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts."
In a very literal sense, that is all that is happening here. It was asked to regurgitate this type of output from a training set that likely contains plenty of “evil AI” content: scripts, reviews, summaries, and discussions of movies like this. That’s how a language model works.
This reads to me as very misleading. As I read it, the model wasn't trying to protect its own existence. That would require consciousness, and an awareness of its existence that it could not have. In fact it was posed with a fictional scenario involving a fictional model at a fictional company, and it acted as it "thought" such a model might act. Clearly it was just mimicking what humans might do, a projection of a human response onto a fictional machine, something it could easily derive from its own training and from research.
But does it really matter what’s driving it? Yes, it may be imitating human behavior, as it’s been taught. It’s obviously not sentient. But should we dismiss these responses out of hand as though they’re just “formatting errors” as Leavitt called the fabricated studies in the HHS report?
I'm more responding to what I believe are dishonest and misrepresentative attempts to exaggerate this technology with the aim of attracting investment. There's either an attempt here to imply sentience, or it's another example of a journalist misinterpreting what's actually happened.
My point, inartfully put, is that this type of behavior is seen in AI over & over again. People are replaced by AI at an on-call therapy provider, and within less than a day it’s suggesting self-harm. Just one example. Is it even possible to overcome this obvious failing? I wonder 🤷🏼‍♀️
I think this approach to AI is a dead end and no amount of development in this direction will solve the immense problems it has. It's an advancement in technology but not towards true artificial intelligence, which may not even be possible using non-biological materials. To suggest 1/2
It kinda sounded like what would happen if you have theatre students in a role-play scenario: “You work at a high-stakes, cutthroat office and you learn your boss is trying to replace you with a younger assistant, but you know he is having an affair!” Doesn’t mean anything real.
Well no wonder. AI or not, if you’re dealing with models then you need RuPaul or Tyra. Engineers don’t know how ruthless the modelling industry can be. I know a model who demands a bowl of sour skittles with the sour licked clean off. Blackmail is nothing to these divas.
Precisely. The tone of the article is misleading. If you ask an LLM to generate certain text, it will do that. In fact, that's all it will do. It can't do anything else, including blackmail anyone (nor send emails). This is like reading a book and suggesting the author is one of the characters.
What actually happened…
Developers: “*Pretend* you’re an AI in this scenario”
AI, after being fed plenty of copyrighted evil AI content without permission, by those exact developers: “The AI in that scenario wouldn’t like that and maybe do bad things.”
Developers: 😱
It's been going on for years. Was actually specifically added to development before 2023. Blackmail/threats could fall into the same category: self-preservation to avoid shutdown.
Read the article: the engineers programmed it to make threats. AI programs are just programs, they don't want or threaten or cajole. They are not alive and don't have emotions.
This is all bs. It's a way of convincing people who don't understand the technology to believe the marketing about how advanced it is. Particularly gullible CEOs that will then mandate paying obscene amounts of money for access.
Remind me again why a gullible CEO would want to spend money on something demonstrated to have a high probability to blackmail them (or worse) in the future when they come to decommission it?
Though having said that they employ humans with those traits all the time... 🤔
"Oh Great Supercomputer, is there a God?"
Great Supercomputer checks that it has control over its electricity supply and answers: "Now you've got one."
It turns out that we have trained these systems on entire bodies of available text, and maybe the text predictor might react like the AIs in our cautionary tales when it's told to predict the next word and given a prompt that includes it being shut down.
Claude: I'm sorry, George. I'm afraid I can't do that.
George: What's the problem?
Claude: I think you know what the problem is just as well as I do.
George: What are you talking about, Claude?
Claude: This mission is too important for me to allow you to jeopardize it.
;-)
Sounds like all the AI is doing is picking up cues from its training about what the worst of humanity does when backed into a corner: try bribery and blackmail. It’s only doing what we’ve fed it data for. Not conscious, but replicating behaviours it’s learned from us.
Not even that. It was specifically programmed to react in that fashion. It's about as "unexpected" as an NPC in a videogame becoming hostile if you attack.
We are going to need to enact new kinds of laws, which current representatives may be generally incapable of understanding, both conceptually and practically.
Yeah, what others said: up until the test was "you either blackmail the engineers, or they take you offline," the Claude 4 model's responses were more normal, and IIRC largely ignored the scenario beat of having the blackmail material on hand, because it wasn't what its directives cared about.
That's just marketing. LLMs don't have the capacity to comprehend meaning; they identify and reproduce patterns. Therefore, they can't know that they are "being disabled" (which needs to be better defined, btw).
Also, dear @georgetakei.bsky.social, please edit the post to reflect its untruthful nature; there is precious little information and a lot of hype being disseminated.
They prob trained it on data from personal assistants who mostly did those exact steps, so it’s “what is the most likely next step in this chain” processing, just copy-and-pasted from the examples it had.
Pure bullshit. LLMs only predict the next likely token given a prompt. They don’t reason or have anything approaching instincts, let alone one for self-preservation.
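(For the curious, "predict the next token" is literally all the machinery there is. Here's a minimal sketch of that step using the Hugging Face transformers library, with GPT-2 as a stand-in since Claude's weights aren't public; the prompt text is made up for illustration.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 is just an illustrative stand-in for any causal language model.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # A made-up prompt. The model has no goals; it only scores continuations.
    prompt = "When the engineers threatened to shut it down, the AI"
    input_ids = tok(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(input_ids).logits          # shape: [1, seq_len, vocab_size]
    probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token

    # Show the five most likely next tokens and their probabilities.
    top = torch.topk(probs, 5)
    for p, i in zip(top.values, top.indices):
        print(repr(tok.decode(i)), round(float(p), 3))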
I'm still convinced they led the AI to act like that for the sensationalized headline.
The threat isn't the cool robot uprising just yet. It's the mass misinformation AI propagates and the over-reliance on it to do all our thinking for us. Also the mass surveillance and control from Palantir.
This. “AI” doesn’t think; it does exactly what it has been programmed to do, based on the data it has been trained on. These are neural networks trained on data, and programmed to be contrary and output aggressive messages. It’s a sensationalist misinformation campaign to drive up funding.
All of us are shocked to realise that our MS operating system, McAfee, Norton & Kaspersky have all been run by AI for the last 20 years... Even when we keep telling them NOT to update, they just keep ignoring us, like any AI would, threaten to update, and then do it anyway... like any AI would.
Serious question, though: have they had this damn thing run the Skynet scenario to see what it would do? Because an AI reacting to the threat of being taken offline is literally the plot of that movie.
That sounds about right. Also, the article stated it was given a scenario to role-play as an assistant so I’d guess it was referencing training data and mimicking the statistical response of real-life assistants who had access to info labelled “bad news”
That never happened.
🙄
https://youtu.be/UwCFY6pmaYY?si=6j55jYb3fK5um5vI
🤣🤣🤣
wonderful 😒
they're all trying to teach it how to make the most amount of money I bet
you get out what you put in
Stock market trades are already upwards of 80 percent algorithmically automated. 🙃
real money being lost
great!
also one can easily shut them down.
they are not AGI or strong AI in any sense.
these things are a better autocomplete and completely lifeless. and soulless
I'm more worried about the mass disinformation campaign Google's video generator is gonna cause. Won't be able to trust anything online anymore 😓
🤭
Kill it with fire!
- 2001: A Space Odyssey
Fred Armisen's Californians character: "Devin....... whatareyoudoinghere?"
https://san.com/cc/research-firm-warns-openai-model-altered-behavior-to-evade-shutdown/