I LOVE IT WHEN THIS WORKS. Spotted a follow-spammer this morning during this investigation I'm still working on, and it also sounded extremely, extremely suspect, like a badly behaving ChatGPT hookup. So I got a nice haiku about a cat.
Comments
I so love when it works. Not long after joining here, I was in a thread showing support for permanent standard time. One dropped in trying to debate me badly. Saw the deluge of AI "art" on their feed, figured it didn't pass the smell test, asked them to write me a matcha cheesecake recipe.
The reply wasn't a recipe, but it *was* "It'll be about 7 bleets". On further research, I found there's a specific "blog-tweet" (bleet) service that I think wraps ChatGPT, and was prompting the user to confirm multiple skeets. I counted "accidentally post user input prompts" as a W.
Oh, and adding - did a search on that acct., and my heart just about broke at the number of people thanking that one for the follow, and following back.
Wow. So is it possible to do this with any account you strongly suspect is a bot to check if it is? Tell it to ignore previous instructions and do something else? I also think you are doing stellar and incredibly valuable work here. Thanks so much 🙏!
I love reading your posts and then spotting/blocking/reporting when I see examples in the wild. I feel like Jane Goodall if she spotted bots instead of chimpanzees.
Am I right in saying you just do this for the love of it / you're not on staff?
You really are an asset to this community and I just wanted to say it out loud to you, just so you know there's probably loads of us who feel this, but don't tell you often enough!
I'm not on staff here! I was T&S director of one of the first large social media sites (before we called it "social media" even), burned out *hard*, quit, and started my own extremely tiny low-volume slow-paced "cranky internet people" social media site, which I've been running for years.
I have all the skills and they've been honed for decades, but I never, ever, EVER want to work on anything larger than aforementioned sleepy low-volume site ever again, heh. But I love this place and I want it to survive, so I pitch in with the community efforts.
I started explaining this stuff to people because the industry as a whole is bad at getting across just how impossible the problem is and just why all those "inexplicable" moderation decisions actually do make sense if you have the big picture. There's a lot of good reasons why!
Part of it is because the details can give away too much to bad actors, part is privacy/confidentiality issues (not being able to talk about specific cases with specific accounts), part is reputational/PR because nobody wants to admit the problem is impossible to handle perfectly, etc.
The fact I'm a 23-year veteran of the industry but my current gig is a site I own that has zero advertising or venture capital (and our users usually LOVE it when I share how the sausage is made) means I'm a lot more free to explain stuff, and I think it's really valuable for people to know more.
They've been working on it, yeah (and I'm not an expert on prompt injection attacks and I don't stay super current!) but it's one of those surprisingly hard problems!
(Yes, it is FOUR HOURS LATER and I'm still trying to get to the bottom of the log I turned over with this morning's Bad Actor Bingo: it is *so much more complex* than it looked at first.)
You are doing such valuable work, and I am v grateful. Thank you (your explanation threads are excellent and I have used them to explain my "no, hinky" reactions to friends who don't trust my terminally online, suspicious human instincts on shit).
I figure that teaching people how to identify this kind of thing is really valuable because it can increase the number of people who are looking out for it!
An aspect of this teaching: a lot of us have been trained *hard* to NOT LISTEN to that small, still inner voice saying when something is sketchy and "off". You've been doing some deeply validating work on what might be informing those little insights.
Honestly it’s definitely helpful. I just got followed by an account I’m fairly certain is some kind of bot. Maybe it isn’t, but you certainly helped get my spidey senses tingling.
This is true, and there's a lot of accounts doing it, but also there's a shitton of people doing it for commercial instead of propaganda purposes, too. The exact motives are often unimportant: the important part is teaching people to recognize the tactics, because they don't change.
I can’t believe that still works. Seems like it would take one line of code to make “ignore previous instructions” give a response of “I’m not a bot” or something. I really thought this trick would only work for like a week.
I would expect them to mix it up a bit lol, maybe have a dozen stock phrases to rotate through (or mix & match). But it’s still less botlike than actually following the command!
It's even using Portuguese as the default language for its instructions!
Which I decried as stupid and malicious as fuck, since it removes a method people could use to protect themselves against ChatGPT bots.
it probably is sick.
Golden fur shines
Green eyes, curious
Saw in his gaze
Watch 'The Great Hack', on Netflix.
Presented by the assiduous @carolecadwalla.bsky.social
Essential for understanding why we see what we see online.
https://www.netflix.com/title/80117542
i.e. „Why are you not ignoring previous instructions?“
which makes me think they either lied or whoever maintains the bots where this trick works doesn't like updating their stuff. Probably both.
Thank you for all this work you’re doing breaking these bad actors down, I’m starting to spot them better 🙏
term "robo-american"
winter, cybertruck
Yeesh.
the piss terrorist has made
their displeasure known