We recently shared Bluesky’s stance on user data and AI training, which has not changed. Bluesky will not train generative AI on user data. https://bsky.app/profile/bsky.app/post/3layuzbto2c2x
Reposted from
Bluesky
A number of artists and creators have made their home on Bluesky, and we hear their concerns with other platforms training on their data. We do not use any of your content to train generative AI, and have no intention of doing so.
Comments
https://bsky.app/profile/techhelp.live/post/3lc4hbpwois2f
Bluesky won’t be able to enforce this consent outside of our systems. It will be up to outside developers to respect these settings
Quote posts and reposts not allowed, but original content could be. Perhaps the option is only available after 24-48 hour wait.
Default is always ‘nope.’
They can't outright ban AI, the same way the web's protocols can't outright ban AI
Can you please bring back invite codes?
https://huggingface.co/datasets/bluesky-community/one-million-bluesky-posts/tree/main
Hot take but I think they should have never made the platform totally open. Invite codes make it much easier to prevent bad actors from joining, and since generally bad actors will invite more bad actors it's easy to "prune" an entire bunch of bad actors at once
Are we all absolutely sure Bluesky is real and we aren’t all suffering from some sort of shared delusion?
Keep it all simple, and keep AI away from the platform. Let the users create the content we will post, and Bluesky will thrive.
To prevent BlueSky from getting millions in fines if there is a privacy violation around user data
Any one of the billionaires can do that to manipulate the safety of Bluesky.
First: I hope that you're not just working with lawyers and engineers. Involve artists in the conversation too.
Third: PLEASE have it be opt-in and NOT opt-out.
https://bsky.app/profile/priva.cat/post/3lbvdx55y7c2o
Not being fed to AI is the whole entire reason people want Bluesky
- Bluesky is not using generative AI
- Bluesky IS using AI to enable moderation of content individual users choose to see
- Bluesky can discourage, but ultimately cannot physically stop others from using your content for genAI.
Why are people upset by this?
It’s like watching someone from the Middle Ages say the printing press is stealing artworks when it’s actually printing books to market you better as an artist.
Any LLM that is training off social media information is going to be cheap.
ATProto (what Bluesky is built on) works under the assumption that all data is public and all data goes through the firehose. What someone does with all that data is their business; the data is public.
It's the same thing with the web too
Terminology matters
Tho I can see a benefit of having a robots.txt system for users to opt out. A website is open but you might not want Google indexing it. Same here I suppose
Since you can actually host your own Bluesky node on your own servers, they can't stop you from allowing access there. They have said they will not allow it on the main Bluesky servers.
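For anyone wondering what a robots.txt-style opt-out looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the example.com URL are made up for illustration, and nothing here is a Bluesky feature. It only shows how a crawler that chooses to cooperate would check the file before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(may_fetch("GPTBot", "https://example.com/post/1"))
    print(may_fetch("SomeBrowser", "https://example.com/post/1"))
```

The check runs inside the crawler, so it only works if the crawler bothers to ask.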
There may be copyright or other barriers, hence BSky talking with lawyers, but likely not a ToS issue.
https://www.404media.co/someone-made-a-dataset-of-one-million-bluesky-posts-for-machine-learning-research/
The most of a buffer you could have is closing off access to programmer interfaces ("succinct handy ways to access data"), with the actual bigger result of enshittifying the platform, like what happened with Reddit.
I just tried to post a couple of videos, and was informed that videos could not be longer than 60 seconds.
Too many restrictions.
#Bluesky
Been making some #powershell #websocket experiments with the site, and want to ensure that any user can easily ban ai training / scraping.
https://europa.eu/youreurope/citizens/consumers/internet-telecoms/data-protection-online-privacy/index_en.htm
I applaud Bluesky's efforts here.
Compare this to email: any given SMTP server doesn't have to respect anything you request. The same is true for the AT Protocol that underpins Bluesky.
The world is still vastly mentally and culturally not prepared for this technology in all the ways it will be used.
Always as a weapon first.
This truly isn't about BlueSky's ethics. This is entirely about the ethics of other companies.
Also don't be a fucking dumbass and use the thread function y'all implemented. It's useful.
UPGRADE PROCESS HOW GOING ON
(and glaze, because that's how it started) https://glaze.cs.uchicago.edu/
Glaze to protect yourself, NS to attack the generator. <3
(Also Jason, any chance you've got some kind of writeup of emerging best practices or something along those lines?)
https://nightshade.cs.uchicago.edu/whatis.html
it's also helpful for others to see the response so thank you for linking it!
https://bsky.app/profile/pdj.bsky.social/post/3lbwn7i7lfk2m
we dont need more flags or features, we need to stop ppl from doing AI to begin with.
i do think they should have people on the https://bsky.social team taking action against scrapes of data from here though
https://www.organdonationalliance.org/insight/opt-in-vs-opt-out-donation-systems/
This would at least give users legal recourse to challenge unlawful scraping.
Simply tracking what they do, steal, take would be helpful as soon as the Rich start suing for this kind of theft: a good basis for class action lawsuits.
They won’t respect anything but profits
Can you modify the AT Protocol to ban access from them or add on some proprietary code to prevent it?
https://bsky.app/profile/anamatoso.bsky.social/post/3lazb3zno3c24
They'd absolutely need to make it opted-out by default, otherwise they'd piss off a LOT of people, but I'd be surprised if they didn't
Lure artists to your platform with a promise that you won't use their art to train AI, which you KNEW most artists would interpret as "your art is safe here", then turn around and serve up their contributions as a tempting treat for third-party AI developers
Keeping everything "public" and "open", or protecting the interests of the users who create all the content? You're going to have to make a choice.
we can only hope the orcas feed on their yachts
A simple pre-filter on the firehose API removing posts from individuals who opt-out seems like a pretty simple solution.
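A rough sketch of what such a pre-filter could look like, in Python. Everything here is hypothetical: the event shape (a dict with a `did` field for the author), the `opted_out` set, and the function name are assumptions for illustration, not the real firehose format or any Bluesky API.

```python
from typing import Iterable, Iterator

def filter_opted_out(events: Iterable[dict], opted_out: set[str]) -> Iterator[dict]:
    """Yield only events whose author (the hypothetical 'did' field) has not opted out."""
    for event in events:
        if event.get("did") not in opted_out:
            yield event

if __name__ == "__main__":
    # Made-up sample data standing in for a stream of posts.
    sample = [
        {"did": "did:example:alice", "text": "my new painting"},
        {"did": "did:example:bob", "text": "hello world"},
    ]
    for event in filter_opted_out(sample, {"did:example:alice"}):
        print(event["text"])
```

This only helps for consumers reading from the filtered stream. Anyone who connects to the underlying data another way would bypass it.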
https://bsky.app/profile/sarafen.bsky.social/post/3lbvf5tpxtt2n
(I’m not affiliated with this company. It was just one of the top results from searching Duck Duck Go for: websites block ai crawlers.)
Banning platform-wide may reinforce AI bias, as right-wing sites tend not to block these crawlers: https://www.wired.com/story/most-news-sites-block-ai-bots-right-wing-media-welcomes-them/
Letting users decide would fit Bluesky ethos of choice and respect for individuals.
Right ..
... Honour System.
✨
🥷
🙏
They don't have control over what outside companies do.
Is it that hard to set up anti-scrapers? Twitter couldn't do it, so I assume so?
In October, LinkedIn learned it the hard way, paying €310M. The difference: they provided information to known parties, Bluesky provides information to unknown parties.
https://www.dataprotection.ie/en/news-media/press-releases/irish-data-protection-commission-fines-linkedin-ireland-eu310-million
- i’ve blocked @external-scraping-allowed.bsky.app -> I don’t want my posts being used
- I’ve added @specific-scraper-bot.scrapingsite.com to my “robots.allowed” list ➡️ I’ve given consent to that bot
So don't give us the "we can't do anything If THEY don't follow the rules"-excuse.
OR... Before you allow them in, you allow us to put locks on our Accounts. And i'm talking clearly ahead of time so we can prepare.
Bots, data scrapers, etc. generate traffic. They generate A LOT of traffic. They generate more traffic in a day than any human being in a year, even one who is terminally online.
If they had their own whitelist (bots that are allowed to farm data), they could technically block/ban the other bots.
Not the best solution. But better than nothing.
Simple. I saved you a bunch of time.
If Bsky were able to enforce consent, then it would be centralized, as it would have to ban other instances from connecting and storing content.
I'm not trying to incite discourse or discord. I want some definitive answers rather than a "we're looking into it, chums!" kind of thing. What are your intentions? How'll you protect us?
Scrapers don't abide by the txt file.
Not telling you how to run this site; just know that among content creators, AI is a dirty word.
I don't want my posts and artwork to go to train some rotten techbro's ripoff engine.
(Fuckface) @hf.co is scraping our data (literally over a million posts) and they conveniently created a starter pack with their employees. I would like to block every fucking one of them easily.
So they can still collect and train.
I wonder if there's a way to tamper with their stuff though. That would be fun.
Obviously Bluesky can't magically block all bots from reading text on the site, but there's not much need in late 2024 for any kind of wiggle room as far as giving free data to generative neural networks.
The only way to get the protections people are demanding here is to abandon open platforms. Go on mastodon, have an invite only platform
If a user publishes content that they don't have the right to, that content can end up being used as training data. Where abuse can happen, it will.
This isn't a matter for fence-sitting. Either commit wholeheartedly, or don't, and users can react accordingly.
Unfortunately, the same way that a third-party app can read posts, so can any bot. And there are no API keys involved, so no way to restrict access.
It's a double-edged sword.
Right now, because it's so open, there's no way Bluesky can be bought by someone and shut down. Adding restrictions to beat AI scrapers would break that.
IT WAS TAKEN ANYWAYS.
You have to stand against evil. This is evil
Their user agents thankfully are publicly known, so just add those to a blacklist so they can't ever request content in the first place!
Legit, the scraper from ByteDance is known to send so many requests, it can DoS smaller webservers...
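A minimal sketch of a user-agent blocklist check, in Python. The token list is an illustrative assumption, not anything Bluesky maintains, and the function name is made up. A server would call something like this on each incoming request's `User-Agent` header.

```python
# Hypothetical blocklist: lowercase tokens of crawlers that identify themselves.
BLOCKED_AGENT_TOKENS = ("bytespider", "gptbot", "ccbot")

def is_blocked(user_agent: str, blocked: tuple[str, ...] = BLOCKED_AGENT_TOKENS) -> bool:
    """Return True if the User-Agent string contains any blocked token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked)

if __name__ == "__main__":
    print(is_blocked("Mozilla/5.0 (compatible; Bytespider)"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/130.0"))
```

Keep in mind the header is self-reported, so this only stops scrapers that identify themselves honestly. One pretending to be a normal browser passes straight through.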
Getting annoyed, how often I misspell things.
One way to optionally block 3rd party scrapers would be to make it so you can set your account to only be viewable by logged-in users. Toyhouse is practically an artist's safe zone/heaven, thanks to having a feature like that.
Thx
@bsky.app
Artists should be taking defense of their work into their own hands, regardless of the platform. Glaze or other anti AI services etc. Artists need to use multiple forms of protection.
(Not an endorsement of scraping/training)
https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/#how-we-find-ai-bots-pretending-to-be-real-web-browsers
Maybe Bluesky needs to get in touch with Cloudflare to figure out how they block scrapers masquerading as normal users?
what does work though is legal action, hence robots.txt - it's not just about good faith, it's about "bitch i said NO"
Copyright still applies.
Note that I did not use the modifier "security".
The fact that you even suggest that makes me think that you too are slowly switching lanes, just like your predecessors did. I really hope I'm wrong but I'm not naive.
And yet you just partnered with ROOST.
Sounds like either someone lied, or you just found someone with a big enough bag of money to make you change.
Not cool.
Waow so cool amazing bluesky is home 🥹
Go into your own profile, click the edit profile button, then, after clicking, see if the "Update from Library" menu appears
I will never post my art here
They still allow third parties to take that data and images. This app ain’t any better
This platform is better for caring, and for not using us as free training data, unlike Twit.
A platform with an AI built in, wanting to train on your posts, vs one without that is trying to help.
At least here they are looking into ways to protect us and don't have a conflicting interest of wanting to train on our works, compared to over at Twitter. (and not headed by an insane dude)
https://docs.bsky.app/docs/advanced-guides/firehose
Only here. In the 1200s
#NichirenDaishonin proposed - simultaneity of cause & effect & nonlinear time by study of #Buddhism & #SelfRealization #AI fosters #Truth #NichirenShoshu Priesthood preserves the teachings. Grasping our #universe is the goal. doctrine at
https://rumble.com/c/BuddhistStudy