While this is nice to see, keep in mind that they don't currently block third parties, like OpenAI, from scraping text and art posted here. Even if they blocked the bots, the API still makes scraping pretty easy.
Reposted from
Bluesky
A number of artists and creators have made their home on Bluesky, and we hear their concerns with other platforms training on their data. We do not use any of your content to train generative AI, and have no intention of doing so.
Comments
They're difficult to block as the unethical ones ignore robots.txt rules and can easily scrape from various IPs. And aren't they all unethical?
https://www.businessinsider.com/openai-anthropic-ai-ignore-rule-scraping-web-contect-robotstxt
https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
But yes, you are counting on them behaving ethically, and their previous actions have established how unethical they are. Doing nothing, however, is not an option.
A quick question for you: the old blanket disallow
User-agent: *
Disallow: /
should still be effective for the new generation of LLM AI bots, right? (So far as they respect any robots.txt instruction.)
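For anyone who wants to check this themselves, here is a rough sketch using Python's standard-library robots.txt parser. It only shows how a parser that follows the rules reads the blanket disallow; it says nothing about whether a given crawler actually honors it. The helper name `is_allowed`, the example.com URL, and the list of user-agent strings are illustrative assumptions, not anything a platform has published.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule quoted above.
BLANKET_DISALLOW = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler may fetch `url`.

    Hypothetical helper: it parses the robots.txt text directly
    instead of downloading it from a site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative user-agent names; swap in whichever bots you care about.
    for agent in ("GPTBot", "CCBot", "ClaudeBot"):
        allowed = is_allowed(BLANKET_DISALLOW, agent, "https://example.com/post/1")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Because the wildcard entry applies to any user agent that has no entry of its own, every bot in the list comes back as blocked, at least on paper.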
Many thought the same about Google Search 20+ years ago, but I 100% agree all this should be opt-in. It's a shame that law is always way behind technology. Has there been a story about that, I wonder?
https://nightshade.cs.uchicago.edu/whatis.html