clarkesworldmagazine.com • 188 days ago

While this is nice to see, keep in mind that they don't currently block third parties, like OpenAI, from scraping text and art posted here. Even if they blocked the bots, the API still makes it pretty easy.

Reposted from Bluesky

A number of artists and creators have made their home on Bluesky, and we hear their concerns with other platforms training on their data. We do not use any of your content to train generative AI, and have no intention of doing so.

Comments

alexharford.bsky.social•188 days ago

How do you block the bots?

They're difficult to block as the unethical ones ignore robots.txt rules and can easily scrape from various IPs. And aren't they all unethical?

https://www.businessinsider.com/openai-anthropic-ai-ignore-rule-scraping-web-contect-robotstxt

clarkesworldmagazine.com•188 days ago

I've been maintaining a list of things people can do at:
https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
but yes, you are counting on them behaving ethically and their previous actions have established how unethical they are. Doing nothing, however, is not an option.

lostdiarist.bsky.social•174 days ago

Hi Neil,
A quick question for you. The old blanket disallow:

User-agent: *
Disallow: /

should still be effective for the new generation of LLM AI bots, right? (So far as they respect any robots.txt instruction.)

clarkesworldmagazine.com•174 days ago

The answer is a bit more complicated. It depends on what you are trying to do. Your rule tells all bots (including search engine crawlers) that they aren't allowed. If that's intended, there are more effective methods than robots.txt for keeping them off your site. If not, there's more to do.
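
To make the difference concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It compares the blanket rule with a targeted file that names specific AI crawlers. The `allowed` helper, the `example.com` URL, and the two crawler tokens shown (`GPTBot`, `CCBot`) are illustrative assumptions, not a complete or authoritative list. The result only describes what a bot that honors robots.txt would do.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from the question: every compliant bot is told to stay out.
BLANKET = """\
User-agent: *
Disallow: /
"""

# A targeted alternative (illustrative tokens only): name the AI crawlers
# you want to exclude and leave everyone else, e.g. search engines, allowed.
TARGETED = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str,
            url: str = "https://example.com/story.html") -> bool:
    """Return True if a robots.txt-respecting bot may fetch the URL.

    Hypothetical helper for illustration; the URL is a placeholder.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in (("blanket", BLANKET), ("targeted", TARGETED)):
        for agent in ("GPTBot", "CCBot", "Googlebot"):
            print(f"{name:9} {agent:10} allowed={allowed(rules, agent)}")
```

Under the blanket rule the search engine crawler is shut out along with the AI bots. Under the targeted file only the named crawlers are refused. Neither stops a scraper that ignores robots.txt.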

lostdiarist.bsky.social•174 days ago

Thanks. I have some research to do!

alexharford.bsky.social•188 days ago

Thanks, Neil. You provide a brilliant service to the writing and reading community and way beyond.

Many thought the same about Google Search 20+ years ago, but I 100% agree all this should be opt-in. It's a shame that law is always way behind technology. Has there been a story about that, I wonder?

larrypf.bsky.social•188 days ago

IMHO we need legal changes + strong enforcement mechanisms to keep one AI company or another from using *everything* that's accessible online. Otherwise, those who can profit from our online data will scrape it anyway (via 3rd parties?) & ditch the incriminating data afterward.

clarkesworldmagazine.com•188 days ago

Yes, change and enforcement are necessary to prevent the continued abuses. Unfortunately, politicians in both parties have demonstrated their unwillingness to lead on this issue. (The notable exception is that they are interested in protecting themselves from "AI".)

rookandraven.bsky.social•174 days ago

It really makes sense to push any art you post publicly through some kind of AI prevention tool, like Nightshade.

https://nightshade.cs.uchicago.edu/whatis.html
