It doesn't matter that Bluesky/ATProto is an open protocol. That might make scraping easier but the scrapers don't care. I promise they're scraping Twitter too, and the Fediverse/Mastodon. Anything that is public.
The only way to solve third-party scraping is by making it illegal and enforcing it.
Comments
Without it, how could we monitor information trends, misinformation, opinion shifts & clusters?
Information thrives when free, not siloed.
Not all data collection is for AI 👍 We do it for OSINT, Analytics. For good.
Check what we do at https://exordelabs.com
Thing is, this data really shouldn't be allowed to be used in training data, although this is definitely not the platform's fault.
I also don't really like Nightshade-style poisoning, as it might hinder moderation and accessibility tools (CLIP is useful not only for generating images but also for understanding them).
I wish more people used standards like OpenCL and Vulkan...
The "poisoned" images posted today might be unsuitable for training right now, but that won't be the case forever.
It's exactly like with piracy.
It's a tall order even for enforcement, imo.