how can a PUBLIC and SOCIAL network not have the posts PUBLICLY available? if you don’t want them public, then just don’t post
scraping has existed since the very beginning of the web, it’s literally how search engines work. attaching “ai” to it and saying it’s suddenly bad is just ridiculous imo
as soon as *anything* leaves your device it is no longer private.
if other people can see your shit – robots can see your shit, i always thought this was obvious lol
bad actors were always a thing and will forever be
though flags for good-faith actors would be useful i think
Yeah, what's on the internet is there forever, and people trying to get rid of it... well, people should know about the Streisand effect by now. I wouldn't mind my data being collected in an ethical manner if I was actually paid a fair price for it though. Not a fan of being a product.
like, my point wasn’t that collecting 1M posts is bad, my point is that if you do a thing that *is* bad and brag about it - you’ll obviously get in trouble
and if people consider what you did a bad thing (even if you don’t consider it to be such yourself) - same story
LLMs are a massive net positive on the world, absolutely no doubt about it and if you argue otherwise you’re being obtuse. *But* there’s no incentive for creators of the content they consume to keep creating. Traffic/revenue keeps going down while the LLMs’ need for more content keeps going up.
All I can hope is the AI bubble bursts (I think we’re getting very close), the good LLMs remain A Thing and the research continues, but we aren’t seeing billions thrown at these companies. So their incentive to grow exponentially falls off a cliff, and Bluesky doesn’t need to worry so much about it.
there will always be new kinds of scraping, it’s inevitable if your content is public
breaking the benefits of a public network for the sake of gating off scrapers is an endless cat and mouse game where the cat will never win. scrapers will always find a way in
bitching about ai suddenly being bad will not get you anywhere. chances are you typed that post with the help of smart suggestions, autocorrect, and smart key area alignment on your phone’s virtual keyboard. wanna guess how they all work?
honestly, being only somewhat defensive of the mindset, I'm pretty sure there are people using the internet nowadays who were born and grew up after everything had closed down, and who don't know what it used to be like
Comments
no one cried when twitter was just as open, though?
1. scraping was usually for spam (a fact of life), or for search engines that reward good content with traffic. LLMs don’t give traffic
2. we realistically were incredibly naive back then, and didn’t understand how it could go badly
It's that something being public does not mean it's unlicensed, or free for the taking.
OSS is public, but you can't do whatever you want with OSS code, unless the licence explicitly allows it.
Shops are public, but you can't walk into one and steal from it.