As many of you following main might know, @hf.co released 1M posts scraped from the @bsky.app Firehose API. It was a mess. People got angry, and I thought -- #privacydisaster.
So I decided to write about it. https://insights.priva.cat/p/privacy-disasters-facehuggers-are
Let's unpack this a bit... 👇
Comments
📌
I also included an Article 18 restriction of processing request.
Many people have reported @alpindale.bsky.social and others, and Bsky basically gave them a pass, despite this very clearly violating their policies. I'm mostly salty about that. And technical sloppiness.
Or something closer to what Mastodon does, where users can toggle public sharing.
The issue isn't can, the issue is should, and does that extend to absolutely anything and everything?
Lots of "public" things still have use limitations!
But when public posts are mined for AI datasets without consent, things get murky. Is "public" really fair game for AI training?
Public for communicating on a social network is one thing -- but should that translate to #AI training fodder?
Sharing skeets on Bluesky ≠ consenting to AI scraping. Imagine you're at a party and a game of Truth or Dare leads to you singing (badly). You might consent to amusing the guests, but what if your off-key performance of 'Ken Lee' ends up on YouTube?
It's about context. Nobody expected that Bluesky's firehose data would be weaponized for AI datasets.
In short, they did nothing.
https://huggingface.co/datasets?search=bluesky%20posts
We skeet for conversations, not AI training. Misusing this breaches privacy norms & legal principles.