1. Can you stop companies from training generative AI using your data? No, not currently.
2. Is this dataset meant for training generative AI? 🤷‍♀️ It's more likely intended for research and statistical analysis.
3. Is it OK to duplicate and distribute people’s data without giving them a way to opt out? I’d argue no.
Reposted from Daniel van Strien
First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋

📊 1M public posts from Bluesky's firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗

huggingface.co/datasets/blu...
