According to the comments on the post, this went well. I'd like to offer a different perspective on this "Don't use my data for AI slop! I don't consent!" topic.
What happened? HuggingFace, a company in the AI sector, scraped 1 million Bluesky posts and put them up on their site 🧵
Reposted from Daniel van Strien:
First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋
1M public posts from Bluesky's firehose API
Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗
huggingface.co/datasets/blu...
This led to understandable outrage, as many people are now sensitive to their public internet data being scraped and used to train machine learning (ML) models like GPT, Claude, Mistral, Stable Diffusion, you name it.
I get it. I'm not a fan either.
It has become part of the datasets used to train ML models like GitHub Copilot, which spit out code on request. These models are wrapped up in services that are being sold to coders.
An LLM can write the code, but it will hardly have the same debugging skills, and the places where I'd want to work should value those skills a lot more.
I have many friends in the (game) art community who have had a similar experience with the art they published on the internet. It, too, is being used to train models that then basically recombine and imitate the data they saw.
But the bottom line is only one part of this equation. The other is the morality of companies taking "our" public data and building paid-for services on top of it.
With GPT & co. this relationship does not exist.
Having a representative in the EU as a contact person is a very good idea, but it has very little to do with this issue.
https://www.reuters.com/technology/eu-says-bluesky-is-violating-information-disclosure-rules-2024-11-25/