A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models.
The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.
Reposted from
Jeremy Howard
Did you know that 99% of email today is spam? Your inbox isn’t 99% spam because AI is used to filter it.
The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.
Comments
Publishers don't want them to stop, they just want to get paid for it. They are actively working to make that happen.
Quit and leave
This is part of the design of Bluesky. If data privacy is your priority, this is the worst social media site to be on.
A large portion of people joined this site to avoid the data scraping on X
Which artists are you prompting for with your slop, and did they give permission to the company you're using?
Bluesky has a firehose like Twitter used to. You don't need to scrape, you just need to ask it to send you all skeets.
This industry depends on not asking permission for or compensating people for something that has value to it.
The other ones that still contain PII are just "for lolz", not for social science research. Take it from the guy who created the sets: he did it to piss people off. Yet you are defending this?
Is this surprising to you?
EU law has a broad exemption for text and data mining activities. Why do you think building a dataset isn't covered by the TDM exception?
I don't engage with a fundamentalist by telling them they are wrong. It's not useful.
https://arxiv.org/abs/2407.14933
I think that many people on this website need to be told the info in the OP.