Quick summary: an HF dataset of 1 million posts pulled from Bluesky's public API by a Hugging Face employee; backlash from folks on Bluesky about consent and having their posts used to train 'AI'; the dataset was voluntarily taken down
Yeah, a toxic response is never the right reaction. But I think I can understand where some people are coming from. I don't necessarily agree it's all valid, but I do think there's a chance for actual discussion and understanding of what's going on. (From what I saw, there's a not-insignificant amount of misunderstanding)
Yeah, clearly there's a big gap in understanding. I don't think Bluesky's response was very useful or educational. I do think there could be room in this area for something like an LGPL for datasets -- requiring downstream models to also be open, for people upset at perceived profit motivations
I'm not just focusing on the guidelines. I'm talking more about the broader concern of having data associated with a person available for different uses, which includes training 'AI'
I'm talking more about a discussion of the hypothetical where this line didn't exist, vs. this specific case
Comments
https://bsky.app/profile/bsky.app/post/3lbvgvbvcf22c
https://bsky.app/profile/danielvanstrien.bsky.social/post/3lbvih4luvk23
Also violated Bluesky's developer guidelines.
Also possible GDPR violations.
But sure, they were *merely* using the public API.
https://bsky.app/profile/questauthority.bsky.social/post/3lbxflhv4i223
I actually find this thread from @cfiesler.bsky.social quite good at describing the big concerns folks have / should have about datasets related to them
https://bsky.app/profile/cfiesler.bsky.social/post/3lbwurkbfcs2w
Again, a real conversation