cschroeder.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

comment in response to post

@ai2.bsky.social Any plans for plagiarism detection in Semantic Scholar? This would be incredibly useful, especially with the growing influx of (semi-)automatically generated papers.

submitted 34 days ago

comment in response to post

Fixed: We need your support *for a* web survey. Sorry, it seems bluesky has no edit feature yet.

submitted 168 days ago

comment in response to post

I have the feeling I have not reached the NLP crowd on bluesky yet. Where are the large groups here? Who do I have to ping❓

submitted 168 days ago

comment in response to post

Please consider participating or sharing our survey! (If you have any experience with supervised learning in natural language processing, you are eligible to participate in our survey.)

submitted 168 days ago

comment in response to post

The survey has a partial focus on, but is not limited to, active learning. See the original post for details. ➡️ Extended Deadline: January 26th, 2025.

submitted 168 days ago

comment in response to post

❤️ We’re seeking responses from across the globe! If you know 1–3 people who might qualify for this survey—particularly those in different regions—please share it with them. We’d really appreciate it! #NLP #NLProc #Annotation

submitted 184 days ago

comment in response to post

Survey: bildungsportal.sachsen.de/umfragen/lim... Estimated time required: 5–15 minutes Deadline for participation: January 12, 2025

submitted 184 days ago

comment in response to post

The survey has had a reasonable start; however, it still lacks sufficient visibility. To obtain a representative sample, we need responses from a diverse, global group of participants. Could you perhaps support us through your groups or networks? That would be a great help! 😊

submitted 193 days ago

comment in response to post

Survey: bildungsportal.sachsen.de/umfragen/lim... Deadline for participation: January 12, 2025 💙 Thank you for your support, and have a wonderful Sunday!

submitted 196 days ago

comment in response to post

The survey is non-commercial and conducted solely for academic research purposes. The results will contribute to an open-access publication that also benefits the community. 💡Support us: If you know others working on supervised learning and NLP, please share this survey—we’d really appreciate it!

submitted 196 days ago

comment in response to post

This is investigated particularly in the context of recent advancements, including but not limited to large language models. 👉 With only 5–15 minutes of your time, you would greatly help to investigate which strategies are used by the #NLP community to overcome a lack of labeled data.

submitted 196 days ago

comment in response to post

💙 Thank you for your support, and have a wonderful Sunday! Survey: bildungsportal.sachsen.de/umfragen/lim... Deadline for participation: January 12, 2025

submitted 196 days ago

comment in response to post

This is investigated particularly in the context of recent advancements, including but not limited to large language models. 👉 With only 5–15 minutes of your time, you would greatly help to investigate which strategies are used by the #NLP-community to overcome a lack of labeled data.

submitted 196 days ago

comment in response to post

💙 Thank you for your support, and have a wonderful Sunday!

submitted 196 days ago

comment in response to post

That's another good point: Many aspects of this discussion depend on the jurisdiction. Many people who were outraged when learning about the bluesky API don't seem to realize this. Again, this only affects good actors, and we should rather be afraid of the bad ones.

submitted 208 days ago

comment in response to post

You aren't, but I have read a lot of hate towards the API and even bluesky in the recent discussions around the bluesky datasets. Your points are all valid, but I don't see how this is easily doable for a decentralized open network. Personally, I avoid sharing that kind of data online.

submitted 208 days ago

comment in response to post

Informing the public is exactly what we need here. It sounded a little like it would add to the (wrong) sensationalism, but then I am glad I misinterpreted that. You are right that AI can be trained on that data, but this does not mean a resulting model can be used in a legal way.

submitted 208 days ago

comment in response to post

Don't get me wrong: This does not mean that someone else has the right to train commercial models or anything. The bad players, however, will do that with or without the API. The API, however, has many good uses for science and tech.

submitted 208 days ago

comment in response to post

It is strange that people seem to accept closed-off communities such as Twitter or Facebook as the new normal. What you post on the internet might get seen and saved by someone. You posted it in a public space.

submitted 208 days ago

comment in response to post

Bluesky has to communicate this better though. Also, people need to re-learn why the internet exists. It could only evolve through its public nature. If you held search engines to the same standards, the internet would never have developed.

submitted 208 days ago

comment in response to post

Factcheck: This has been possible at Twitter via the Twitter Firehose for many years, which even made money from sharing your data. Moreover, data collection would still be feasible without an API by using traditional web scraping techniques. Overall, the platform's openness is a positive aspect.

submitted 208 days ago

comment in response to post

I rely on it extensively as well. Thank you for creating and maintaining this fantastic tool :)!

submitted 209 days ago

comment in response to post

All data on Twitter is accessible as well. It was even accessible through a similar Firehose (not sure if this still exists tbh), but for this you had to pay. Do you really prefer this option?

submitted 214 days ago

comment in response to post

What is the point of this comment? And what would you suggest instead? The way the internet works, I don't see a solution to this problem. The robots.txt is a good start. There are still many good players such as the Common Crawl, and the companies will at least partly comply so as not to lose reputation.

submitted 214 days ago

comment in response to post

Completely agree. This is no way to treat other human beings. For Bluesky, however, the toxicity and bot detection is possibly (partly) in the hands of the community. Maybe something for the Hugging Face intern? See my post here bsky.app/profile/csch...

submitted 214 days ago

comment in response to post

Same. Where can I get more blocklists before my feed looks like Twitter? Can we use the open API to detect toxic users? A record linking a post id to a toxicity score should likely be possible to share, right?

submitted 214 days ago

comment in response to post

Agree, but both of your opinions are well-informed, which you rarely find among the angry mob. What I find most disturbing, besides the toxicity, is their outrage paired with a complete lack of understanding. (Guess what, Twitter also had a Firehose, but in most cases you even had to pay for it.)

submitted 214 days ago

comment in response to post

If I walk along the street and look at your art, my brain also memorizes parts. This does not mean I am allowed to copy your art. In any case, what the people there don't understand: neither the models nor the data are the problem. They can be used for many perfectly fine use cases as well.

submitted 214 days ago

comment in response to post

Yes, and on top of that, search engines perform the same actions those people are trying to condemn here. Copying their art style completely is a problem, yes. But at the point where it deviates from the style, imho it is still a new work (inspired by the original work).

submitted 214 days ago

comment in response to post

And yes, the hype sucks. Is it really affecting artists that much already? Seriously curious, I don't have many connections there. My impression so far was that while these tools are incredibly good, they are nowhere near the level of a professional.

submitted 214 days ago

comment in response to post

The solution, however, is not to attack people, technologies, or research. We need machine-readable standards here, so that at least good actors and the data platforms have the mechanisms to decide (not) to work together. Bad actors will always be able to get your data through unintended ways.

submitted 214 days ago