cschroeder.bsky.social
PhD Candidate @ Leipzig University. Active Learning, Text Classification and LLMs. Check out my active learning library: small-text. #NLP #NLProc #ActiveLearning #LLM #ML #AI
54 posts
929 followers
2,670 following
Regular Contributor
Active Commenter
comment in response to
post
@ai2.bsky.social Any plans for plagiarism detection in Semantic Scholar? This would be incredibly useful, especially with the growing influx of (semi-)automatically generated papers.
comment in response to
post
Fixed: We need your support *for a* web survey.
Sorry, it seems bluesky has no edit feature yet.
comment in response to
post
I have the feeling I have not reached the NLP crowd on bluesky yet. Where are the large groups here? Who do I have to ping❓
comment in response to
post
Please consider participating or sharing our survey! (If you have any experience with supervised learning in natural language processing, you are eligible to participate in our survey.)
comment in response to
post
The survey has a partial focus on, but is not limited to, active learning. See the original post for details.
➡️ Extended Deadline: January 26th, 2025.
comment in response to
post
❤️ We’re seeking responses from across the globe! If you know 1–3 people who might qualify for this survey—particularly those in different regions—please share it with them. We’d really appreciate it!
#NLP #NLProc #Annotation
comment in response to
post
Survey: bildungsportal.sachsen.de/umfragen/lim...
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025
comment in response to
post
The survey has had a reasonable start; however, it still lacks sufficient visibility. To obtain a representative sample, we need responses from a diverse, global group of participants.
Could you perhaps support us through your groups or networks? That would be a great help! 😊
comment in response to
post
Survey: bildungsportal.sachsen.de/umfragen/lim...
Deadline for participation: January 12, 2025
💙 Thank you for your support, and have a wonderful Sunday!
comment in response to
post
The survey is non-commercial and conducted solely for academic research purposes. The results will contribute to an open-access publication that also benefits the community.
💡Support us: If you know others working on supervised learning and NLP, please share this survey—we’d really appreciate it!
comment in response to
post
This is investigated particularly in the context of recent advancements, including but not limited to large language models.
👉 With only 5–15 minutes of your time, you would greatly help to investigate which strategies are used by the #NLP community to overcome a lack of labeled data.
comment in response to
post
That's another good point: Many aspects in this discussion depend on the jurisdiction. Many people who have been outraged when learning about the bluesky API don't seem to realize this. Again, this only affects good actors, and we should rather be afraid of the bad ones.
comment in response to
post
You aren't, but I have read a lot of hate towards the API and even bluesky in the recent discussions around the bluesky datasets. Your points are all valid, but I don't see how this is easily doable for a decentralized open network. Personally, I avoid sharing that kind of data online.
comment in response to
post
Informing the public is exactly what we need here. It sounded a little like it would add to the (wrong) sensationalism, but then I am glad I misinterpreted that. You are right that AI can be trained on that data, but this does not mean a resulting model can be used in a legal way.
comment in response to
post
Don't get me wrong: This does not mean that someone else has the right to train commercial models or anything. The bad players, however, will do that with or without the API. The API, however, has many good uses for science and tech.
comment in response to
post
It is strange that people seem to accept closed-off communities such as Twitter or Facebook as the new normal. What you post on the internet might get seen and saved by someone. You posted it in a public space.
comment in response to
post
Bluesky has to communicate this better though. Also, people need to re-learn why the internet exists. It could only evolve through its public nature. If you held search engines to the same standards, the internet would never have developed.
comment in response to
post
Fact check: This has been possible on Twitter for many years via the Twitter Firehose, and Twitter even made money from sharing your data. Moreover, data collection would still be feasible without an API by using traditional web scraping techniques. Overall, the platform's openness is a positive aspect.
comment in response to
post
I rely on it extensively as well. Thank you for creating and maintaining this fantastic tool :)!
comment in response to
post
All data on Twitter is accessible as well. It was even accessible through a similar Firehose (not sure if this still exists tbh), but for this you had to pay. Do you really prefer this option?
comment in response to
post
What is the point of this comment? And what would you suggest instead? The way the internet works, I don't see a solution to this problem.
The robots.txt is a good start. There are still many good players such as the Common Crawl, and the companies will at least partly obey it so as not to lose reputation.
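To make the robots.txt point concrete, here is a minimal sketch of how a well-behaved crawler could check the rules before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the bot names (`ExampleAIBot`, `OtherBot`) and the URLs are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one specific crawler is blocked everywhere,
# all other crawlers are only blocked from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.org/posts/1", "https://example.org/private/x"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that this only helps with crawlers that choose to run such a check. Nothing in the file itself enforces it, which is exactly the good actor versus bad actor distinction.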
comment in response to
post
Completely agree. This is no way to treat other human beings.
For Bluesky, however, toxicity and bot detection are possibly (partly) in the hands of the community. Maybe something for the Hugging Face intern? See my post here bsky.app/profile/csch...
comment in response to
post
Same. Where can I get more blocklists before my feed looks like Twitter? Can we use the open API to detect toxic users? A record between a post id and a toxicity score should likely be possible to share, right?
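A rough sketch of what such a shareable record could look like, assuming a toxicity score has already been computed somewhere else. Everything here is hypothetical: the `ToxicityRecord` name, the field names, the JSON Lines format and the 0.8 threshold are illustrative choices, not an existing Bluesky format. The record stores only a post identifier and a score, not the post text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToxicityRecord:
    post_id: str  # identifier of the post only; no post content is stored
    score: float  # hypothetical classifier score in [0, 1]


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text):
    """Parse JSON Lines back into records, skipping empty lines."""
    return [ToxicityRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


def flagged_ids(records, threshold=0.8):
    """Return the ids of all posts whose score reaches the threshold."""
    return {r.post_id for r in records if r.score >= threshold}


if __name__ == "__main__":
    records = [ToxicityRecord("post-1", 0.05), ToxicityRecord("post-2", 0.93)]
    shared = to_jsonl(records)
    print(shared)
    print(flagged_ids(from_jsonl(shared)))
```

Someone receiving such a file could build their own blocklist from it by choosing a threshold that fits their tolerance.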
comment in response to
post
Agree, but both of your opinions are well-informed, which you rarely find among the angry mob. What I find most disturbing, besides the toxicity, is their outrage paired with a complete lack of understanding. (Guess what, Twitter also had a Firehose, but in most cases you even had to pay for it.)
comment in response to
post
If I walk along the street and look at your art, my brain also memorizes parts. This does not mean I am allowed to copy your art.
In any case, what the people there don't understand: neither the models nor the data are the problem. They can be used for many perfectly fine use cases as well.
comment in response to
post
Yes, and on top of that, search engines perform the same actions that those people are trying to condemn here.
Copying their art style completely is a problem, yes. But at the points where it deviates from the style, imho it is still a new work (inspired by the original work).
comment in response to
post
And yes, the hype sucks.
Is it really affecting artists that much already? Seriously curious, I don't have many connections there.
My impression so far was that while these tools are incredibly good, they are nowhere near the level of a professional.
comment in response to
post
The solution, however, is not to attack people, technologies, or research.
We need machine-readable standards here, so that at least good actors and the data platforms have the mechanisms to decide (not) to work together.
Bad actors will always be able to get your data in unintended ways.