manueltonneau.bsky.social
PhD candidate @oiioxford.bsky.social · NLP, Computational Social Science · @WorldBank
manueltonneau.com
35 posts
672 followers
537 following
comment in response to
post
I was curious how that compared to the ARR round for ACL this year (Feb 2025): the number of submissions was 8.3K, an all-time high. I wonder what drives this: increased interest in the field, AI-generated papers, ... What do you think?
comment in response to
post
I use Citymapper and also the DB app for regio/S-Bahn. But also use GMaps often
comment in response to
post
It says on OpenReview that the deadline is Feb 24 2025 12:00AM UTC-0, is that normal? Thanks a lot for organizing!
comment in response to
post
One limitation in any case, which may explain differences between our results, is that Perspective is a moving target: the model changes over time with little transparency on when it does. There's a cool paper on this: aclanthology.org/2023.emnlp-m...
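One cheap way to catch this in practice (a minimal sketch: the probe sentences and key handling are placeholders, and the client call follows the public Perspective quickstart) is to re-score a fixed probe set on a schedule and watch for silent shifts in the score distribution:

```python
import os
from googleapiclient import discovery  # pip install google-api-python-client

# Build a Perspective API client (key assumed in the PERSPECTIVE_API_KEY env var).
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=os.environ["PERSPECTIVE_API_KEY"],
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

PROBES = ["Beispielsatz 1", "Beispielsatz 2"]  # fixed, version-controlled probe set

def toxicity(text: str, lang: str = "de") -> float:
    """Return Perspective's TOXICITY score for one text."""
    body = {
        "comment": {"text": text},
        "languages": [lang],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=body).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Log these with a timestamp; a sudden shift across probes suggests a model update.
for probe in PROBES:
    print(probe, toxicity(probe))
```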
comment in response to
post
I guess that while Perspective's German scores are biased upwards (which is problematic, as you rightly point out), the tool may still work for German as long as you adapt the threshold? We use threshold-agnostic metrics in our eval.
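To illustrate why that matters (a toy simulation with made-up scores, not our data): an upward score bias leaves ranking-based metrics like ROC AUC untouched, while F1 at a default cutoff suffers until you re-threshold.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=5000)                      # 10% hateful
# Scores shifted upwards for both classes, as if the model were biased.
scores = np.clip(rng.normal(0.45 + 0.20 * y, 0.10), 0, 1)

print("ROC AUC:", roc_auc_score(y, scores))    # high: the ranking is intact
print("F1 @ 0.5:", f1_score(y, scores >= 0.5)) # low: default threshold mis-set
print("F1 @ 0.6:", f1_score(y, scores >= 0.6)) # better: threshold adapted
```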
comment in response to
post
This is cool, thanks for sharing! On Perspective, we have recent work evaluating hate speech models on representative Twitter data: while performance is generally low, Perspective does almost as well as the best German open-source model, and its German performance exceeds its English performance on the day of study.
comment in response to
post
thanks for your kind words, and thanks a ton for making our project possible!
comment in response to
post
Your feedback is much appreciated as we prepare the final version of the paper. We would like to thank @jurgenpfeffer.bsky.social and team, who collected the TwitterDay dataset from which HateDay is sampled and without whom this work would not have been possible!
comment in response to
post
What about moderation? Given low performance, automatic moderation is not desirable. We investigate the feasibility of human-in-the-loop moderation, where models flag and humans verify. Moderating >80% of all hate would require humans to review >10% of all daily tweets, which can get expensive for large communities.
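The arithmetic behind that kind of estimate is simple (a back-of-envelope sketch with hypothetical numbers, not the paper's figures): humans review everything the model flags, so the review burden is the flagged share of the stream.

```python
def review_share(prevalence: float, recall: float, precision: float) -> float:
    """Share of all daily tweets humans must review to catch `recall` of hate.

    Humans verify every flagged tweet; flagged volume = true positives / precision.
    """
    true_positives = recall * prevalence   # as a share of all tweets
    return true_positives / precision

# Hypothetical operating point: 0.3% hate prevalence, 80% of hate caught,
# 2% precision at that recall -> 12% of the daily stream to review.
print(f"{review_share(prevalence=0.003, recall=0.80, precision=0.02):.0%}")
```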
comment in response to
post
We also find other reasons for low performance, such as the misalignment between target focus in academic work and target prevalence in the wild, as well as the difficulty of distinguishing use and mention of hate, shown in past work by @gligoric.bsky.social
comment in response to
post
Why is performance so low? An important reason is that it is hard to distinguish offensive from hateful content (as shown by @thomasdavidson.bsky.social in seminal work), and offensive content is much more prevalent than hate in the wild, crowding out hate in the predicted positives.
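A worked example of the crowding-out effect (illustrative numbers only): even with decent per-class rates, precision collapses once offensive content vastly outnumbers hate.

```python
n_tweets  = 100_000
hate      = 0.003 * n_tweets   # 0.3% hateful (illustrative)
offensive = 0.05  * n_tweets   # 5% offensive but not hateful (illustrative)

recall_on_hate   = 0.80        # hateful tweets correctly flagged
fpr_on_offensive = 0.20        # offensive tweets wrongly flagged as hate

tp = recall_on_hate * hate
fp = fpr_on_offensive * offensive     # ignoring false positives on benign tweets
precision = tp / (tp + fp)
print(f"precision = {precision:.2f}") # ~0.19: most flags are offensive, not hateful
```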
comment in response to
post
We then evaluate popular hate speech detection LLMs on HateDay and compare with their performance on academic hate speech datasets and functional tests (HateCheck). We find that traditional evaluation methods systematically overestimate performance, which is low on representative data.
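For reference, the basic evaluation loop looks like this (a minimal sketch: the model name is just one public example from the Hugging Face Hub, the texts and labels are placeholders, and the label mapping is model-specific):

```python
from transformers import pipeline
from sklearn.metrics import f1_score

# Example public detector; swap in whichever model you want to evaluate.
clf = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-hate")

texts  = ["example tweet 1", "example tweet 2"]  # placeholder data
labels = [0, 1]                                  # 1 = hateful (gold annotations)

# Map the model's string labels to 0/1; exact label names vary by model.
preds = [int(out["label"].endswith("1")) for out in clf(texts)]

# Run the same loop on an academic test set and on a representative sample:
# the gap between the two F1 scores is what standard evaluations miss.
print("F1:", f1_score(labels, preds))
```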
comment in response to
post
We first look at the prevalence and composition of hate in HateDay and find that most types of hate are represented across contexts, though the relative importance of each hate type varies locally (e.g., green-bashing in German tweets, islamophobia in India).