manueltonneau.bsky.social
PhD candidate @oiioxford.bsky.social · NLP, Computational Social Science · @WorldBank
manueltonneau.com
35 posts
672 followers
537 following
comment in response to
post
I was curious how that compared to the ARR round for ACL this year (Feb 2025): the number of submissions was 8.3K, an all-time high. I wonder what drives this: increased interest in the field, AI-generated papers, ... What do you think?
comment in response to
post
I use Citymapper and also the DB app for regio/S-Bahn. But also use GMaps often
comment in response to
post
It says on OpenReview that the deadline is Feb 24 2025 12:00AM UTC-0, is that normal? Thanks a lot for organizing!
comment in response to
post
One limitation in any case, which may explain differences between our results, is that Perspective is a moving target: the model changes over time with little transparency on when it does. There's a cool paper on this: aclanthology.org/2023.emnlp-m...
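One cheap way to catch this in practice (a minimal sketch: the probe sentences and key handling are placeholders, and the client call follows the public Perspective quickstart) is to re-score a fixed probe set on a schedule and watch for silent shifts in the score distribution:

```python
import os
from googleapiclient import discovery  # pip install google-api-python-client

# Build a Perspective API client (key assumed in the PERSPECTIVE_API_KEY env var).
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=os.environ["PERSPECTIVE_API_KEY"],
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

PROBES = ["Beispielsatz 1", "Beispielsatz 2"]  # fixed, version-controlled probe set

def toxicity(text: str, lang: str = "de") -> float:
    """Return Perspective's TOXICITY score for one text."""
    body = {
        "comment": {"text": text},
        "languages": [lang],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=body).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Log these with a timestamp; a sudden shift across probes suggests a model update.
for probe in PROBES:
    print(probe, toxicity(probe))
```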
comment in response to
post
I guess that while Perspective's German scores are biased upwards (which is problematic, as you rightly point out), the tool may still work for German as long as you adapt the threshold? We use threshold-agnostic metrics in our eval.
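To illustrate why that matters (a toy simulation with made-up scores, not our data): an upward score bias leaves ranking-based metrics like ROC AUC untouched, while F1 at a default cutoff suffers until you re-threshold.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=5000)                      # 10% hateful
# Scores shifted upwards for both classes, as if the model were biased.
scores = np.clip(rng.normal(0.45 + 0.20 * y, 0.10), 0, 1)

print("ROC AUC:", roc_auc_score(y, scores))    # high: the ranking is intact
print("F1 @ 0.5:", f1_score(y, scores >= 0.5)) # low: default threshold mis-set
print("F1 @ 0.6:", f1_score(y, scores >= 0.6)) # better: threshold adapted
```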
comment in response to
post
This is cool, thanks for sharing! On Perspective, we have recent work evaluating hate speech models on representative Twitter data: while performance is generally low, Perspective does almost as well as the best German open-source model, and its German performance exceeds its English performance on the day of study.
comment in response to
post
thanks for your kind words, and thanks a ton for making our project possible!
comment in response to
post
Your feedback is much appreciated as we prepare the final version of the paper. We would like to thank @jurgenpfeffer.bsky.social and team, who collected the TwitterDay dataset from which HateDay is sampled and without whom this work would not have been possible!
comment in response to
post
What about moderation? Given low performance, automatic moderation is not desirable. We investigate the feasibility of human-in-the-loop moderation, where models flag and humans verify. Moderating >80% of all hate would require humans to review >10% of all daily tweets, which can get expensive for large communities.
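The arithmetic behind that kind of estimate is simple (a back-of-envelope sketch with hypothetical numbers, not the paper's figures): humans review everything the model flags, so the review burden is the flagged share of the stream.

```python
def review_share(prevalence: float, recall: float, precision: float) -> float:
    """Share of all daily tweets humans must review to catch `recall` of hate.

    Humans verify every flagged tweet; flagged volume = true positives / precision.
    """
    true_positives = recall * prevalence   # as a share of all tweets
    return true_positives / precision

# Hypothetical operating point: 0.3% hate prevalence, 80% of hate caught,
# 2% precision at that recall -> 12% of the daily stream to review.
print(f"{review_share(prevalence=0.003, recall=0.80, precision=0.02):.0%}")
```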
comment in response to
post
We also find other reasons for low performance, such as the misalignment between target focus in academic work and target prevalence in the wild, as well as the difficulty of distinguishing use and mention of hate, shown in past work by @gligoric.bsky.social
comment in response to
post
Why is performance so low? An important reason is that it is hard to distinguish offensive from hateful content (as shown by @thomasdavidson.bsky.social in seminal work), and offensive content is much more prevalent than hate in the wild, crowding out hate in the predicted positives.
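A worked example of the crowding-out effect (illustrative numbers only): even with decent per-class rates, precision collapses once offensive content vastly outnumbers hate.

```python
n_tweets  = 100_000
hate      = 0.003 * n_tweets   # 0.3% hateful (illustrative)
offensive = 0.05  * n_tweets   # 5% offensive but not hateful (illustrative)

recall_on_hate   = 0.80        # hateful tweets correctly flagged
fpr_on_offensive = 0.20        # offensive tweets wrongly flagged as hate

tp = recall_on_hate * hate
fp = fpr_on_offensive * offensive     # ignoring false positives on benign tweets
precision = tp / (tp + fp)
print(f"precision = {precision:.2f}") # ~0.19: most flags are offensive, not hateful
```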
comment in response to
post
We then evaluate popular hate speech detection LLMs on HateDay and compare with their performance on academic hate speech datasets and functional tests (HateCheck). We find that traditional evaluation methods systematically overestimate performance, which is low on representative data.
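For reference, the basic evaluation loop looks like this (a minimal sketch: the model name is just one public example from the Hugging Face Hub, the texts and labels are placeholders, and the label mapping is model-specific):

```python
from transformers import pipeline
from sklearn.metrics import f1_score

# Example public detector; swap in whichever model you want to evaluate.
clf = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-hate")

texts  = ["example tweet 1", "example tweet 2"]  # placeholder data
labels = [0, 1]                                  # 1 = hateful (gold annotations)

# Map the model's string labels to 0/1; exact label names vary by model.
preds = [int(out["label"].endswith("1")) for out in clf(texts)]

# Run the same loop on an academic test set and on a representative sample:
# the gap between the two F1 scores is what standard evaluations miss.
print("F1:", f1_score(labels, preds))
```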
comment in response to
post
We first look at the prevalence and composition of hate in HateDay and find that most types of hate are represented across contexts, though the relative importance of each hate type varies locally (e.g., green-bashing in German tweets, islamophobia in India).