webis.de
Information is nothing without retrieval. The Webis Group contributes to information retrieval, natural language processing, machine learning, and symbolic AI.
253 posts 603 followers 698 following
comment in response to post
…human texts today, contextualize the findings in terms of our theoretical contribution, and use them to assess the quality and adequacy of existing LLM detection benchmarks, which tend to be constructed with authorship attribution in mind rather than authorship verification. 3/3
comment in response to post
…limits of the field. We argue that as LLMs improve, detection will not necessarily become impossible, but it will be limited by the capabilities and theoretical boundaries of authorship verification. We conduct a series of exploratory analyses to show how LLM texts differ from… 2/3
comment in response to post
🧵 4/4 The shared task continues this research on LLM-based advertising. Participants can submit systems for two sub-tasks: first, generate responses with and without ads; second, classify whether a response contains an ad. Submissions are open until May 10th, and we look forward to your contributions.
comment in response to post
🧵 3/4 In many cases, survey participants did not notice brand or product placements in the responses. As a first step towards ad-blockers for LLMs, we created a dataset of responses with and without ads and trained classifiers to identify the ads. dl.acm.org/doi/10.1145/...
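To give a rough feel for the classification task, here is a minimal sketch of a TF-IDF baseline; it is not the paper's approach, and the file name and the columns "response" and "has_ad" are assumptions.

```python
# Minimal sketch of an ad classifier (not the paper's models): TF-IDF features
# plus logistic regression over a hypothetical CSV of labeled responses.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("responses_with_and_without_ads.csv")  # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    df["response"], df["has_ad"],  # hypothetical column names
    test_size=0.2, random_state=0, stratify=df["has_ad"],
)

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # unigram and bigram features
    LogisticRegression(max_iter=1000),
)
classifier.fit(X_train, y_train)
print(classification_report(y_test, classifier.predict(X_test)))
```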
comment in response to post
🧵 2/4 Given their high operating costs, LLMs require a business model to sustain them, and advertising is a natural candidate. Hence, we analyzed how well LLMs can blend product placements into "organic" responses and whether users are able to identify the ads. dl.acm.org/doi/10.1145/...
comment in response to post
🧵 4/4 Credit and thanks to the author team @lgnp.bsky.social @timhagen.bsky.social @maik-froebe.bsky.social @matthias-hagen.bsky.social @benno-stein.de @martin-potthast.com @hscells.bsky.social – you can also catch some of them at #ECIR2025 currently if you want to chat about RAG!
comment in response to post
🧵 3/4 This fundamentally challenges previous assumptions about RAG evaluation and system design. But we also show how crowdsourcing offers a viable and scalable alternative! Check out the paper for more. 📝 Preprint @ downloads.webis.de/publications... ⚙️ Code/Data @ github.com/webis-de/sig...
comment in response to post
🧵 2/4 Key findings: 1️⃣ Humans write best? No! LLM responses are rated better than human ones. 2️⃣ Essay answers? No! Bullet lists are often preferred. 3️⃣ Evaluate with BLEU? No! Reference-based metrics don't align with human preferences. 4️⃣ LLMs as judges? No! Prompted models produce inconsistent labels.
comment in response to post
Important Dates
----------------------
now: Training data released
May 23, 2025: Software submission
May 30, 2025: Participant paper submission
June 27, 2025: Peer review notification
July 07, 2025: Camera-ready participant paper submission
Sep 09-12, 2025: Conference
comment in response to post
4. Generative Plagiarism Detection. Given a pair of documents, your task is to identify all contiguous maximal-length passages of reused text between them. pan.webis.de/clef25/pan25...
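For readers new to the task, a toy sketch of finding contiguous reused word sequences with Python's standard library; this is not an official baseline, and real submissions additionally need character offsets and robustness to obfuscation. The example documents are made up.

```python
# Toy text-reuse detector (not an official PAN baseline): report contiguous
# matching word sequences between two documents using difflib.
from difflib import SequenceMatcher

def reused_passages(doc_a: str, doc_b: str, min_words: int = 5):
    a, b = doc_a.split(), doc_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            yield " ".join(a[block.a:block.a + block.size])

doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "yesterday the quick brown fox jumps over the lazy dog and then slept"
print(list(reused_passages(doc1, doc2)))
```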
comment in response to post
3. Multi-Author Writing Style Analysis. Given a document, determine at which positions the author changes. pan.webis.de/clef25/pan25...
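To make the task's input and output concrete (this is not an organizer baseline), a sketch that flags a candidate author change wherever consecutive paragraphs have dissimilar character n-gram profiles; the threshold is an arbitrary assumption.

```python
# Hypothetical style-change sketch: compare consecutive paragraphs by
# character n-gram TF-IDF similarity and flag low-similarity boundaries.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def style_change_positions(document: str, threshold: float = 0.3):
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(paragraphs)
    changes = []
    for i in range(len(paragraphs) - 1):
        similarity = cosine_similarity(vectors[i], vectors[i + 1])[0, 0]
        if similarity < threshold:  # dissimilar neighbors -> candidate change
            changes.append(i + 1)   # change assumed before paragraph i + 1
    return changes
```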
comment in response to post
2. Multilingual Text Detoxification. Given a toxic piece of text, rewrite it in a non-toxic way while preserving the main content as much as possible. pan.webis.de/clef25/pan25...
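A crude lexicon-deletion sketch (again, not an official baseline) shows the input and output and why content preservation and fluency are the hard parts; the word list is a placeholder.

```python
# Crude detoxification sketch (not an official baseline): delete words found
# in a placeholder toxic-word lexicon. Real systems must rewrite for fluency
# and preserve the original meaning, which deletion alone does not.
TOXIC_LEXICON = {"stupid", "idiot", "dumb"}  # placeholder list, assumption

def delete_toxic_words(text: str) -> str:
    kept = [w for w in text.split() if w.lower().strip(".,!?") not in TOXIC_LEXICON]
    return " ".join(kept)

print(delete_toxic_words("That is a stupid idea, you idiot!"))
# -> "That is a idea, you" (ungrammatical: deletion alone hurts fluency)
```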
comment in response to post
1. Voight-Kampff Generative AI Detection. Subtask 1: Given a (potentially obfuscated) text, decide whether it was written by a human or an AI. Subtask 2: Given a document collaboratively authored by a human and an AI, classify the extent to which the model assisted. pan.webis.de/clef25/pan25...
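One common heuristic for subtask 1 (not the shared task's baseline) is that machine-generated text tends to have lower perplexity under a language model than human text; here is a sketch with GPT-2, where the decision threshold is an arbitrary assumption.

```python
# Heuristic sketch for human-vs-AI detection (not the shared task's baseline):
# score a text's perplexity under GPT-2 and flag suspiciously "average" text.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    encoding = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**encoding, labels=encoding["input_ids"]).loss
    return float(torch.exp(loss))

def looks_machine_written(text: str, threshold: float = 40.0) -> bool:
    return perplexity(text) < threshold  # arbitrary threshold, assumption
```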
comment in response to post
Check out the paper: downloads.webis.de/publications...
comment in response to post
We find that simply scaling up the transformer architecture still leads to significant effectiveness drops in the face of typos, keywords, ordering, and paraphrasing. We further highlight the need for more elaborate query variation datasets, which should retain the queries' semantics.
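To make "query variations" concrete, a small sketch that generates the simpler kinds mentioned above (typos, reordering, keyword dropping); paraphrasing would additionally require a paraphrase model, and the example query is made up, not from the paper's data.

```python
# Illustrative query-variation generators (not the datasets used in the paper):
# typos, word reordering, and keyword dropping; paraphrasing is omitted.
import random

def with_typo(query: str, rng: random.Random) -> str:
    words = query.split()
    i = rng.randrange(len(words))
    word = words[i]
    if len(word) > 3:  # swap two adjacent characters inside one word
        j = rng.randrange(len(word) - 1)
        word = word[:j] + word[j + 1] + word[j] + word[j + 2:]
    words[i] = word
    return " ".join(words)

def reordered(query: str, rng: random.Random) -> str:
    words = query.split()
    rng.shuffle(words)
    return " ".join(words)

def keyword_dropped(query: str, rng: random.Random) -> str:
    words = query.split()
    if len(words) > 1:
        del words[rng.randrange(len(words))]
    return " ".join(words)

rng = random.Random(0)
query = "effects of caffeine on marathon running performance"  # made-up query
for variation in (with_typo, reordered, keyword_dropped):
    print(variation(query, rng))
```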
comment in response to post
Apologies, this was imported from Twitter/X; unfortunately the links do not work. But here's the correct link to the paper: webis.de/publications...
comment in response to post
Please put us on the list. 🙂
comment in response to post
Please add our group to the NLP starter pack. 🙂