webis.de
Information is nothing without retrieval
The Webis Group contributes to information retrieval, natural language processing, machine learning, and symbolic AI.
253 posts
603 followers
698 following
comment in response to post
…human texts today, contextualize the findings in terms of our theoretical contribution, and use them to assess the quality and adequacy of existing LLM detection benchmarks, which tend to be constructed with authorship attribution in mind rather than authorship verification. 3/3
comment in response to post
…limits of the field. We argue that as LLMs improve, detection will not necessarily become impossible, but it will be limited by the capabilities and theoretical boundaries of the field of authorship verification.
We conduct a series of exploratory analyses to show how LLM texts differ from… 2/3
comment in response to post
🧵 4/4 The shared task continues the research on LLM-based advertising. Participants can submit systems for two sub-tasks: first, generating responses with and without ads; second, classifying whether a response contains an ad.
Submissions are open until May 10th, and we look forward to your contributions.
comment in response to post
🧵 3/4 In many cases, survey participants did not notice brand or product placements in the responses. As a first step towards ad-blockers for LLMs, we created a dataset of responses with and without ads and trained classifiers on the task of identifying the ads.
dl.acm.org/doi/10.1145/...
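The ad-identification task can be sketched with a minimal keyword baseline (purely illustrative; the cue lexicon and threshold are assumptions, not the trained classifiers from the paper):

```python
# Naive baseline for spotting product placements in LLM responses.
# The cue lexicon and the threshold are illustrative assumptions only.
AD_CUES = {"buy", "discount", "brand", "exclusive", "order now", "visit"}

def contains_ad(response: str, threshold: int = 1) -> bool:
    """Flag a response as ad-bearing if enough cue phrases appear."""
    text = response.lower()
    hits = sum(1 for cue in AD_CUES if cue in text)
    return hits >= threshold

print(contains_ad("Try the exclusive discount at our brand store!"))
print(contains_ad("Photosynthesis converts light into chemical energy."))
```

A real system would learn such cues from the labeled with/without-ads dataset rather than hard-code them.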
comment in response to post
🧵 2/4 Given their high operating costs, LLMs require a business model to sustain them, and advertising is a natural candidate.
Hence, we analyzed how well LLMs can blend product placements with "organic" responses and whether users are able to identify the ads.
dl.acm.org/doi/10.1145/...
comment in response to post
🧵 4/4 Credit and thanks to the author team @lgnp.bsky.social @timhagen.bsky.social @maik-froebe.bsky.social @matthias-hagen.bsky.social @benno-stein.de @martin-potthast.com @hscells.bsky.social – you can also catch some of them at #ECIR2025 currently if you want to chat about RAG!
comment in response to post
🧵 3/4 This fundamentally challenges previous assumptions about RAG evaluation and system design. But we also show how crowdsourcing offers a viable and scalable alternative! Check out the paper for more.
📝 Preprint @ downloads.webis.de/publications...
⚙️ Code/Data @ github.com/webis-de/sig...
comment in response to post
🧵 2/4 Key findings:
1️⃣ Humans write best? No! LLM responses are rated better than human ones.
2️⃣ Essay answers? No! Bullet lists are often preferred.
3️⃣ Evaluate with BLEU? No! Reference-based metrics don't align with human preferences.
4️⃣ LLMs as judges? No! Prompted models produce inconsistent labels.
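Finding 3️⃣ can be illustrated with the clipped unigram precision at the core of BLEU (a simplified sketch without brevity penalty or higher-order n-grams): a near-verbatim answer scores far higher than a good paraphrase, regardless of which one humans prefer.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the BLEU-1 building block
    (simplified: single reference, no brevity penalty)."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    clipped = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return clipped / len(cand)

ref = "use bullet lists for clear answers"
paraphrase = "bulleted lists make answers clearer"        # good meaning
verbatim = "use bullet lists for clear answers please"    # word overlap
print(unigram_precision(paraphrase, ref))  # low despite good meaning
print(unigram_precision(verbatim, ref))    # high due to surface overlap
```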
comment in response to post
Important Dates
----------------------
now              Training data released
May 23, 2025     Software submission
May 30, 2025     Participant paper submission
June 27, 2025    Peer review notification
July 07, 2025    Camera-ready participant paper submission
Sep 09-12, 2025  Conference
comment in response to post
4. Generative Plagiarism Detection.
Given a pair of documents, your task is to identify all contiguous maximal-length passages of reused text between them.
pan.webis.de/clef25/pan25...
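The task above can be sketched as word-level dynamic programming that reports maximal common runs (a toy baseline, not a PAN submission; the minimum-length parameter is an assumption):

```python
def reused_passages(doc_a: str, doc_b: str, min_len: int = 3):
    """Find maximal contiguous word-level matches between two documents
    (toy dynamic-programming sketch, not the shared-task baseline)."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    # dp[i][j] = length of the common run ending at a[i-1], b[j-1]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    hits = []
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                run = dp[i][j]
                # report only runs that cannot be extended to the right
                ended = i == len(a) or j == len(b) or a[i] != b[j]
                if run >= min_len and ended:
                    hits.append(" ".join(a[i - run:i]))
    return hits

print(reused_passages(
    "the quick brown fox jumps over the lazy dog",
    "she saw the quick brown fox and the lazy dog ran"))
```

Real plagiarism detectors additionally handle obfuscation (synonyms, reordering), which exact matching like this misses.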
comment in response to post
3. Multi-Author Writing Style Analysis.
Given a document, determine at which positions the author changes.
pan.webis.de/clef25/pan25...
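A naive baseline for this task might flag an author change whenever a simple surface statistic jumps between consecutive paragraphs; the feature (average word length) and the threshold here are illustrative assumptions, not the task baselines:

```python
def style_changes(paragraphs, threshold=2.0):
    """Predict author-change positions between consecutive paragraphs
    when average word length shifts sharply (toy baseline)."""
    def avg_word_len(text):
        words = text.split()
        return sum(len(w) for w in words) / len(words)
    feats = [avg_word_len(p) for p in paragraphs]
    return [i for i in range(1, len(feats))
            if abs(feats[i] - feats[i - 1]) >= threshold]

paras = ["I ran to the old red barn at dawn.",
         "Consequently, institutional heterogeneity complicates interpretation."]
print(style_changes(paras))  # change predicted before paragraph 1
```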
comment in response to post
2. Multilingual Text Detoxification.
Given a toxic piece of text, rewrite it in a non-toxic way while preserving the main content as much as possible.
pan.webis.de/clef25/pan25...
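The task setup can be illustrated with a trivial lexicon-substitution rewriter (a toy sketch only; real detoxification systems use sequence-to-sequence rewriting, and the word list here is an assumption):

```python
import re

# Toy detoxification: substitute words from an assumed toxic lexicon
# while leaving the rest of the content untouched.
TOXIC_TO_NEUTRAL = {"idiotic": "questionable", "garbage": "poor-quality"}

def detoxify(text: str) -> str:
    def swap(match):
        word = match.group(0)
        repl = TOXIC_TO_NEUTRAL[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    pattern = re.compile(r"\b(" + "|".join(TOXIC_TO_NEUTRAL) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(swap, text)

print(detoxify("This idiotic plan produced garbage results."))
```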
comment in response to post
1. Voight-Kampff Generative AI Detection.
Subtask 1: Given a (potentially obfuscated) text, decide whether it was written by a human or an AI.
Subtask 2: Given a document collaboratively authored by human and AI, classify the extent to which the model assisted.
pan.webis.de/clef25/pan25...
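Subtask 1 can be illustrated with a single hand-crafted signal, the variance of sentence lengths ("burstiness"), which is sometimes associated with human writing; the feature and any decision cut-off are illustrative assumptions, not the shared-task baselines:

```python
import statistics

def burstiness(text: str) -> float:
    """Variance of sentence lengths: a toy signal for the
    human-vs-AI decision, not a reliable detector on its own."""
    sentences = [s for s in
                 text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pvariance(lengths)

human = "I left. Then the storm hit us hard and we ran for the cellar doors. Quiet."
print(burstiness(human))
```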
comment in response to post
Check out the paper: downloads.webis.de/publications...
comment in response to post
We find that simply scaling up the transformer architecture still leads to significant effectiveness drops in the face of typos, keyword variations, word reordering, and paraphrasing. We further highlight the need for more elaborate query variation datasets, which should retain the queries' semantics.
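Query variations of the kinds tested can be sketched as simple perturbation functions (the concrete perturbations below are illustrative, not the paper's dataset):

```python
import random

def query_variations(query: str, seed: int = 0):
    """Generate toy query variants of two kinds: a typo (adjacent
    character swap in the longest word) and a word reordering."""
    rng = random.Random(seed)
    words = query.split()
    # typo: swap two adjacent characters inside the longest word
    w = max(words, key=len)
    i = rng.randrange(len(w) - 1)
    typo = w[:i] + w[i + 1] + w[i] + w[i + 2:]
    typo_query = " ".join(typo if x == w else x for x in words)
    # ordering: shuffle the words while keeping them all
    shuffled = words[:]
    rng.shuffle(shuffled)
    return {"typo": typo_query, "ordering": " ".join(shuffled)}

print(query_variations("cheap flights to reykjavik"))
```

Note that such surface perturbations do not guarantee the semantics are retained, which is exactly the dataset gap the post points out.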
comment in response to post
Apologies, this was imported from Twitter/X; unfortunately the links do not work. But here's the correct link to the paper: webis.de/publications...
comment in response to post
Please put us on the list. 🙂
comment in response to post
Please add our group to the NLP starter pack. 🙂