webis.de
Information is nothing without retrieval
The Webis Group contributes to information retrieval, natural language processing, machine learning, and symbolic AI.
253 posts
603 followers
698 following
comment in response to post
…human texts today, contextualize the findings in terms of our theoretical contribution, and use them to assess the quality and adequacy of existing LLM detection benchmarks, which tend to be constructed with authorship attribution in mind rather than authorship verification. 3/3
comment in response to post
…limits of the field. We argue that as LLMs improve, detection will not necessarily become impossible, but it will be limited by the capabilities and theoretical boundaries of the field of authorship verification.
We conduct a series of exploratory analyses to show how LLM texts differ from… 2/3
comment in response to post
🧵 4/4 The shared task continues the research on LLM-based advertising. Participants can submit systems for two sub-tasks: first, generating responses with and without ads; second, classifying whether a response contains an ad.
Submissions are open until May 10th, and we look forward to your contributions.
comment in response to post
🧵 3/4 In many cases, survey participants did not notice brand or product placements in the responses. As a first step towards ad-blockers for LLMs, we created a dataset of responses with and without ads and trained classifiers on the task of identifying the ads.
dl.acm.org/doi/10.1145/...
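The ad-identification task can be sketched with a minimal keyword baseline (purely illustrative; the cue lexicon and threshold are assumptions, not the trained classifiers from the paper):

```python
# Naive baseline for spotting product placements in LLM responses.
# The cue lexicon and the threshold are illustrative assumptions only.
AD_CUES = {"buy", "discount", "brand", "exclusive", "order now", "visit"}

def contains_ad(response: str, threshold: int = 1) -> bool:
    """Flag a response as ad-bearing if enough cue phrases appear."""
    text = response.lower()
    hits = sum(1 for cue in AD_CUES if cue in text)
    return hits >= threshold

print(contains_ad("Try the exclusive discount at our brand store!"))
print(contains_ad("Photosynthesis converts light into chemical energy."))
```

A real system would learn such cues from the labeled with/without-ads dataset rather than hard-code them.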
comment in response to post
🧵 2/4 Given their high operating costs, LLMs require a business model to sustain them, and advertising is a natural candidate.
Hence, we analyzed how well LLMs can blend product placements with "organic" responses and whether users are able to identify the ads.
dl.acm.org/doi/10.1145/...
comment in response to post
🧵 4/4 Credit and thanks to the author team @lgnp.bsky.social @timhagen.bsky.social @maik-froebe.bsky.social @matthias-hagen.bsky.social @benno-stein.de @martin-potthast.com @hscells.bsky.social – you can also catch some of them at #ECIR2025 currently if you want to chat about RAG!
comment in response to post
🧵 3/4 This fundamentally challenges previous assumptions about RAG evaluation and system design. But we also show how crowdsourcing offers a viable and scalable alternative! Check out the paper for more.
📝 Preprint @ downloads.webis.de/publications...
⚙️ Code/Data @ github.com/webis-de/sig...
comment in response to post
🧵 2/4 Key findings:
1️⃣ Humans write best? No! LLM responses are rated better than human ones.
2️⃣ Essay answers? No! Bullet lists are often preferred.
3️⃣ Evaluate with BLEU? No! Reference-based metrics don't align with human preferences.
4️⃣ LLMs as judges? No! Prompted models produce inconsistent labels.
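Finding 3️⃣ can be illustrated with the clipped unigram precision at the core of BLEU (a simplified sketch without brevity penalty or higher-order n-grams): a near-verbatim answer scores far higher than a good paraphrase, regardless of which one humans prefer.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the BLEU-1 building block
    (simplified: single reference, no brevity penalty)."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    clipped = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return clipped / len(cand)

ref = "use bullet lists for clear answers"
paraphrase = "bulleted lists make answers clearer"        # good meaning
verbatim = "use bullet lists for clear answers please"    # word overlap
print(unigram_precision(paraphrase, ref))  # low despite good meaning
print(unigram_precision(verbatim, ref))    # high due to surface overlap
```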
comment in response to post
Important Dates
----------------------
now              Training data released
May 23, 2025     Software submission
May 30, 2025     Participant paper submission
June 27, 2025    Peer review notification
July 07, 2025    Camera-ready participant paper submission
Sep 09-12, 2025  Conference
comment in response to post
4. Generative Plagiarism Detection.
Given a pair of documents, your task is to identify all contiguous maximal-length passages of reused text between them.
pan.webis.de/clef25/pan25...
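The task above can be sketched as word-level dynamic programming that reports maximal common runs (a toy baseline, not a PAN submission; the minimum-length parameter is an assumption):

```python
def reused_passages(doc_a: str, doc_b: str, min_len: int = 3):
    """Find maximal contiguous word-level matches between two documents
    (toy dynamic-programming sketch, not the shared-task baseline)."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    # dp[i][j] = length of the common run ending at a[i-1], b[j-1]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    hits = []
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                run = dp[i][j]
                # report only runs that cannot be extended to the right
                ended = i == len(a) or j == len(b) or a[i] != b[j]
                if run >= min_len and ended:
                    hits.append(" ".join(a[i - run:i]))
    return hits

print(reused_passages(
    "the quick brown fox jumps over the lazy dog",
    "she saw the quick brown fox and the lazy dog ran"))
```

Real plagiarism detectors additionally handle obfuscation (synonyms, reordering), which exact matching like this misses.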
comment in response to post
3. Multi-Author Writing Style Analysis.
Given a document, determine at which positions the author changes.
pan.webis.de/clef25/pan25...
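A naive baseline for this task might flag an author change whenever a simple surface statistic jumps between consecutive paragraphs; the feature (average word length) and the threshold here are illustrative assumptions, not the task baselines:

```python
def style_changes(paragraphs, threshold=2.0):
    """Predict author-change positions between consecutive paragraphs
    when average word length shifts sharply (toy baseline)."""
    def avg_word_len(text):
        words = text.split()
        return sum(len(w) for w in words) / len(words)
    feats = [avg_word_len(p) for p in paragraphs]
    return [i for i in range(1, len(feats))
            if abs(feats[i] - feats[i - 1]) >= threshold]

paras = ["I ran to the old red barn at dawn.",
         "Consequently, institutional heterogeneity complicates interpretation."]
print(style_changes(paras))  # change predicted before paragraph 1
```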
comment in response to post
2. Multilingual Text Detoxification.
Given a toxic piece of text, rewrite it in a non-toxic way while preserving the main content as much as possible.
pan.webis.de/clef25/pan25...
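The task setup can be illustrated with a trivial lexicon-substitution rewriter (a toy sketch only; real detoxification systems use sequence-to-sequence rewriting, and the word list here is an assumption):

```python
import re

# Toy detoxification: substitute words from an assumed toxic lexicon
# while leaving the rest of the content untouched.
TOXIC_TO_NEUTRAL = {"idiotic": "questionable", "garbage": "poor-quality"}

def detoxify(text: str) -> str:
    def swap(match):
        word = match.group(0)
        repl = TOXIC_TO_NEUTRAL[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    pattern = re.compile(r"\b(" + "|".join(TOXIC_TO_NEUTRAL) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(swap, text)

print(detoxify("This idiotic plan produced garbage results."))
```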
comment in response to post
1. Voight-Kampff Generative AI Detection.
Subtask 1: Given a (potentially obfuscated) text, decide whether it was written by a human or an AI.
Subtask 2: Given a document collaboratively authored by human and AI, classify the extent to which the model assisted.
pan.webis.de/clef25/pan25...
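Subtask 1 can be illustrated with a single hand-crafted signal, the variance of sentence lengths ("burstiness"), which is sometimes associated with human writing; the feature and any decision cut-off are illustrative assumptions, not the shared-task baselines:

```python
import statistics

def burstiness(text: str) -> float:
    """Variance of sentence lengths: a toy signal for the
    human-vs-AI decision, not a reliable detector on its own."""
    sentences = [s for s in
                 text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pvariance(lengths)

human = "I left. Then the storm hit us hard and we ran for the cellar doors. Quiet."
print(burstiness(human))
```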
comment in response to post
Check out the paper: downloads.webis.de/publications...
comment in response to post
We find that simply scaling up the transformer architecture still leads to significant effectiveness drops in the face of typos, keyword variations, word reordering, and paraphrasing. We further highlight the need for more elaborate query variation datasets, which should retain the queries' semantics.
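Query variations of the kinds tested can be sketched as simple perturbation functions (the concrete perturbations below are illustrative, not the paper's dataset):

```python
import random

def query_variations(query: str, seed: int = 0):
    """Generate toy query variants of two kinds: a typo (adjacent
    character swap in the longest word) and a word reordering."""
    rng = random.Random(seed)
    words = query.split()
    # typo: swap two adjacent characters inside the longest word
    w = max(words, key=len)
    i = rng.randrange(len(w) - 1)
    typo = w[:i] + w[i + 1] + w[i] + w[i + 2:]
    typo_query = " ".join(typo if x == w else x for x in words)
    # ordering: shuffle the words while keeping them all
    shuffled = words[:]
    rng.shuffle(shuffled)
    return {"typo": typo_query, "ordering": " ".join(shuffled)}

print(query_variations("cheap flights to reykjavik"))
```

Note that such surface perturbations do not guarantee the semantics are retained, which is exactly the dataset gap the post points out.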
comment in response to post
Apologies, this was imported from Twitter/X; unfortunately the links do not work. But here's the correct link to the paper: webis.de/publications...
comment in response to post
Please put us on the list. 🙂
comment in response to post
Please add our group to the NLP starter pack. 🙂