tfwerner.com
Postdoc at the Center for Humans and Machines (CHM/MPIB) | PhD in Economics | Affiliated with DICE/HHU & BCCP
I am on the economics job market 2024/2025.
comment in response to
post
I think those are excellent suggestions that align well with our approach. I'd also reconsider the design to minimize open-ended questions where possible. While automated AI bots may be the future, they're not the norm yet. But most participants already seem inclined to use LLMs for open-ended Q
comment in response to
post
Wrong link above. The correct one: blog.cloudflare.com/declaring-yo...
comment in response to
post
It would be great if organizations like ESA and others could lead international collaborations to tackle this problem together.
comment in response to
post
That said, these problems will get worse. More lab experiments could be a solution. But if we go in that direction, we should think about scaling them to match online sample sizes & diversity.
comment in response to
post
Companies like Cloudflare are already working on detecting fully automated LLM agents. (HT @Hiromu Yakura from our lab) blog.cloudflare.com/firewall-for...
comment in response to
post
Also, let's not declare online experiments a lost cause. There are ways to adapt: Hidden instructions designed to trick LLMs, tracking tab changes and other meta info, disabling copy-paste, more sophisticated detection methods, ...
comment in response to
post
If we want to scale lab experiments, we must build international collaborations. This applies to both overall sample sizes and the diversity of the subjects we recruit.
comment in response to
post
Offline labs often have a very narrow participant pool (mostly young students from industrialized countries). This is fine for some research questions but not for others. Online sampling made it much easier to get diverse and more representative samples, including non-WEIRD populations.
comment in response to
post
Not sure which is more important, but you cannot do reinforcement learning without a very large foundation model.
comment in response to
post
Interesting! Do you think the difference could be novelty bias since we’re less familiar with DeepSeek’s flaws when it comes to writing tasks? Still have to test it myself for writing
comment in response to
post
Yes, I agree on the brick-and-mortar labs. Also, having international collaborations allows for more representative samples compared to past offline lab days. It would be great if an organization like the ESA could lead the coordination of such efforts.
comment in response to
post
I haven't tested Nightshade, but my understanding is that it's mostly to protect your images from being included in training data. Does it also work to prevent AI from using the images at inference time?
comment in response to
post
H/T to @iyadrahwan.bsky.social for highlighting this issue the other day.
comment in response to
post
While Gemini doesn’t make active decisions, it streams my screen in real time to an LLM, helping me within the experiment. With OpenAI’s Operator model, participants could fully outsource their participation to LLMs. How will we tackle this growing issue as a profession?
openai.com/index/introd...
comment in response to
post
"Erst ab 10 Euro" ("only from 10 euros") and "Nur mit EC Karte" ("debit card only")! ;-)
comment in response to
post
bsky.app/profile/tfwe...
comment in response to
post
Thanks Simon! Sounds interesting, I will check it out next year
comment in response to
post
bsky.app/profile/tfwe...
comment in response to
post
Haha, glad there’s so much interest! I got it here 🎄🎅: www.geeksoutfit.com/products/tha...