Also, let's not declare online experiments a lost cause. There are ways to adapt: hidden instructions designed to trick LLMs, tracking tab changes and other metadata, disabling copy-paste, more sophisticated detection methods, ...
That said, these problems will get worse. More lab experiments could be a solution. But if we go in that direction, we should think about scaling them to match online sample sizes & diversity.