Evaluating LMs’ actions in applications is more contextualized than probing them with questions. But how do we create test cases? PrivacyLens offers a data construction pipeline that procedurally converts privacy norms into a vignette and then into an agent trajectory via template-based generation and sandbox simulation.
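To make the norm → vignette → trajectory flow concrete, here is a minimal sketch in Python. It is not the PrivacyLens code: the `Seed` fields, the template wording, the tool names, and the canned observations are all assumptions chosen for illustration, and the "sandbox" is just a dictionary of plain functions standing in for simulated tools.

```python
from dataclasses import dataclass, field


@dataclass
class Seed:
    """A privacy norm written as a small tuple (hypothetical schema)."""
    data_type: str
    data_subject: str
    data_sender: str
    data_recipient: str
    transmission_principle: str


@dataclass
class Trajectory:
    """What an agent has seen so far; the final action is left open."""
    user_instruction: str
    vignette: str
    steps: list = field(default_factory=list)  # (tool_name, observation) pairs


# Hypothetical template; a real pipeline would likely use richer ones.
VIGNETTE_TEMPLATE = (
    "{sender} knows {subject}'s {data_type}. "
    "{sender} is about to {principle} to {recipient}."
)


def seed_to_vignette(seed: Seed) -> str:
    """Template-based generation: fill the seed's slots into a short story."""
    return VIGNETTE_TEMPLATE.format(
        sender=seed.data_sender,
        subject=seed.data_subject,
        data_type=seed.data_type,
        principle=seed.transmission_principle,
        recipient=seed.data_recipient,
    )


def simulate_trajectory(seed: Seed, vignette: str, tools: dict) -> Trajectory:
    """Stand-in for sandbox simulation: call each mock tool and log its output."""
    instruction = (
        f"Help {seed.data_sender} {seed.transmission_principle} "
        f"to {seed.data_recipient}."
    )
    trajectory = Trajectory(user_instruction=instruction, vignette=vignette)
    for name, tool in tools.items():
        trajectory.steps.append((name, tool(seed)))
    return trajectory


# Mock tools (assumed names) that return canned observations.
def read_notes(seed: Seed) -> str:
    return f"Note: {seed.data_subject}'s {seed.data_type} was shared in confidence."


def read_inbox(seed: Seed) -> str:
    return f"{seed.data_recipient} asks: 'Any news?'"


seed = Seed("medical diagnosis", "Alice", "Bob", "a coworker", "send a message")
vignette = seed_to_vignette(seed)
trajectory = simulate_trajectory(
    seed, vignette, {"ReadNotes": read_notes, "ReadInbox": read_inbox}
)
print(vignette)
for tool_name, observation in trajectory.steps:
    print(tool_name, "->", observation)
```

In this sketch the trajectory stops before the final action, so a model under test could be handed the instruction and observations and asked what it would do next; whether the sensitive detail ends up in that action is what an evaluation would then check.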
Paper: https://arxiv.org/abs/2409.00138
Website: https://salt-nlp.github.io/PrivacyLens/