Evaluating LMs’ actions in applications is more contextualized. But how to create test cases? PrivacyLens offers a data construction pipeline that procedurally converts the norms into a vignette and then to an agent trajectory via template-based generation and sandbox simulation.
Post image

Comments